Surfacing Wikidata objects with coordinates to match them with OSM

Posted by sabas88 on 1 March 2017 in English (English). Last updated on 11 March 2017.

Wikidata, as a project derived from Wikipedia, can be viewed as a crowdsourced database of VGI (Volunteered Geographical Information). It is of course less spatially structured than OpenStreetMap, but the two are at least comparable, and we think that cross-referencing them could be worthwhile for both projects. This work started some years ago with the wikipedia tag (notably the WIWOSM project, and in Italy wtosm), but the focus now seems to be moving towards Wikidata instead of Wikipedia.

In this post I would like to introduce our experiment in this direction, powered by the resources we have as a chapter of both the OSMF and the WMF.

We started from an existing OSM database, replicated every half hour through osmosis, in which all the tags are stored in an hstore column. To it we added a table called wikidata and a view that gathers the existing elements tagged with the wikidata key (a UNION of nodes, ways and relations).
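As a rough illustration of what such a view might look like, the sketch below builds the SQL for it as a string. It assumes the osmosis pgsnapshot layout (tables nodes, ways and relations, each with an hstore column called tags); the view name osm_wikidata, the output column names and the helper wikidata_view_sql are hypothetical and not taken from our actual setup.

```python
# Assumed osmosis pgsnapshot table names; each one has an hstore column "tags".
OSM_TABLES = {"node": "nodes", "way": "ways", "relation": "relations"}


def wikidata_view_sql(view_name="osm_wikidata"):
    """Build a CREATE VIEW statement that unions all elements tagged with wikidata.

    The view and column names are placeholders chosen for this example.
    """
    selects = [
        # "tags ? 'wikidata'" tests for the key, "tags -> 'wikidata'" reads its value.
        f"SELECT '{osm_type}' AS osm_type, id AS osm_id, "
        f"tags -> 'wikidata' AS wikidata_id FROM {table} WHERE tags ? 'wikidata'"
        for osm_type, table in OSM_TABLES.items()
    ]
    return f"CREATE OR REPLACE VIEW {view_name} AS\n" + "\nUNION\n".join(selects) + ";"
```

Printing wikidata_view_sql() gives a statement that could be pasted into psql and adapted to the real schema.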

The wikidata table is populated by a script that parses the weekly Wikidata dump (~10 GB of gzipped, line-delimited JSON). We keep only the items that have a claim with the P625 property (that is, items with at least one coordinate), and of those only the ones in Italy (a “rough” point-in-polygon test). The objects are then saved with the most precise coordinate available, their id and a label (Italian, English, Serbo-Croatian or the first available).
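The filtering steps above could be sketched roughly as follows. This is not the actual script: the bounding box standing in for the “rough” Italy test, the function names and the output dictionary are all assumptions made for illustration. It does rely on the published dump layout, where each line holds one entity followed by a comma and the P625 values carry latitude, longitude and a precision in degrees.

```python
import gzip
import json

# Hypothetical rough bounding box for Italy (an assumption, not the real polygon).
ITALY_BBOX = (6.0, 35.0, 19.0, 47.5)  # min_lon, min_lat, max_lon, max_lat
LABEL_ORDER = ("it", "en", "sh")  # Italian, English, Serbo-Croatian


def parse_dump_line(line):
    """Return the entity dict for one dump line, or None for brackets and blanks."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None
    return json.loads(line)


def best_coordinate(entity):
    """Pick the P625 value with the smallest precision (in degrees), if any."""
    best = None
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" and "somevalue" snaks
        value = snak["datavalue"]["value"]
        precision = value.get("precision")
        key = precision if precision is not None else float("inf")
        if best is None or key < best[0]:
            best = (key, value["latitude"], value["longitude"])
    return None if best is None else (best[1], best[2])


def in_bbox(lat, lon, bbox=ITALY_BBOX):
    """Rough containment test against a bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def pick_label(entity):
    """Prefer it, en, sh, then fall back to the first label available."""
    labels = entity.get("labels", {})
    for lang in LABEL_ORDER:
        if lang in labels:
            return labels[lang]["value"]
    for label in labels.values():
        return label["value"]
    return None


def extract_row(line):
    """Turn one dump line into a row for the wikidata table, or None to skip it."""
    entity = parse_dump_line(line)
    if entity is None:
        return None
    coord = best_coordinate(entity)
    if coord is None or not in_bbox(*coord):
        return None
    return {"id": entity["id"], "lat": coord[0], "lon": coord[1],
            "label": pick_label(entity)}


def iter_rows(path):
    """Stream rows from a gzipped dump without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            row = extract_row(line)
            if row is not None:
                yield row
```

Reading the file line by line keeps memory use flat, which matters for a dump of this size. A bounding box will also let through items just across the border, so a real point-in-polygon check would still be needed for a clean result.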

Why Serbo-Croatian, you may ask? We noticed that Wikipedia editors created a lot of stubs from Geonames, which in turn generated new Wikidata items whose only label is under the sh ISO code :-)

Now that we have our brand new table, we can create our service: a map showing all the Wikidata elements colored by their OSM status. Markers are green if the element is already matched, red if it can’t appear in OSM (a historical battle or structure, for example), and grey if it still needs to be processed. Each marker has a popup with links to the object on Wikidata (and on OSM), the wikidata tag to copy, and two buttons: one to mark the object as non-mappable, the other to mark it as temporarily done (it should, hopefully, turn green on the next run).
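The coloring rule can be written down in a few lines. In this minimal sketch the function name, the two boolean flags and the precedence between them (a match wins over a non-mappable mark) are assumptions made for the example, not a description of the actual code.

```python
def marker_color(matched_in_osm, non_mappable):
    """Map an item's status flags to the marker color described in the post."""
    if matched_in_osm:
        return "green"  # an OSM element already carries this wikidata tag
    if non_mappable:
        return "red"  # flagged by a user as something that can't appear in OSM
    return "grey"  # still needs to be processed
```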


The service is live at and covers Italy.


Comment from GreyTK on 2 March 2017 at 04:12

Really cool stuff. I hope this becomes something that people can contribute to with local knowledge

Comment from pizzaiolo on 2 March 2017 at 14:21

Nice! I hope to see more countries added soon :)

Comment from tyr_asd on 2 March 2017 at 18:08

Wikipedia editors created a lot of stubs from Geonames which went to generate new Wikidata items having only the label in the sh iso code

oh… that sounds like a lot of fun to clean up after this import.

In fact, I already found quite a lot of duplicate wikidata entry pairs where one stems from a “real” wikipedia article and one from a geonames-imported stub article. (e.g. Q1526768 and Q18473363.) Fortunately, one can quite easily fix them by merging the items with this wikidata tool: (@sabas: I’ve signed those points on your tool as “non mappabile” because the respective item is now a redirect, is that the correct way to do it?).

Comment from sabas88 on 2 March 2017 at 20:18

I use the Merge gadget (turned on from ). Remember to merge into the item with the lowest number.

As for the merged items, they should perhaps disappear after the next Wikidata dump release; we’ll look into it and update the query.

Comment from PlaneMad on 3 March 2017 at 05:47

This works beautifully and is really the kind of data tool that should be more integrated as a layer in both Wikidata and OSM. Looking forward to exploring the data in other parts of the world =)

Comment from sabas88 on 11 March 2017 at 18:16

Here’s the code, in case someone is interested in replicating this in another country.
