OpenStreetMap

Surfacing Wikidata objects with coordinates to match them with OSM

Posted by sabas88 on 1 March 2017 in English

Wikidata, a project derived from Wikipedia, can be viewed as a crowdsourced database of VGI (Volunteered Geographic Information). It is spatially less structured than OpenStreetMap but still comparable, and we think that cross-referencing the two would benefit both projects. This work started some years ago with the wikipedia tag (notably the WIWOSM project and, in Italy, wtosm), but the focus now seems to be shifting from Wikipedia to Wikidata.

In this post I would like to introduce our experiment in this direction, powered by the resources we have as a chapter of both the OSMF and the WMF.

We started from an existing OSM database, updated every half hour via osmosis replication, in which all tags are stored in an hstore column. To it we added a table called wikidata and a view that gathers the existing elements tagged with the wikidata key (a UNION of nodes, ways and relations).
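The view can be pictured with a small in-memory Python sketch. It is not the project's SQL: each table is modeled as a list of (id, tags) pairs, a plain dict stands in for the hstore column, and the function name is hypothetical. In PostgreSQL with hstore, each branch of the UNION would read the value with `tags -> 'wikidata'` and could filter with `tags ? 'wikidata'`.

```python
from collections.abc import Iterable, Mapping

# Each OSM element is modeled as (osm_id, tags); the dict stands in for
# the hstore tags column of the real database (assumption for this sketch).
Element = tuple[int, Mapping[str, str]]


def wikidata_elements(
    nodes: Iterable[Element],
    ways: Iterable[Element],
    relations: Iterable[Element],
) -> list[tuple[str, int, str]]:
    """Return (osm_type, osm_id, qid) for every element carrying a wikidata tag.

    Conceptually the same as a view built from three SELECTs (one per
    table) combined with UNION.
    """
    rows: list[tuple[str, int, str]] = []
    for osm_type, elements in (("node", nodes), ("way", ways), ("relation", relations)):
        for osm_id, tags in elements:
            qid = tags.get("wikidata")
            if qid:  # skip elements without the key (or with an empty value)
                rows.append((osm_type, osm_id, qid))
    return rows
```

For example, a node tagged `wikidata=Q10285`, an untagged bench and a way tagged `wikidata=Q220` yield `[("node", 1, "Q10285"), ("way", 10, "Q220")]`.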

The wikidata table is populated by a script that parses the weekly Wikidata dump (~10 GB of gzipped, line-delimited JSON). It keeps only the items with a claim on property P625 (coordinate location), i.e. items with at least one coordinate, and of those only the ones in Italy, selected with a "rough" point-in-polygon test. Each object is then saved with its most precise available coordinate, its id and a label (Italian, English, Serbo-Croatian, or the first available one).
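A minimal sketch of this filtering step, assuming the standard layout of the Wikidata JSON dump (a JSON array with one entity per line, each line ending with a comma). The function names and the `ROUGH_ITALY` rectangle are hypothetical stand-ins: a real run would test against a proper, if simplified, boundary polygon, whereas this box also covers parts of neighboring countries. "Most precise" is read here as the P625 value with the smallest `precision` (expressed in degrees), and coordinates on globes other than Earth are skipped.

```python
import gzip
import json
from typing import Iterator, Optional

EARTH = "http://www.wikidata.org/entity/Q2"  # globe URI for terrestrial coordinates

# Hypothetical, very coarse "polygon" around Italy as (lon, lat) vertices.
# It is only a bounding box; a real run would use a better boundary.
ROUGH_ITALY = [(6.6, 36.0), (18.6, 36.0), (18.6, 47.1), (6.6, 47.1)]


def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: toggle 'inside' for each edge crossed by a ray going east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def best_coordinate(entity: dict) -> Optional[tuple[float, float]]:
    """Return (lat, lon) of the P625 value with the smallest precision."""
    best = None
    best_precision = float("inf")
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no coordinate
        value = snak["datavalue"]["value"]
        if value.get("globe", EARTH) != EARTH:
            continue  # e.g. coordinates on the Moon or Mars
        precision = value.get("precision")
        if precision is None:
            precision = float("inf")  # unknown precision ranks last
        if best is None or precision < best_precision:
            best = (value["latitude"], value["longitude"])
            best_precision = precision
    return best


def iter_entities(path: str) -> Iterator[dict]:
    """Stream entities from the gzipped dump, one JSON object per line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip().rstrip(",")
            if line in ("", "[", "]"):
                continue  # the array brackets around the entity lines
            yield json.loads(line)


def italian_items(path: str) -> Iterator[tuple[str, float, float]]:
    """Yield (qid, lat, lon) for items whose best coordinate falls in the polygon."""
    for entity in iter_entities(path):
        coord = best_coordinate(entity)
        if coord is None:
            continue  # no usable P625 claim
        lat, lon = coord
        if point_in_polygon(lon, lat, ROUGH_ITALY):
            yield entity["id"], lat, lon
```

Streaming line by line keeps memory flat even on a dump of this size, since no more than one entity is decoded at a time.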

Why Serbo-Croatian, you may ask? We noticed that the Wikipedia editors created a lot of stubs from Geonames which went to generate new Wikidata items having only the label in the sh iso code :-)
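The label fallback can be sketched as follows. The function name is hypothetical, and "first available" is assumed to mean the first label in the order it appears in the item's JSON (Python dicts keep that order after `json.loads`).

```python
from typing import Optional

# Preference order described in the post: Italian, English, Serbo-Croatian.
LABEL_LANGUAGES = ("it", "en", "sh")


def pick_label(entity: dict) -> Optional[str]:
    """Return the preferred label of a Wikidata entity, or None if it has none."""
    labels = entity.get("labels", {})
    for lang in LABEL_LANGUAGES:
        if lang in labels:
            return labels[lang]["value"]
    # Otherwise fall back to whichever label comes first, if any.
    first = next(iter(labels.values()), None)
    return first["value"] if first else None
```

An item carrying only an `sh` label, like the GeoNames-derived stubs, still gets a usable name instead of none at all.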

With the new table in place we can build our service: a map showing all the Wikidata items, colored by their OSM status: green if already matched, red if the item can't appear in OSM (a historical battle or structure, for example), grey if it still needs to be processed. Each marker has a popup with links to the object on Wikidata (and on OSM), the wikidata tag ready to copy, and two buttons: one to mark the object as non-mappable, the other to mark it as temporarily done (it should, hopefully, turn green on the next run).
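The coloring rule fits in a few lines. This is a sketch, not the service's code: the set of QIDs already tagged in OSM (from the view) and the set of QIDs flagged as non-mappable are assumed inputs, giving "matched" precedence over "not mappable" is an assumption, and temporarily-done markers are not modeled because the post does not say how they are drawn before the next run.

```python
from enum import Enum


class Status(Enum):
    GREEN = "matched"         # some OSM element already carries this wikidata tag
    RED = "not mappable"      # flagged as something that can't exist in OSM
    GREY = "to process"       # still waiting for someone to look at it


def marker_status(qid: str, osm_qids: set[str], not_mappable: set[str]) -> Status:
    """Color of a map marker for one Wikidata item (hypothetical helper)."""
    if qid in osm_qids:
        return Status.GREEN
    if qid in not_mappable:
        return Status.RED
    return Status.GREY
```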

Screenshot

The service is live at http://osmit3.wmflabs.org/wikidata/ and covers Italy.

Code: https://github.com/osmItalia/wikidata-geo-match

Comment from GreyTK on 2 March 2017 at 04:12

Really cool stuff. I hope this becomes something that people can contribute to with local knowledge.

Comment from pizzaiolo on 2 March 2017 at 14:21

Nice! I hope to see more countries added soon :)

Comment from tyr_asd on 2 March 2017 at 18:08

"Wikipedia editors created a lot of stubs from Geonames which went to generate new Wikidata items having only the label in the sh iso code"

Oh… that sounds like a lot of fun to clean up after this import.

In fact, I have already found quite a lot of duplicate Wikidata item pairs where one stems from a "real" Wikipedia article and the other from a GeoNames-imported stub article (e.g. Q1526768 and Q18473363). Fortunately, they are quite easy to fix by merging the items with this Wikidata tool: https://www.wikidata.org/wiki/Special:MergeItems. (@sabas: I've marked those points in your tool as "non mappabile" (not mappable) because the respective item is now a redirect. Is that the correct way to do it?)

Comment from sabas88 on 2 March 2017 at 20:18

I use the Merge gadget (enabled from https://www.wikidata.org/wiki/Special:Preferences#mw-prefsection-gadgets). Remember to merge into the item with the lowest number.

As for the merged items, they should probably disappear once the next Wikidata dump is released; we'll look into it and update the query.
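The "lowest number" convention boils down to comparing the numeric part of the two Q-ids. A tiny hypothetical helper, applied to the pair mentioned earlier in the thread:

```python
def merge_target(qid_a: str, qid_b: str) -> tuple[str, str]:
    """Return (kept, merged_away): keep the item with the lower Q-number."""
    num_a = int(qid_a.lstrip("Q"))
    num_b = int(qid_b.lstrip("Q"))
    return (qid_a, qid_b) if num_a < num_b else (qid_b, qid_a)
```

`merge_target("Q18473363", "Q1526768")` returns `("Q1526768", "Q18473363")`, so Q18473363 is the item merged into Q1526768. Comparing the integers rather than the strings matters: as strings, "Q18473363" < "Q1526768" would pick the wrong item.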

Comment from PlaneMad on 3 March 2017 at 05:47

This works beautifully and is really the kind of data tool that should be more integrated as a layer in both Wikidata and OSM. Looking forward to exploring the data in other parts of the world =)

Comment from sabas88 on 11 March 2017 at 18:16

Here's the code, if anyone is interested in replicating this for another country: https://github.com/osmItalia/wikidata-geo-match
