Scaling the matching of Wikidata to OpenStreetMap with wikimama

Posted by BharataHS on 30 January 2017 in English (English)

The OpenStreetMap community has been adding wikidata tags to OpenStreetMap features for the last several months. Linking OpenStreetMap features to Wikidata lets both projects support novel applications such as multilingual maps, and brings in semantic information that OpenStreetMap doesn’t have but Wikidata provides.

To date, over 400K features have a wikidata tag. Wikidata linking is now integrated into the iD and JOSM editors, which lets mappers add wikidata tags efficiently. The biggest remaining challenge is picking the best match when locations or names do not match exactly.

This post will walk you through using wikimama, a tool to find and add missing wikidata tags in OpenStreetMap.

Matching Wikidata tags in neighborhoods with ‘wikimama’

Let’s say you want to get a list of all neighborhoods/suburbs within Stuttgart that don’t have a wikidata tag and get a list of wikidata tag matches. The wikimama tool gives you this list that you can manually review and later add to OpenStreetMap.

  • Get the information of Stuttgart in OpenStreetMap


Stuttgart boundary in OpenStreetMap

We need the following information about Stuttgart: name, center longitude, center latitude, wikidata. Most cities in OpenStreetMap already have a wikidata tag. If it doesn’t exist, you can find it directly in Wikidata (here’s the wikidata entry for Stuttgart).

The boundary shows that a radius of about 10 km will cover all neighborhoods/suburbs within Stuttgart.

  • Clone the wikimama repository from GitHub.

  • Prepare an input file to initialize wikimama. The input file needs the following fields: City, longitude, latitude, wikidata_id, radius, lookupdistance.


  • Finally, run node batch-match.js --file input.csv.

  • Once finished, wikimama will produce an output.csv listing all neighborhoods/suburbs within a 10 km radius of Stuttgart and the probable wikidata tag that matches each place.

  • Import the CSV into a spreadsheet so you can do basic filtering and sorting. Use the JOSM URL field to directly edit the feature in JOSM.


Output imported to a spreadsheet with basic filtering and formatting
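
For illustration, the input file from the steps above could be generated with a few lines of Python. This is only a sketch: the column order comes from the field list in this post, but the header row, the units of radius and lookupdistance, and the lookupdistance value are assumptions, so check the wikimama repository for the exact format it expects. The coordinates are an approximate center of Stuttgart, and Q1022 is its Wikidata ID.

```python
import csv

# Column order as listed in this post. Whether wikimama expects a header
# row is an assumption; pass header=False if it does not.
FIELDS = ["City", "longitude", "latitude", "wikidata_id", "radius", "lookupdistance"]


def write_input_file(path, rows, header=True):
    """Write one line per city, in the column order the post lists."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if header:
            writer.writerow(FIELDS)
        writer.writerows(rows)


# Approximate center of Stuttgart (longitude, latitude) and its Wikidata ID.
# radius=10 follows the 10 km estimate above (unit assumed to be km);
# lookupdistance=2 is a placeholder value, not a recommended setting.
stuttgart = ["Stuttgart", 9.18, 48.78, "Q1022", 10, 2]

write_input_file("input.csv", [stuttgart])
```

Adding more cities is a matter of appending more rows to the list before running batch-match.js on the file.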

What’s under the hood?

Wikimama queries OpenStreetMap for all neighborhoods/suburbs within your defined threshold (10 km). This list of queried places is then compared to existing wikidata entries under the human settlement class/subclass. The matching compares both the geographic and Levenshtein distances for each entry and gives it a similarity score. A higher score indicates a better match. A full description of the output CSV is available in the repo’s wiki.
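
To make the idea concrete, here is a minimal Python sketch of scoring a candidate by combining geographic distance with Levenshtein distance. It is not wikimama's actual code: the function names, the equal 50/50 weighting, the `max_km` cutoff, and the sample coordinates are all assumptions chosen for illustration.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_score(osm_place, candidate, max_km=2.0):
    """Hypothetical score in [0, 1]; higher means a closer match.

    The equal weighting and the max_km cutoff are illustrative
    assumptions, not the formula wikimama uses.
    """
    name_a = osm_place["name"].lower()
    name_b = candidate["name"].lower()
    longest = max(len(name_a), len(name_b)) or 1
    name_similarity = 1 - levenshtein(name_a, name_b) / longest
    km = haversine_km(osm_place["lon"], osm_place["lat"],
                      candidate["lon"], candidate["lat"])
    geo_similarity = max(0.0, 1 - km / max_km)
    return 0.5 * name_similarity + 0.5 * geo_similarity


# Made-up coordinates for illustration only.
osm_place = {"name": "Bad Cannstatt", "lon": 9.220, "lat": 48.805}
candidates = [
    {"name": "Bad Cannstatt", "lon": 9.221, "lat": 48.806},
    {"name": "Cannstatter Wasen", "lon": 9.230, "lat": 48.795},
]
ranked = sorted(candidates,
                key=lambda c: similarity_score(osm_place, c),
                reverse=True)
```

Normalizing both distances into the 0 to 1 range before combining them keeps either signal from dominating, which is useful when a name matches well but the coordinates are slightly off, or the other way around.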

To be clear, we are not proposing any automated upload of wikimama-generated results to OpenStreetMap. Wikimama does not guarantee a 100% match for all queried places. This tool only provides a list of potential matches and their scores. The mapper should ultimately decide whether the match is correct and then upload to OpenStreetMap.

Our team at Mapbox has started using this tool to match wikidata tags in 20 cities and has linked about 1,500 neighborhoods. If you need help using wikimama or want a list of matches for your city, please comment here or directly in the wikimama repo.
