Wikidata, the crowdsourced database of structured knowledge run by the Wikimedia movement, has grown to over 24 million entries and now has structured information for every major settlement on Earth. This includes extremely useful properties such as multilingual labels, statistics like population and GDP, and related information about the place's politics, history and media (see London, New York City, Timbuktu).
Geolocated articles on Wikidata, with those added in the last year highlighted in pink. Source: Wikidata Map
Current state of multilingual tags
One of the great strengths of OSM is that its data can be used to create multilingual maps, which make the map accessible to many more readers than just the local population. Since the beginning of the project, the community has been adding various name:code tags for this purpose, which has resulted in map features with an ever-growing list of multilingual names, e.g. the node for London has 171 properties, of which 155 (90%) are name tags in various languages.
A more scalable approach would be to leverage the Wikidata entry for London, which has the translated name in 248 languages and grows automatically with every Wikipedia page about the city that is created in a new language.
This would also enable the translation of a map into languages that would be considered non-local on OSM and not worth adding to the map, e.g. Ukrainian labels for cities and towns in the UK.
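As a rough illustration of how a map renderer could combine the two sources, here is a minimal sketch of a label lookup with fallback. The function name `pick_label`, the priority order (OSM tag first, then Wikidata label, then the local name) and the sample data are assumptions made for this example, not an existing API.

```python
def pick_label(osm_tags, wikidata_labels, lang):
    """Return the best available label for `lang`, or None if nothing is known.

    osm_tags: dict of OSM tags, e.g. {"name": "London", "name:fr": "Londres"}
    wikidata_labels: dict of language code -> label taken from the Wikidata item
    """
    # 1. A name:<lang> tag mapped directly on the OSM feature wins.
    osm_name = osm_tags.get(f"name:{lang}")
    if osm_name:
        return osm_name
    # 2. Otherwise fall back to the Wikidata label in that language.
    wd_name = wikidata_labels.get(lang)
    if wd_name:
        return wd_name
    # 3. Last resort: the local name tag.
    return osm_tags.get("name")


# Illustrative data only: a tiny subset of tags and labels for London.
london_tags = {"name": "London", "name:fr": "Londres", "wikidata": "Q84"}
london_labels = {"fr": "Londres", "uk": "Лондон"}

print(pick_label(london_tags, london_labels, "uk"))  # no name:uk tag, so Wikidata is used
print(pick_label(london_tags, london_labels, "fr"))  # name:fr exists on the OSM feature
```

With this ordering, names that mappers have deliberately set on OSM are never overridden, and Wikidata only fills the gaps.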
Adding the Wikidata link to OSM
The first step to start leveraging the power of Wikidata from OSM is adding a simple wikidata property to the feature on OSM with the associated QID of the corresponding concept on Wikidata, e.g. wikidata=Q84 for London. Check out this video by user:polyglot on doing this via the JOSM Wikipedia plugin or via the iD editor.
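When the tag is added by a script rather than by hand in an editor, a small sanity check helps avoid malformed values. The sketch below assumes only that a QID is the letter Q followed by a positive integer; the helper names `is_valid_qid` and `with_wikidata_tag` are hypothetical.

```python
import re

# A QID is "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(value):
    """True if `value` looks like a Wikidata item identifier such as Q84."""
    return QID_PATTERN.fullmatch(value) is not None


def with_wikidata_tag(tags, qid):
    """Return a copy of `tags` with wikidata=<qid> set.

    Raises ValueError for a malformed QID, or if the feature already
    carries a different wikidata tag that a human should look at first.
    """
    if not is_valid_qid(qid):
        raise ValueError(f"not a valid QID: {qid!r}")
    existing = tags.get("wikidata")
    if existing and existing != qid:
        raise ValueError(f"feature already tagged wikidata={existing}")
    return {**tags, "wikidata": qid}


print(with_wikidata_tag({"name": "London", "place": "city"}, "Q84"))
```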
Matching Wikidata items to OSM
Just like OSM, Wikidata items for places have tags describing the feature and coordinates, which make it possible to automatically match a feature on OSM to the corresponding item on Wikidata. Unfortunately, the geographical accuracy of Wikidata entries cannot be trusted, as many of the coordinates are derived from Wikipedia pages, which in turn are usually derived from Google Maps. Moreover, entries for lesser-known places may not be tagged correctly on Wikidata and might result in ambiguous matches to an OSM feature. For this reason, manual confirmation of a match is necessary.
At the Mapbox data team, we have been experimenting with adding Wikidata tags to cities and towns on OSM based on an exact name and location match. The possible matches were loaded into a spreadsheet with the match distance and the Wikidata description of the corresponding item. After a manual review, it is easy to confirm the match with a very high degree of confidence based on the name, distance and description of the match. With this approach we have found that just an exact name and location match can give a 99% success rate for places.
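The matching step described above can be sketched roughly as follows. This is not the script we used, only an illustration under assumptions: the record layout, the function names and the sample rows are hypothetical, the second QID is a placeholder, and distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_matches(osm_place, wikidata_items):
    """Build review rows for Wikidata items whose label equals the OSM name exactly."""
    rows = []
    for item in wikidata_items:
        if item["label"] != osm_place["name"]:
            continue
        distance = haversine_km(osm_place["lat"], osm_place["lon"],
                                item["lat"], item["lon"])
        rows.append({
            "osm_id": osm_place["id"],
            "qid": item["qid"],
            "distance_km": round(distance, 1),
            "description": item["description"],
        })
    # Closest candidate first, so the reviewer sees the most likely match on top.
    return sorted(rows, key=lambda row: row["distance_km"])


# Illustrative sample data: approximate coordinates, made-up ids and descriptions.
osm_london = {"id": "node/example", "name": "London", "lat": 51.5074, "lon": -0.1278}
items = [
    {"qid": "Q_PLACEHOLDER", "label": "London", "lat": 42.98, "lon": -81.25,
     "description": "city in Ontario, Canada"},
    {"qid": "Q84", "label": "London", "lat": 51.5072, "lon": -0.1275,
     "description": "capital city of the United Kingdom"},
    {"qid": "Q_OTHER", "label": "Londonderry", "lat": 55.0, "lon": -7.3,
     "description": "city in Northern Ireland"},
]

for row in candidate_matches(osm_london, items):
    print(row)
```

Each row carries the name match, the distance and the description, which are the three things a reviewer looks at before confirming a match.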
Over 5,300 cities and towns have been updated with corresponding wikidata tags in the last two weeks: http://overpass-turbo.eu/s/jGy
There are two cases when the name matching happens:
- Unique matches: One OSM feature matches one Wikidata feature
- Duplicate matches: One OSM feature matches multiple Wikidata features with the same name
In most cases, the location of the matched feature on Wikidata is within a few kilometres of the OSM feature, and by confirming from the description that the feature is also a city or town, it is possible to confirm that this is the correct match. It is important to be careful about the feature description, as in some cases Wikidata may have ambiguous entries that represent multiple concepts, such as a city and a province with the same name, as one object.
For unique matches with a large match distance of more than 10 km, it is likely that the match is to another place with the same name and is incorrect. In a few rare cases, the Wikidata location turned out to be incorrect and the match was actually correct.
When an OSM feature matches multiple Wikidata entries with the same name, it is considered a duplicate match. In most cases a distance filter of around 10 km enables a unique match, and a further look at the description can confirm that the match is correct.
In a few rare cases multiple OSM features with the same name and location match to a single Wikidata feature. These are places with duplicate nodes on OSM itself and need to be merged.
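The cases above can be turned into a simple triage pass over the review rows, so that the reviewer knows which matches to look at most carefully. The sketch below is an assumption-laden illustration: the status names, the row layout and all ids are made up, and the 10 km threshold is the approximate distance filter mentioned in the text. It only prioritises manual review and does not replace it.

```python
from collections import defaultdict

MAX_MATCH_KM = 10.0  # approximate distance filter from the text


def classify(rows):
    """Group review rows by OSM feature and label each group.

    rows: iterable of dicts with "osm_id", "qid" and "distance_km" keys.
    Returns {osm_id: (status, qid_or_None)}.
    """
    by_osm = defaultdict(list)
    for row in rows:
        by_osm[row["osm_id"]].append(row)

    results = {}
    for osm_id, candidates in by_osm.items():
        near = [c for c in candidates if c["distance_km"] <= MAX_MATCH_KM]
        if len(near) == 1:
            # One nearby candidate: either a unique name match, or a
            # duplicate name match that the distance filter resolved.
            status = "unique" if len(candidates) == 1 else "duplicate-resolved"
            results[osm_id] = (status, near[0]["qid"])
        elif not near:
            # Probably a namesake elsewhere, or a wrong Wikidata coordinate.
            results[osm_id] = ("too-far", None)
        else:
            # Several nearby candidates: only the description can decide.
            results[osm_id] = ("ambiguous", None)
    return results


def duplicate_osm_features(results):
    """QIDs claimed by more than one OSM feature: likely duplicate OSM nodes."""
    owners = defaultdict(list)
    for osm_id, (_, qid) in results.items():
        if qid is not None:
            owners[qid].append(osm_id)
    return {qid: ids for qid, ids in owners.items() if len(ids) > 1}


# Made-up rows; "Q_A" etc. are placeholders, not real Wikidata ids.
rows = [
    {"osm_id": "n1", "qid": "Q_A", "distance_km": 0.4},
    {"osm_id": "n2", "qid": "Q_B", "distance_km": 1.2},
    {"osm_id": "n2", "qid": "Q_C", "distance_km": 840.0},
    {"osm_id": "n3", "qid": "Q_D", "distance_km": 312.5},
    {"osm_id": "n4", "qid": "Q_A", "distance_km": 0.6},
]

results = classify(rows)
print(results)
print(duplicate_osm_features(results))
```

Rows marked "too-far" are not discarded outright, since in rare cases the Wikidata coordinate itself is wrong and the match is still correct.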
Large-scale map features like countries, cities, towns and water bodies are great candidates to start matching with Wikidata, as they are fairly well defined on both projects and can be matched without ambiguity. Doing this will allow us to better understand the value that Wikidata can add to OSM, and help pave the way for more interesting map services that can be built on open data.
There’s been some amazing work from EdwardBetts on matching all of Wikidata to OSM. You can see the results, and this can give a good push to the efforts of contributors like User:Pigsonthewing to bring the two biggest crowdsourced open data projects in humanity closer together.