The OpenStreetMap database is being enriched with Wikidata tags on a daily basis, with over 500,000 features tagged to date. This is generally done by matching the name and location of a popular map feature to its corresponding Wikidata item, if one exists. Check the OSM Wiki page on Wikidata for more information.
This is currently done manually and requires local knowledge to avoid connecting unrelated features between the two databases. The most common mix-ups are:
Features with the same name that lie in entirely different geographical areas, e.g. the cities named Salem in the US and in India.
Features with the same name but of a different type in the same location, e.g. a railway station matched to a nearby landmark of the same name.
In such cases, there is a high chance of linking the wrong Wikidata item to an OSM feature if one doesn't check that the locations of both features match. Apart from this, human error in copy-pasting the wrong Wikidata QID is common. This post introduces a validator tool for reviewing such mismatches based purely on location.
Validating Wikidata tags on OSM features using the wikidata-osm validator
wikidata-osm is a visual validator tool that spots possible Wikidata tag mismatches by comparing the location of an OSM feature with that of its linked Wikidata item. It highlights the features where the distance between the two is greater than a threshold distance set by the user.
Highlight Wikidata tagged map features based on the distance between the features on OSM and Wikidata databases
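The distance check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the haversine formula, the `find_mismatches` helper, the feature dictionary layout and the placeholder identifiers are all assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sanity check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def find_mismatches(features, threshold_km):
    """Return (id, distance_km) pairs farther apart than the threshold.

    Each feature is a dict with hypothetical keys:
    "id", "osm" (lat, lon) and "wikidata" (lat, lon).
    """
    flagged = []
    for feature in features:
        distance = haversine_km(*feature["osm"], *feature["wikidata"])
        if distance > threshold_km:
            flagged.append((feature["id"], distance))
    # Largest distances first, since those are the most suspicious.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample data: one feature that agrees, one that is far off.
sample = [
    {"id": "close-match", "osm": (12.97, 77.59), "wikidata": (12.98, 77.60)},
    {"id": "far-match", "osm": (12.97, 77.59), "wikidata": (44.94, -123.03)},
]
for feature_id, distance in find_mismatches(sample, threshold_km=10):
    print(f"{feature_id}: {distance:.0f} km apart")
```

With a 10 km threshold only the second sample feature is reported, which mirrors the Salem case: the names match but the locations are on different continents.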
Using the tool
Each circle on the map represents an OSM feature that has a Wikidata tag. The color and size of the circle depend on the distance between the OSM feature and its linked Wikidata item. The larger red circles represent features with a high chance of being erroneous, while the smaller green circles represent features with a lower chance of being erroneous.
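One way to turn a distance into a circle color and size is a simple linear ramp from small and green to large and red. The post does not document the tool's real styling rules, so the `circle_style` function, the radius range and the choice to scale against the threshold are all assumptions for this sketch.

```python
def circle_style(distance_km, threshold_km, min_radius=4.0, max_radius=20.0):
    """Map a distance to a hex color and a radius (hypothetical styling).

    Distances at or beyond the threshold get the largest, fully red circle;
    a distance of zero gets the smallest, fully green one.
    """
    if threshold_km <= 0:
        raise ValueError("threshold_km must be positive")
    # Fraction of the threshold that the distance covers, clamped to [0, 1].
    t = min(max(distance_km / threshold_km, 0.0), 1.0)
    red = round(255 * t)
    green = round(255 * (1 - t))
    radius = min_radius + (max_radius - min_radius) * t
    return {"color": f"#{red:02x}{green:02x}00", "radius": radius}


print(circle_style(0, 100))    # smallest green circle
print(circle_style(250, 100))  # largest red circle
```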
The threshold distance in the top-left pane has to be set by the user. It varies depending on the type of place being reviewed. For example, when reviewing Wikidata tags for large countries, one can set the threshold to around 100 km, because the Wikidata coordinates of such a feature can legitimately be 100 km away from its OSM coordinates. For small countries or neighbourhood-level places, this value can be lowered.
Clicking on any circle opens a detail view that shows the locations of both the Wikidata item and the OSM feature on the map. The right panel also lists the tags, the Wikidata item URL and the OSM feature URL, which can help in validating the mismatch.
The tool was made to help various communities review and improve Wikidata tagging in their local areas, since there were no existing tools for this. The features displayed on the map are a static snapshot from December 2016, but clicking any feature will look up the latest location information from OSM and Wikidata.
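For readers who want to look up the current Wikidata location themselves, the coordinates live in the item's "coordinate location" statement (property P625). The sketch below only parses an entity that has already been downloaded as JSON, for example from the `Special:EntityData/<QID>.json` endpoint. How the tool itself fetches this data is not described in the post, and the `wikidata_coordinates` helper and the sample entity are made up for illustration.

```python
def wikidata_coordinates(entity):
    """Return (lat, lon) from the first usable P625 claim, or None.

    `entity` is one entry of the "entities" object in Wikidata's JSON output.
    """
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        # Claims marked "novalue" or "somevalue" carry no coordinates.
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        return (value["latitude"], value["longitude"])
    return None


# Hand-built entity with a placeholder ID, trimmed to the fields used here.
sample_entity = {
    "id": "Q-PLACEHOLDER",
    "claims": {
        "P625": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {
                        "type": "globecoordinate",
                        "value": {"latitude": 11.65, "longitude": 78.16},
                    },
                }
            }
        ]
    },
}
print(wikidata_coordinates(sample_entity))
```

Items without a P625 statement return `None`, which a reviewer would need to handle separately since no distance can be computed for them.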
Feel free to play with the code on GitHub and make any improvements that would make it easier to validate Wikidata tags.
Other Wikidata validation tools
Yurik has done a tremendous amount of work in the last few months to bring OSM and Wikidata closer. He recently spoke about using the instanceOf property of the linked Wikidata items to spot potentially incorrect matches in this OSM-talk thread, and started an open list of questions to evolve best practices for linking features between the two databases.
If you have any feedback or ideas on how to improve such processes, or know of other tools that were missed here, it would be great to hear from you.