The data team at Mapbox has built tools to detect and clean up these issues as they appear on the map.

Detecting issues with OSMlint

To detect issues, we run spatial analysis on the map data of the whole world every day using OSMlint. This is made possible by leveraging the power of tile-reduce to analyze OpenStreetMap vector tiles using JavaScript. OSMlint consists of a set of validators that check the data on each tile for common issues in geometry and tags. An example validator is crossingHighways, which detects roads that cross each other without an intersection.

Each validator produces a GeoJSON output of the OpenStreetMap geometries that are invalid and need review.
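To make the idea concrete, here is a minimal sketch of what a crossing check of this kind could look like. It is not OSMlint's actual crossingHighways code: the input shape (`ways` with `id` and `nodes`), the function names and the output property names are all assumptions made for illustration. It treats lon/lat as planar coordinates, compares every pair of ways, and ignores bridges, tunnels and layers, which a real validator would have to consider.

```javascript
// Hypothetical sketch, not OSMlint's implementation.
// Assumed input: ways = [{ id, nodes: [{ lon, lat }, ...] }, ...]

// Sign of the cross product (b - a) x (c - a): tells which side of line ab point c lies on.
function orientation(a, b, c) {
  return Math.sign(
    (b.lon - a.lon) * (c.lat - a.lat) - (b.lat - a.lat) * (c.lon - a.lon)
  );
}

// True only when the two segments cross strictly inside both of them.
// Segments that merely meet at an endpoint (for example a shared node)
// give an orientation of 0 and are therefore not reported.
function segmentsCross(a, b, c, d) {
  return (
    orientation(a, b, c) * orientation(a, b, d) < 0 &&
    orientation(c, d, a) * orientation(c, d, b) < 0
  );
}

// Intersection point of two segments that are already known to cross.
function intersectionPoint(a, b, c, d) {
  const denom =
    (b.lon - a.lon) * (d.lat - c.lat) - (b.lat - a.lat) * (d.lon - c.lon);
  const t =
    ((c.lon - a.lon) * (d.lat - c.lat) - (c.lat - a.lat) * (d.lon - c.lon)) /
    denom;
  return [a.lon + t * (b.lon - a.lon), a.lat + t * (b.lat - a.lat)];
}

// Compare every pair of ways and collect crossings as GeoJSON point features.
// The property names below are illustrative, not OSMlint's output schema.
function findCrossings(ways) {
  const features = [];
  for (let i = 0; i < ways.length; i++) {
    for (let j = i + 1; j < ways.length; j++) {
      const p = ways[i].nodes;
      const q = ways[j].nodes;
      for (let m = 0; m + 1 < p.length; m++) {
        for (let n = 0; n + 1 < q.length; n++) {
          if (segmentsCross(p[m], p[m + 1], q[n], q[n + 1])) {
            features.push({
              type: 'Feature',
              properties: {
                validator: 'crossingHighways',
                wayIds: [ways[i].id, ways[j].id]
              },
              geometry: {
                type: 'Point',
                coordinates: intersectionPoint(p[m], p[m + 1], q[n], q[n + 1])
              }
            });
          }
        }
      }
    }
  }
  return { type: 'FeatureCollection', features };
}
```

Calling `findCrossings` on the ways of a single tile would return a FeatureCollection with one point per crossing that has no node at the crossing location. The pairwise comparison is quadratic in the number of segments, which is tolerable per tile but would call for a spatial index on larger inputs.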

Fixing issues with To-fix

To-fix is a micro-tasking tool that helps us create tasks from issues to fix on OpenStreetMap. The issues collected with OSMlint are loaded into to-fix, and each one is then reviewed by a member of the mapping community or the Mapbox data team. Over 500,000 issues have been fixed so far using to-fix.

To-fix also allows marking issues as false positives. These are later investigated and used to improve the validator algorithm in OSMlint.

The linting pipeline

We’ve implemented an architecture that automatically detects issues using OSMlint and loads them into to-fix on a daily basis.

This helps keep the map free of basic data errors that would otherwise cause connectivity issues during routing and navigation. You can currently access the output of the following validators in to-fix for review:

As always, we are constantly looking for issues in our systems and for ways they can better serve the needs of the OpenStreetMap community to create a truly open map of the highest quality. You can contribute by creating new validators, improving our current ones or just reviewing the issues on to-fix. Feel free to hit me up on Twitter or OpenStreetMap if you have ideas to share.

Comment from SomeoneElse on 14 May 2016 at 17:25

I think your “pipeline” isn’t quite complete…

What happens next is that on-the-ground mappers familiar with an area spot a remote edit, and go and check it. Sometimes the correction is valid, but often the “error” being corrected wasn’t actually an error at all - perhaps it was a bit of an edge case, like , where any OSM tagging would be a bit of a compromise.

In areas where there is a reasonable concentration of mappers, and it’s really not clear how things really are, it’s better to try and ask people to check the real situation rather than guessing, and the best way to do that is either a changeset discussion comment on a previous editing changeset or an OSM note.

It’s important to remember that OSM isn’t just data - it’s a representation of things in the real world, and the real world is sometimes shades of grey rather than black and white.

Comment from PlaneMad on 16 May 2016 at 05:43

SomeoneElse, the ground mapper is always the most important part of the process. The objective with tools like these is to allow ground mappers to easily modify and run an analysis for their areas of interest and fix the issues themselves. These are still early days of trials and testing the output, and it's great to detect such issues early on rather than later.

What you highlight is an issue worth solving. There are many mappers in the world who are interested in data ‘gardening’ of areas outside where they live, and we need to see how we can build tools to support such needs. What's a reliable definition of what constitutes an area that has a ‘reasonable concentration of mappers’? Checking the history of each way or doing a visual check of completeness cannot always work. Maybe evaluating the average freshness of the data in an area?

It might then be possible to easily distinguish areas with active communities that one can get in touch with for ground support. Alternatively, such cleanup tasks could ignore these areas altogether.
