An open database of inconsistent edits observed on OSM from OSMCha

Posted by manoharuss on 22 December 2016 in English (English)

OSMCha is an open source changeset exploration tool originally created by Wille Marcel. In early 2016, a few of us at Mapbox became interested in using this tool to try out validation at the changeset level. Over the course of 2016, we made several improvements to the tool. As of this morning, we have reviewed more than 23,000 changesets and found 1,150 of them (roughly 5%) to be harmful to the map. The OSMCha database contains useful changeset metadata such as changeset ID, username, editor used, changeset comment, source, imagery used, and timestamp.

You can download a CSV of all the reviewed changesets here. For community members who are interested in validating the map using OSMCha, our validation guide is a good starting point for understanding the tool, how we use it, and how to validate your own neighborhood.

A few things to note

  • OSMCha does not parse all changesets from OSM. A few go unparsed each day because of various edge cases that we are working on fixing, so treat the numbers on OSMCha as close estimates rather than absolute counts.

  • Some of the mapping activity marked as harmful in OSMCha is not necessarily harmful in itself. Undiscussed, unannounced imports into OSM are constantly tracked and reverted by the DWG. These edits do not necessarily contain mapping mistakes, but they were reverted to uphold the data import protocol, map accuracy and local community accord.

  • Hence, mass deletions of the above imports in revert changesets by DWG cleanup accounts like Woodpeck_repair are marked as good edits. These can be ignored by filtering out repair accounts.

  • The reviewed changesets were from random places on the map and are not specific to any place. For area-specific filtering we can take advantage of the bbox filter in OSMCha, or filter manually, as the CSV contains the bbox information for each changeset.
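As a rough illustration of the last two notes, here is a minimal sketch of filtering the CSV by hand: it skips repair accounts and keeps only changesets whose bbox overlaps an area of interest. The column names (`id`, `user`, `harmful`, `min_lon`, ...) and the sample rows are assumptions made up for this example; check the header of the real CSV before using it.

```python
import csv
import io

# Hypothetical column names and made-up rows; the real export may differ.
SAMPLE_CSV = """id,user,harmful,min_lon,min_lat,max_lon,max_lat
1,alice,False,77.5,12.9,77.6,13.0
2,Woodpeck_repair,False,-10.0,35.0,30.0,60.0
3,bob,True,77.55,12.95,77.58,12.97
4,carol,True,2.2,48.8,2.4,48.9
"""

# Accounts whose revert changesets we want to leave out of the analysis.
REPAIR_ACCOUNTS = {"Woodpeck_repair"}


def bbox_intersects(a, b):
    """Return True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_changesets(rows, area, skip_users=REPAIR_ACCOUNTS):
    """Keep rows not made by skipped users whose bbox overlaps the area."""
    kept = []
    for row in rows:
        if row["user"] in skip_users:
            continue
        box = tuple(
            float(row[key])
            for key in ("min_lon", "min_lat", "max_lon", "max_lat")
        )
        if bbox_intersects(box, area):
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
area_of_interest = (77.4, 12.8, 77.8, 13.2)  # example bbox, chosen arbitrarily
selected = filter_changesets(rows, area_of_interest)
print([row["id"] for row in selected])
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and adjust the column names to match.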

Basic analysis

Since we have a big dataset of reviewed changesets, we can look for correlations among harmful changesets to find patterns of vandalism on OSM. I did a basic analysis using a recently added metadata filter on the OSMCha stats page, which gave the estimates below.

[Image: Editor-wise breakdown of changesets marked as harmful]

[Image: Editor-wise breakdown of changesets reviewed]

Filters we found to be successful

These are the percentages of harmful edits observed among the reviewed changesets matching each filter.

iD + suspect word: 14.1%

iD + mass deletion: 7.9%

Potlatch + mass deletion: 5.8%

JOSM + suspect word: 5.8%

JOSM + mass deletion: 4.9%

Maps.me: 3.7%

  • The suspect word filter flags changesets that contain any of the words apple, google, nokia, here, waze, tomtom, import or wikimapia in the changeset comment or source.
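For anyone who wants to reproduce this kind of breakdown from the CSV, here is a minimal sketch. It flags suspect words and computes the share of harmful changesets per editor and flag. The whole-word, case-insensitive matching is an assumption and not necessarily how OSMCha matches words; the field names and the sample records are invented for the example.

```python
import re
from collections import defaultdict

SUSPECT_WORDS = [
    "apple", "google", "nokia", "here", "waze", "tomtom", "import", "wikimapia",
]

# Assumption: match whole words, ignoring case. OSMCha may match differently.
SUSPECT_RE = re.compile(r"\b(" + "|".join(SUSPECT_WORDS) + r")\b", re.IGNORECASE)


def has_suspect_word(comment, source):
    """Return True if the comment or source contains a suspect word."""
    return bool(SUSPECT_RE.search(f"{comment} {source}"))


def harmful_rate_by_group(changesets):
    """Percentage of harmful changesets per (editor, suspect-word flag)."""
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for cs in changesets:
        key = (cs["editor"], has_suspect_word(cs["comment"], cs["source"]))
        totals[key] += 1
        if cs["harmful"]:
            harmful[key] += 1
    return {key: 100.0 * harmful[key] / totals[key] for key in totals}


# Made-up records, only to show the shape of the computation.
sample = [
    {"editor": "iD", "comment": "traced from Google Maps", "source": "", "harmful": True},
    {"editor": "iD", "comment": "added shop", "source": "survey", "harmful": False},
    {"editor": "iD", "comment": "import of buildings", "source": "Bing", "harmful": False},
    {"editor": "JOSM", "comment": "fix roads", "source": "here maps", "harmful": True},
]

rates = harmful_rate_by_group(sample)
for (editor, flagged), rate in sorted(rates.items()):
    print(f"{editor} suspect_word={flagged}: {rate:.1f}%")
```

Note that a common word like "here" will also match harmless comments such as "added a shop here", which is one reason a flagged changeset still needs human review.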

Going forward

  • Having a database of OSM edits classified as good or harmful can help future efforts to build smart anomaly detection tools and machine learning algorithms that better protect the map.

  • We look forward to continuing validation with OSMCha, refining its changeset flagging heuristics, and collaborating with the community on better open tools to protect the map.

Let us know your thoughts, how this can be taken forward and share with us your insights to improve feature level detection.

Comment from Athalis on 22 December 2016 at 16:31

The reviewed changesets were from random places on the map and are not specific to any place. For area-specific filtering we can take advantage of the bbox filter in OSMCha, or filter manually, as the CSV contains the bbox information for each changeset.

The problem with the bbox approach is that simple bbox matching always includes changesets whose bbox overlaps the specified area even though they contain no edits in it, e.g. world-wide edits.

Let us know your thoughts, how this can be taken forward and share with us your insights to improve feature level detection.

I'm reviewing (almost..) every single changeset in areas I'm actively mapping. My current workflow for that review is that I have subscribed to several rss feeds from http://zverik.osm.rambler.ru/whodidit/. Then I open the new changesets and review them with achavi.

I think I could replace that workflow with OSMCha if the bbox matching would be better. It would be great if users could save their custom filters and see in an overview how many "new" changesets there are per filter (both suspicious & all, being not verified).


Comment from sanjayb on 23 December 2016 at 08:10

Hi Athalis - I'm one of the developers working on OSMCha, thank you for your comment -

I think I could replace that workflow with OSMCha if the bbox matching would be better.

Yes. So the problem currently is that we just use the BBOX from OSM, which indicates the overall bounding box that the changeset covers. The bounding box query on OSMCha currently returns all changesets with bounding boxes that intersect in any way with your query bbox.

I see a few options here:

  • add an option to only find changesets fully contained within the bbox you are searching for.
  • display the overall area covered by a changeset in the list view to be able to quickly eye-ball changesets that are very large.
  • ... would be appreciative of any other ideas ..

I think adding the option to only find changesets fully contained within the bbox you are searching for would be easy to add, but then you could also miss some changesets relevant to the area you are searching. Am not sure of a good solution here, but would be very appreciative of hearing what you think.
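To make the trade-off concrete, here is a small sketch comparing the two matching rules on three example changeset bboxes. The coordinates are invented for illustration, and this is not OSMCha's actual query code.

```python
def intersects(a, b):
    """True if boxes a and b, given as (min_lon, min_lat, max_lon, max_lat), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if the inner box lies entirely within the outer box."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


query = (4.30, 50.80, 4.45, 50.90)  # the area a reviewer cares about

examples = {
    "world-wide edit": (-180.0, -90.0, 180.0, 90.0),
    "long feature crossing the edge": (4.40, 50.85, 5.60, 51.20),
    "small local edit": (4.35, 50.82, 4.38, 50.84),
}

results = {
    name: (intersects(box, query), contains(query, box))
    for name, box in examples.items()
}

for name, (hit_intersects, hit_contained) in results.items():
    print(f"{name}: intersects={hit_intersects}, fully contained={hit_contained}")
```

With the "intersects" rule the world-wide edit is returned even though it may not touch the area at all. With the "fully contained" rule it is dropped, but so is the long feature that really does cross the area.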

It would be great if users could save their custom filters and see in an overview how many "new" changesets there are per filter (both suspicious & all, being not verified).

Being able to subscribe to a custom RSS feed based on your filters is very high up on my list of priorities. Hope to have this working soon - really good to know that there is desire for such a feature.

Thanks again for your feedback and inputs!


Comment from Athalis on 23 December 2016 at 10:02

Am not sure of a good solution here, but would be very appreciative of hearing what you think.

Filtering for changesets fully contained within the bbox would not be enough. Just consider a long power line: it wouldn't be matched.

"display the overall area covered by a changeset" wouldn't help in my case. I wouldn't know if it affects objects in my area or not until I open and review it.

Afaik http://zverik.osm.rambler.ru/whodidit/ maps the objects of each changeset to a quadtree / tiles, which it then uses to query for the rss feed. Maybe you could get in contact with the author(s?) to figure out more?
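A minimal sketch of the general idea, assuming each changeset is reduced to the set of map tiles that its edited nodes fall into. This is not WhoDidIt's actual implementation; the zoom level, function names and coordinates are all chosen just for this example. The tile numbering is the usual slippy-map scheme.

```python
import math


def tile_of(lon, lat, zoom=12):
    """Slippy-map tile (x, y) containing a lon/lat point at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return (x, y)


def tiles_of_changeset(points, zoom=12):
    """Set of tiles touched by the edited points of one changeset."""
    return {tile_of(lon, lat, zoom) for lon, lat in points}


def tiles_of_bbox(bbox, zoom=12):
    """Set of tiles covering a (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x_min, y_max = tile_of(min_lon, min_lat, zoom)  # tile y grows southwards
    x_max, y_min = tile_of(max_lon, max_lat, zoom)
    return {
        (x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    }


def touches_area(points, bbox, zoom=12):
    """True if any edited point lies in a tile that covers the bbox."""
    return bool(tiles_of_changeset(points, zoom) & tiles_of_bbox(bbox, zoom))


query = (4.30, 50.80, 4.45, 50.90)

# Two edits far apart: the changeset's overall bbox covers the query area,
# but none of the edited points is anywhere near it.
far_apart = [(-74.00, 40.71), (37.62, 55.75)]
# One of these edits is inside the query area.
partly_local = [(4.35, 50.85), (37.62, 55.75)]

print(touches_area(far_apart, query))
print(touches_area(partly_local, query))
```

This sketch only looks at individual points, so a long way passing through the area without a node inside it would still be missed, and edits just outside the area but in the same tile are still returned. A finer zoom level reduces the second problem at the cost of a larger index.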


Comment from wille on 28 December 2016 at 16:58

Very good to see these stats and suggestions for OSMCHA!!! :D


Comment from joost schouppe on 2 January 2017 at 17:24

First, great stuff that all changesets can potentially be reviewed, and reviewed only once!

If dreaming is allowed, it would be nice to have as many tools as possible connect with a "revision database". For example, we use welcome.osm.be to review the first few changesets of new contributors as well as welcome them. Some of those edits have already been fixed by an active reviewer, and they might be reviewed a third time using your tool.

One way to avoid this is to expand the capabilities of OSMCha itself. A necessary extra change would be to allow filtering by admin area instead of bbox.

A bit more complicated: a lot of those newbie changesets are just "slightly wrong", e.g. adding a path without a connection. Should there be a distinction between "harmful" and "needs fixing", or is the point just to find things that need fixing, regardless of ill intent?

Another feature worth dreaming about is implementing #pleasereview. This is an idea to allow people to self-flag their changeset to trigger human review. It would help less confident mappers grow and help other mappers understand changesets. Implementing it in OSMCha doesn't seem to need any special additions, maybe just a tracking tool on the stats page. The bigger task would be to ask the iD, JOSM and Maps.me developers to add a feature that helps add this tag to the changeset comment. The idea is explained a bit more here.


Comment from wille on 16 February 2017 at 16:48

Very good ideas, Joost! We will consider it in our future plans!
