Social Engineering the Mapbox Data Team

Posted by zeromap on 5 April 2015 in English (English)

Introducing erroneous data through anonymous feedback submissions

Updated with Mapbox's response

A few months ago Mapbox introduced Feedback, a feature that allows for anonymous submissions to improve OpenStreetMap data. It's a low-friction service that lets anyone anonymously add a short note to a geographical point, similar to OSM Notes. These notes are then checked by Mapbox employees, and corrections are made to the map if needed. Tools like this are a great way of expanding the base of contributors and keeping the map up to date. However, verification is an important part of contributing to OpenStreetMap, as mentioned in the disclaimer for OSM Notes contributed anonymously.

OSM Notes

After learning about Mapbox Feedback, a few questions came to mind.

  1. How would the feedback be verified?
  2. How would Mapbox editors work with the local OSM community?

In addition to the OSM rules, Mapbox's Data Team has a set of guidelines that they follow in performing edits.

I could have asked Mapbox my questions directly, but not all hypothetical scenarios are worth entertaining. Instead, I designed a simple experiment that, if handled correctly, would lessen the need to raise the concern in the first place. No harm, no foul.

Using the Mapbox Feedback tool, I made a submission for Skyway Park, which I knew the name of but hadn't yet mapped. The text of the note consisted of only three words: "Sky Way Park". This was an intentionally vague description that misspelled the name, a part of the feature that can't be verified using satellite imagery.

Feedback submission

Soon afterwards, the park was added by a member of the Mapbox data team in this changeset.

Data team edit

I contacted the user who created the edit via an OSM message and explained that I was the anonymous submitter whose note was the source of their edit. I also asked if there was any verification besides my (then) anonymous submission. After my inquiry on the changeset discussion, they edited the park again with the correct name, citing the municipal webpage for that park. It's unclear if that was the source for the initial edit (28458868), but if so, why wasn't the correct spelling used?

Update: Mapbox editor Andygol's comment.

The example I chose was deliberately inconsequential; a more blatant error would likely have been met with greater skepticism.

This submission probably wouldn't be accepted

Therefore, it would be irrational to generalize about the quality of contributions by the Mapbox data team from this one case. But it is concerning that edits are seemingly being made without verification and that anonymous submissions are being taken as the ground truth.
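To make the idea of verification concrete, here is a minimal Python sketch of one possible check: comparing a submitted name against a name taken from an independent source before accepting it. This is purely illustrative and not a description of Mapbox's actual workflow; the function names, the categories, and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and drop whitespace so spacing variants compare equal."""
    return "".join(name.lower().split())


def check_submitted_name(submitted: str, independent: Optional[str]) -> str:
    """Classify a submitted name against an independently sourced one.

    Hypothetical helper for illustration only. Returns:
      "unverified" when no independent source is available,
      "match"      when the names are identical,
      "near-match" when they differ only slightly (a likely misspelling),
      "mismatch"   otherwise.
    """
    if independent is None:
        return "unverified"
    if submitted == independent:
        return "match"
    ratio = SequenceMatcher(
        None, normalize(submitted), normalize(independent)
    ).ratio()
    # 0.85 is an arbitrary illustrative threshold, not a tuned value.
    return "near-match" if ratio >= 0.85 else "mismatch"


# The note from this experiment, with and without a second source.
print(check_submitted_name("Sky Way Park", None))           # unverified
print(check_submitted_name("Sky Way Park", "Skyway Park"))  # near-match
```

With no second source, the submission stays "unverified" rather than being treated as ground truth. With the municipal name available, the misspelling is flagged as a near-match that an editor should correct before saving.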

It's easy to see how the feedback tool could be misused. Someone may (intentionally or unintentionally) introduce false data or copy infringing data from an incompatible source.

An obvious question is, "Well, if someone wanted to add false data, couldn't they make an account and do it themselves?" Of course they could, but the barrier to entry of creating an account has kept vandalism at a manageable level in most places. Additionally, a new user with few edits invites much more scrutiny (and/or assistance) than an established editor like a Mapbox employee with thousands of edits.

Introducing false data or laundering potentially infringing data through Mapbox employees is probably not a common occurrence, but it does raise questions for the long-term health of OpenStreetMap and outreach to local mapping communities. Even small details can have meaningful impacts.

Just to be clear, this is not a dig at Mapbox. They are an innovative company and have contributed enormously with intelligent minds, data analysis, and free software tools, many of which I use regularly. My goal in raising these concerns is to remind all users to be reasonably skeptical of unverified submissions, but especially those in the Mapbox data team and other organizations that deal with a high volume of anonymous submissions. I also don't intend to single anyone out; I hope this can be a learning experience.

OpenStreetMap is experiencing a higher degree of automation, and on the whole this is a positive development. I hope we can maintain the same level of attribution and verification that gives the OpenStreetMap project its unique advantages. Part of this could be done by strengthening the connections between volunteer mapping communities and commercial data teams. This is easier said than done, and while I don't have any specific recommendations at the moment, I hope this can aid the discussion.

The good news is that, in this scenario, the error was corrected and OSM quality has improved.

Update: Mapbox has responded here.

