
Map Validation at Facebook

Posted by Yunzhi on 7 July 2021 in English.

At Facebook, we’ve been releasing our Daylight Map to the public since March of last year. Daylight is a version of OSM that has been quality checked and validated to ensure that the resulting map is safe and of high quality. It is the map data used on Facebook maps. Our goal is to make OSM easier and safer to use, which in turn should expand the use of OSM.

When building Daylight, we employ automated validation processes to find bad data. That bad data is then either deleted from the map or corrected in the OSM database. This diary goes over the map validation work we’ve done to make Daylight safe to use.

The map validation efforts include three parts: vandalism and profanity cleanup, relation repair, and Atlas Checks fixes.


Vandalism and Profanity Cleanup

In the last couple of years, our engineering team has spent a lot of resources researching how to identify vandalism and profanity more systematically and more accurately. With these improvements to our detection tools, we were able to fix over 360 examples of vandalism and 16 instances of profanity in feature names all over the world.

vandalism and profanity fixed by Facebook team


So what did we flag and how did we make the fixes?

1) We were able to flag features with name values that appear to be spamming the map and are clearly vandalism. Here are some examples where we took action to remove the name value of the features.

Example 1

Example 2

2) Our detection technique also flags Pokemon-related edits, particularly Pokemon terms used as the name of a map feature. Here, too, the action we took was to remove the feature name.

Example 3

3) When profane words are present in the map feature name, the feature will be brought to us for profanity removal.

Example 4

Example 5

Example 6
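
To make the flagging step more concrete, here is a minimal sketch of term-based candidate flagging. It is not our production detector: the term lists, the `Candidate` type, and the `flag_names` function are all hypothetical names chosen for illustration, and a simple word lookup stands in for the actual detection technique. The output is only a list of candidates for a human mapper to review, not a list of edits.

```python
import re
from dataclasses import dataclass

# Hypothetical term lists, for illustration only.
POKEMON_TERMS = {"pikachu", "charizard"}
PROFANE_TERMS = {"badword"}  # placeholder standing in for a real word list


@dataclass
class Candidate:
    feature_id: int
    name: str
    reason: str


def tokenize(name):
    """Lowercase the name and split it into alphabetic words."""
    return re.findall(r"[^\W\d_]+", name.lower())


def flag_names(features):
    """Return review candidates from an iterable of (feature_id, tags) pairs.

    Nothing is edited here: each candidate still needs a manual check,
    because a name that looks odd can be a real, verifiable name.
    """
    candidates = []
    for feature_id, tags in features:
        name = tags.get("name")
        if not name:
            continue  # unnamed features cannot carry a bad name
        words = set(tokenize(name))
        if words & PROFANE_TERMS:
            candidates.append(Candidate(feature_id, name, "profanity"))
        elif words & POKEMON_TERMS:
            candidates.append(Candidate(feature_id, name, "pokemon"))
    return candidates
```

For instance, `flag_names([(1, {"name": "Pikachu Pass"}), (2, {"name": "Main Street"})])` yields a single candidate for feature 1, which a mapper would then investigate before deciding whether the name should be removed, moved to another tag, or kept.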


During cleanup, whenever we find other data issues on the map features flagged for vandalism and profanity, we clean those up as well. For example, the map feature in Example 6 above involved a dragged node that caused the road to cross water features improperly. After fixing the profane name, we submitted a follow-up changeset to address the dragged node.

Example 7



Relation Repair

Another enhancement we’ve made is the detection of broken relations. Using it, we found and repaired over 4,100 relations on OSM. These relations range from simple groups of buildings to complex water features, as in the example below. Often, the complicated ones had existed on OSM for years because they are difficult to detect and repair.

Example 8: before fix and after fix
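
This post does not describe how the detection works internally, so the following is only a sketch of one common kind of breakage, under the assumption that a relation is "broken" when its member ways fail to join into closed rings (as a multipolygon requires). The function name `unclosed_ring_nodes` and the input shape are hypothetical. The idea is that each node where a way starts or ends must be shared by an even number of way ends; a node with an odd count marks a gap in the ring.

```python
from collections import Counter


def unclosed_ring_nodes(member_ways):
    """Return the sorted node IDs where a ring is left open.

    member_ways is a list of ways, each given as an ordered list of node IDs.
    An empty result means every way end meets another way end, which is a
    necessary condition for the ways to form closed rings.
    """
    endpoint_counts = Counter()
    for nodes in member_ways:
        if len(nodes) < 2:
            continue  # degenerate way, nothing to join
        if nodes[0] == nodes[-1]:
            continue  # the way is already a closed ring on its own
        endpoint_counts[nodes[0]] += 1
        endpoint_counts[nodes[-1]] += 1
    # An odd count means some way end has no partner at that node.
    return sorted(n for n, count in endpoint_counts.items() if count % 2 == 1)
```

With two ways `[1, 2, 3]` and `[3, 4, 1]` the result is empty, because the ways meet at nodes 1 and 3. With `[1, 2, 3]` and `[3, 4, 5]` the function reports nodes 1 and 5 as the dangling ends that a mapper would need to connect.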



Atlas Checks Fixes

In addition to using improved internal techniques to clean up vandalism and repair relations, we apply an open-source validation tool called Atlas Checks to detect various data issues, including lines crossing water bodies improperly, name gaps in connected roads, missing relation types, and overlapping ways. Atlas Checks can also feed its detection results into MapRoulette for more systematic fixing. Using this workflow, we have submitted more than 7.9K changesets to repair over 122K OSM features.

Total_atlas_checks_issues


Atlas Checks fixes cover a large variety of data issues. Here are a few examples of what we’ve fixed:

Example 9

The water feature was flagged because it overlapped the adjacent highway. We adjusted the water body to avoid the overlap.

Example 10

The highway crossed the river without a bridge, so we split the highway segment out and added proper bridge tags.

Example 11

The middle road segment had no name, while the two segments connected on either end shared the same road name. We added the missing name to the segment in between, as it is part of a continuous highway.
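
The name gap case in Example 11 can be sketched in a few lines. This is an illustration of the idea only, not the Atlas Checks implementation (Atlas Checks is a separate open-source project with its own logic); the function name `find_name_gaps` and the input shape, an ordered list of tag dictionaries for consecutive connected segments, are assumptions made for this example.

```python
def find_name_gaps(segments):
    """Return indices of unnamed segments sitting between same-named neighbours.

    segments is an ordered list of tag dicts for road segments that connect
    end to end. A segment is flagged only when it has no name and the
    segments directly before and after it share one identical name.
    """
    gaps = []
    for i in range(1, len(segments) - 1):
        before = segments[i - 1].get("name")
        after = segments[i + 1].get("name")
        middle = segments[i].get("name")
        if before and before == after and not middle:
            gaps.append(i)
    return gaps
```

Calling it on `[{"name": "Elm Street"}, {"highway": "residential"}, {"name": "Elm Street"}]` returns `[1]`, pointing at the unnamed middle segment. A flagged index is a prompt for review rather than an automatic fix, since the middle segment may legitimately have no name.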


Because we use MapRoulette as the platform for systematically fixing Atlas Checks outputs, we can share these flagged features beyond our internal team. Most recently, we shared a couple of challenges with the OSM community in April and May.

If anyone is interested in supporting these Atlas Checks fixes on MapRoulette, please feel free to message us on Slack or email us at osm@fb.com.

Discussion

Comment from SomeoneElse on 8 July 2021 at 10:42

What actually determines “profanity” here? Would e.g. the loc_name on https://www.openstreetmap.org/way/854436697 be a problem? That loc_name is easily verified. What steps are taken to ensure that valid names considered “profane” in some dialects of English aren’t removed?

Comment from philippec on 8 July 2021 at 11:35

I am not allowed to join the Facebook group of my heimat. Probably because I was too eager to share the latest news about our local saint.

Comment from saul-goodman on 8 July 2021 at 12:34

Thank you for this long article! It is a long way to a perfect OSM, but I think we are on the right way.

Comment from GrizzlyTTU on 9 July 2021 at 14:30

Why are Pokemon words automatically removed? What if in example three, the road is - actually - colloquially named pikachu pass? Why does this automatically constitute vandalism?

Comment from philippec on 9 July 2021 at 14:33

Because of artificial stupidity.

Comment from GrizzlyTTU on 9 July 2021 at 14:47

I’d also like to point out that Pikachu is a Brazilian last name and you might be erasing someone’s legacy. I do not think example 3 is vandalism. I think this is a tagging mistake of a new mapper. I also think this is indicative of a larger policy/procedure problem. What if a developer names all the roads in a subdivision after pokemon?

This screenshot is taken out of context, there are several other odd names around this area. The road runs along a rodeo arena and I’ve heard some pretty weird names from cowboys.

Fortunately, this edit is in my hometown and I’ve determined who owns that property using the tax database. This is a small town, so everyone knows someone, and I’ll contact the property owner. This afternoon I’m also driving over there to verify and document. Surely I will find someone there to ask about it. I’ll find out if this road is colloquially known as Pikachu Pass and will correct it with informal_name=

Would it be appropriate to keep it name=pikachu pass if there are signs indicating this?

Comment from SomeoneElse on 9 July 2021 at 15:19

Would it be appropriate to keep it name=pikachu pass if there are signs indicating this?

Yes. Also, if a road is “colloquially known as Pikachu Pass” then “loc_name=Pikachu Pass” is entirely appropriate too.

Some real-world names and tag combinations do appear unlikely at first glance - http://osm.mapki.com/history/way.php?id=128265146 is or was apparently a nice steak house in Burkina Faso (see https://www.openstreetmap.org/changeset/16601166), but has been “corrected” to be a burger joint on a couple of occasions. People making corrections like this aren’t helping to improve the quality of the data in OSM.

It’s also true that “new mapper errors” aren’t in any sense vandalism. The appropriate course of action there is to educate new mappers about how names are used in OSM, and what they can do to see the data that they are interested in (perhaps use a different map or app if the one that they are using does not show what they have just added).

All that said, it should be relatively easy for everyone to follow this activity - as required by the OEG https://wiki.openstreetmap.org/wiki/Organised_Editing/Activities/Facebook#Atlas_Checks says that “ Atlas Checks detections are given the following hashtag: #AtlasChecks”.

Comment from GrizzlyTTU on 9 July 2021 at 19:37

Well, I have to eat humble pie on this one. I drove out and spoke to the land owner and had an excellent time. We talked at length, I walked along this very way and spread the OSM gospel.

Currently OSM indicates that the driveway to the north of the cotton field connects to “Pikachu Pass”; which then continues through the field. Which is not the case. The driveway ends at a garage visible in the images and a fence surrounds the property.

The owner told me this: At one time the entire property was owned by one family but then later divided. Earlier in the history there was a “turn row” that ran through there. A turn row is a strip of land adjacent to row crops to facilitate maneuvering a tractor. In other regions they are called headlands or endrows. We have been mapping them as service roads.

However, given the new property lines and the farmer’s methodology, this turn row no longer exists. The cotton rows butt up right against the fence line. There is not a navigable route there anymore. Even though the land is currently being used for ag purposes, there is some legacy soil compaction that is visible in the images. This may have been why the student mapped it as a road. We still do not know why it was labeled Pikachu Pass and we have contacted the student.

Shortly, I am deleting the way. Is this the appropriate action?

Lastly, I’d like to apologize. My first priority is to our students and our volunteers. I have to give them the benefit of the doubt. He has otherwise been an excellent student. I also apologize that this happened during our mapathon. We will do a better job instructing in the future and be sure to emphasize data quality. Thanks.

Comment from Yunzhi on 9 July 2021 at 20:10

What actually determines “profanity” here? Would e.g. the loc_name on https://www.openstreetmap.org/way/854436697 be a problem? That loc_name is easily verified. What steps are taken to ensure that valid names considered “profane” in some dialects of English aren’t removed?

Thank you for your feedback! To detect profanity, we collected and evaluated a list of profane words and used this list as the library to train our detection model. The detection is just a starting point. It helps us identify potential candidates for profanity and vandalism. Our mapping team investigated each candidate to decide what action should be taken.

Comment from Yunzhi on 9 July 2021 at 20:12

Yes. Also, if a road is “colloquially known as Pikachu Pass” then “loc_name=Pikachu Pass” is entirely appropriate too.

Some real-world names and tag combinations do appear unlikely at first glance - http://osm.mapki.com/history/way.php?id=128265146 is or was apparently a nice steak house in Burkina Faso (see https://www.openstreetmap.org/changeset/16601166), but has been “corrected” to be a burger joint on a couple of occasions. People making corrections like this aren’t helping to improve the quality of the data in OSM.

It’s also true that “new mapper errors” aren’t in any sense vandalism. The appropriate course of action there is to educate new mappers about how names are used in OSM, and what they can do to see the data that they are interested in (perhaps use a different map or app if the one that they are using does not show what they have just added).

All that said, it should be relatively easy for everyone to follow this activity - as required by the OEG https://wiki.openstreetmap.org/wiki/Organised_Editing/Activities/Facebook#Atlas_Checks says that “ Atlas Checks detections are given the following hashtag: #AtlasChecks”.

We totally agree that there are names that look strange but turn out to be the real names of map features. Before taking action on the flagged names, our mapping team investigates each of them to decide the next step. Activities from our mapping team can be followed by tracking the hashtags listed in our OSM wiki.

Comment from Yunzhi on 9 July 2021 at 20:43

@GrizzlyTTU

Thank you for sharing your feedback and doing the ground survey to help verify the road name. Further contributions from you and your team are very welcome.

Comment from mapmeld on 9 July 2021 at 23:24

This is great to see! Thanks for doing this and documenting the process

I wanted to promote another method for community and vandalism detection - the Unicode blocks used in names. https://www.openstreetmap.org/user/mapmeld/diary/48032

Comment from SomeoneElse on 10 July 2021 at 18:02

Incidentally, I’ve just commented on an “#AtlasChecks” changeset at https://www.openstreetmap.org/changeset/107663718 . It looks there as if a fake name added by a former contributor has accidentally been reapplied by Facebook, perhaps because the comparison data set wasn’t up to date?

Comment from benoitdd on 12 July 2021 at 09:05

Any idea of why France has so much more issues in the Atlas Check than the rest of the world?

Comment from H@mlet on 12 July 2021 at 13:15

@benoitdd I was wondering also. Maybe the import of buildings from cadastre, not always clean polygons…

I would be really curious to explore the data, but I couldn’t find it.

Comment from skquinn on 12 July 2021 at 15:52

  1. Any way of getting a closeup of that first map around Houston, Texas? I see several yellow dots and a red dot around where Houston would be and I’m curious as to exactly where these were.
  2. What about certain features that actually have a profane name in reality? (Examples: one of the curves on the Dalton Highway is actually called Oh Shit Corner, and I’m pretty sure there’s an actual Shit Creek signed somewhere.)

Comment from Yunzhi on 13 July 2021 at 00:08

Any idea of why France has so much more issues in the Atlas Check than the rest of the world?

Thanks for asking! This is a great question. Issues in France are mostly flagged by ConcerningAngleBuildingCheck, which attempts to identify buildings that need to be squared. We haven’t made any edits using this check yet, as we are aware that this check may not be applicable in all countries. Developments are still ongoing for certain checks and we are always working towards improving accuracy.

Comment from stevage on 15 July 2021 at 23:35

I’m curious about example 10. First, what imagery is that? It doesn’t look like anything I could find in iD.

Second - how confident are you really that there is a bridge there? There’s at least three ways that a waterway and a road can cross each other: bridge, culvert, ford.

It’s not totally obvious to me from that image that there is a bridge here - could easily be a culvert, as indeed there is a culvert a very short distance northeast. An outside chance of a ford - certainly from the image sources in iD there’s nothing to indicate that that isn’t the case.

IMHO, a “waterway crosses road without bridge tag” problem is not really a problem in itself. It simply indicates there is some missing information, and isn’t worth running the risk of introducing erroneous information with false confidence.
