OpenStreetMap

Announcing Daylight Map Distribution

Posted by migurski on 10 March 2020 in English (English)

Facebook is releasing a complete, downloadable preview of OpenStreetMap data we plan to start using in a number of our public maps.

📥 Download Daylight Map Distribution right now in OpenStreetMap PBF format: 📦 planet-v0.1.osm.pbf (42GB).

At Facebook, we use maps to let our users find friends, businesses, groups and much more. OpenStreetMap (OSM), the open source wiki map, has a substantial global footprint of map data built and maintained by a dedicated global community of mappers, making OSM a natural choice for Facebook.

Every day, OSM receives millions of contributions from the community. Some of these contributions include intentional or unintentional edits that are incompatible with our use cases. Our mapping teams work to scrub these contributions for consistency and quality. In the course of this work, we also build additional tools and technologies on top of OSM.

OSM is a complex data product. Many tools, services, and companies have been created to make it full-featured. We’ve always developed our OSM-related tools with the hope that our approach to keeping maps current and accurate for our own use cases may also benefit others in the OSM community. To that end, we’re pleased to announce the release of the Daylight Map Distribution, one of our internal OSM datasets scrubbed to meet the quality standards of our wide-ranging products.

Daylight Map Distribution

What’s Included in the Daylight Map Distribution:

  • This planet file is composed of 100% OSM data, released under the terms of the Open Database License
  • We’ve checked the public OSM changes contained in the distribution and allowed only those validated to be free of malicious vandalism, so that vandalism is not shown to users of our display maps
  • This is currently a one-time release, and we’re looking for feedback from the community to decide on a useful cadence for future releases
  • We publish the data in PBF format, a common binary format widely supported by OSM tools and already used for planet-scale file distribution

How we use OSM

We use maps made with OSM across Facebook to show people, events, and places:

[Image: map examples]

Intentional and possibly malicious edits range from headline-grabbing hate speech to outdated names. Some examples include:

[Image: bad edit examples]

Other bad edits include geometry errors and small instances of graffiti. Here are some samples of minor and unintentional bad edits:

[Images: bad edit examples]

OSM forms a critical part of how our users interact with the world around them, and our hope is that this release will make it easier for others to benefit from our work ensuring that the data is appropriate for display and free of vandalism. Through our use of OSM, we’ve encountered a variety of issues and inconsistencies, and we’ve included fixes in our release of the Daylight Distribution. We also contribute these fixes back to OSM for the benefit of the larger community.

Working in Open Source

Our approach toward creating the Daylight Map Distribution was inspired by the success of the Linux operating system: starting with a pair of experts-only floppy disks in 1991, user demand along with a liberal software license led to an explosion of “distros,” curated collections of software that could be readily installed by casual users. The first Linux distribution was created less than a year later at the Manchester Computing Centre in February 1992. Today there are hundreds of distros, including major products like Red Hat and Ubuntu. Distributors optimize for different uses, making it easy and safe to use Linux on servers, laptops, phones, tablets, hardware hacking platforms, virtual machines, distributed systems, and embedded devices.

Consistent with the spirit of OSM, it is our hope that the Daylight Map Distribution (and subsequent iterations) will inspire individuals and companies to release their own datasets under open data licenses as well.

With the Daylight Map Distribution, we also hope to showcase all that is possible with a stable, efficient, community-driven mapping effort. Open source is by its nature inclusive and welcoming to all. No contribution is too small or too large, and we’re proud to stand together with every OSM contributor as we work toward shared goals of improving OSM and mapping the world.

How To Reach The Team

If you have any questions about this data distribution, we have created a #daylightdistro_feedback Slack channel in OSM US. Members of the team will be there periodically to answer questions. You can also email the team at osm@fb.com.

Learn more about the technology behind our process from our engineering team:

This release is just a sneak peek. We plan to start using this version of the data in our public maps soon, but you can start using it today. Download Daylight Map Distribution right now in OpenStreetMap PBF format: planet-v0.1.osm.pbf (42GB).

If you’re interested in engineering and other roles working on OpenStreetMap at Facebook, get in touch!

Location: Belle Haven, San Mateo County, California, 94025-1246, United States of America

Comment from iandees on 10 March 2020 at 00:16

Thanks for sharing this, Mike! Great to see how Facebook is catching harmful edits to OSM, too.

Comment from M!dgard on 10 March 2020 at 00:26

Good job on achieving compliance with section 4.4 of the ODbL!

Comment from LucGommans on 10 March 2020 at 00:44

Our approach toward creating the Daylight Map Distribution was inspired by the success of [Linux]: […] user demand along with a liberal software license led to an explosion of “distros,” curated collections of software that could be readily installed by casual users.

This is a convincing argument for the Daylight distribution (Linux distros are definitely a good thing) but the difference is that, for Linux distros, it is clear what the rules are. Debian has clear rules for how to include things; Ubuntu builds on Debian and adds/removes some things they like/don’t like; Linux Mint builds again on Ubuntu and removes things like tracking that is (was?) present in Ubuntu (plus some other changes). Other distros can take the code and/or inclusion process from Debian and improve it.

I don’t know what this Daylight OSM fork includes or doesn’t include, I guess we’ll have to look at the diff manually and try to reverse engineer the rules if we want to build on Facebook’s work?

Clicking through to the “learn more” posts, they talk about a few high level things like the LoCha algorithm it uses, but they give no set of rules that it applies. The first one boasts “Enhanced machine-augmented automatic review (Via rules + algorithmic checks + ML)” and 90% automatic approval rates, but does not share much that would help anyone build on this work instead of starting over from scratch.

Comment from migurski on 10 March 2020 at 02:18

Thanks for your feedback, Luc! For the time being, the data is being provided “as-is” since it’s a snapshot of a hybrid human/machine process with a lot of individual judgement calls. When we see that there’s demand for Daylight we may decide to release it on a regular basis and further open up its process.

Comment from vtcraghead on 10 March 2020 at 02:41

Is this distro say all processed with an eye toward routing?

Comment from vtcraghead on 10 March 2020 at 02:42

sigh. “. . . AT all processed . . .”

Comment from migurski on 10 March 2020 at 02:43

Hi Bill! We haven’t done any routing-specific checks for this distro. Could you say more about what you’re looking for?

Comment from richlv on 10 March 2020 at 12:48

In the past, there have been projects that check routing between known destinations - for example, Berlin -> Paris. If the routed distance differs from the expected one by more than some threshold (or even fails completely), it is flagged for review.
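A check of this kind could be sketched roughly as follows. This is only an illustration of the idea described above, not the code of any particular project: the `RouteCheck` type, the `needs_review` function, the 10% tolerance and the 1050 km Berlin -> Paris figure are all assumptions made for the example, and the routed distance is assumed to come from whatever routing engine is being tested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RouteCheck:
    """A known origin/destination pair with its expected road distance (hypothetical type)."""
    origin: str
    destination: str
    expected_km: float


def needs_review(check: RouteCheck, routed_km: Optional[float], tolerance: float = 0.10) -> bool:
    """Return True if the route should be flagged for human review.

    routed_km is the distance reported by the routing engine,
    or None if routing failed completely.
    """
    if routed_km is None:
        # No route found at all: always flag.
        return True
    # Relative difference between the routed and the expected distance.
    deviation = abs(routed_km - check.expected_km) / check.expected_km
    return deviation > tolerance


# Illustrative figure only; a real check would use a reviewed reference distance.
berlin_paris = RouteCheck("Berlin", "Paris", expected_km=1050.0)

print(needs_review(berlin_paris, 1062.0))  # within tolerance -> False
print(needs_review(berlin_paris, 1400.0))  # large detour -> True
print(needs_review(berlin_paris, None))    # routing failed -> True
```

A relative threshold is used here so that the same tolerance can apply to short and long routes; an absolute threshold in kilometres would be an equally reasonable choice.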

Comment from vtcraghead on 10 March 2020 at 15:22

I’m specifically wondering how well this plays with OSRM - if there’s any pattern to the held-back edits that might result in discontinuities or network disruption.

Comment from Finbar1 on 10 March 2020 at 16:16

I would mostly be interested in the building footprint data, especially in Ireland… I know from using previous downloads of OSM that the accuracy is fairly low. Are there plans to improve the quality of these features?

Comment from mikelmaron on 10 March 2020 at 20:10

Thanks Mike and Facebook for doing this. It’s great to have this insight out and available. There’s a good tradition of downstream data processing and redistribution in the community (you could call them packages, I suppose) – from GeoFabrik’s regional and country downloads, to OSMQATiles, etc.

In this case (and I focused on this when we spoke), I’m not sure that the most valuable thing to distribute is what made it through Facebook filters, but rather what didn’t make it through and why. That insight is valuable to identify problems that need fixing on a faster basis, notify local communities and other editors, and to build up a corpus of understanding of what problematic edits in OSM look like.

The most actionable way to do this distribution will be through OSMCha. Through the OSMCha API, you can flag changesets/features with reasons, and it can be set up so that any reason tag from Facebook has a “Facebook:” prefix.

This is what Mapbox has set up. The Mapbox Streets Review team looks at edits every day, and problems are flagged and surfaced in OSMCha. You can see all of this with this OSMCha filter. You’ll see the most recent flag is from about 3 days ago – that’s the typical time between OSM edit and review / publishing in Mapbox Streets.

Adding Facebook-flagged problems to OSMCha would provide an even stronger signal of problems, and I hope to explore implementing it with you all.

Comment from migurski on 10 March 2020 at 22:21

That’s a great suggestion Mikel. I’ll try to learn more about what we’ve done and planned with OSMCha so far.

Comment from migurski on 11 March 2020 at 19:23

I wanted to summarize some questions and responses from other channels here.

Is Facebook planning to create new planet files like this frequently for internal use?

We are not yet sure what public release schedule we want to commit to, because a lot of it depends on community feedback.

Is Facebook fixing data errors on OSM?

We do correct OSM upstream for the errors that we find. When our human review process catches a map error, we do two things:

  1. Hold it back from release to our display maps
  2. Fix the error upstream in OSM.org

What’s the breakdown of the 8GB size difference between Daylight planet and OSM planet?

Probably the most significant source of that difference is that we completely ignore & exclude tags that aren’t relevant for our display map needs.

So DLD is a sort of friendly fork for FB and others to ‘read’ from and where ‘writing’ is selectively pulling from upstream?

Not a fork. But friendly, yes. We’ve observed that many companies using OSM are uneasy about the potential for bad edits. Oversight from the community fixes most issues in time, at which point Daylight can pick up just the good fix and not the original error.

Will information about rejected changes be released to the community?

We’re discussing internally how we might go about doing this.

Is there any plan to publish sub-region data?

Not at this time, but the PBF release format makes Daylight compatible with major OSM tools that can be used to generate regional extracts.
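As one possible illustration of generating a regional extract, the sketch below builds a command line for the `osmium extract` tool, which accepts a bounding box as `--bbox=LEFT,BOTTOM,RIGHT,TOP`. The helper name `osmium_extract_command`, the output filename and the rough bounding box for the island of Ireland are assumptions for the example; the script only prints the command and does not run it.

```python
import shlex
from typing import List, Tuple

# (left, bottom, right, top) in degrees of longitude/latitude.
BBox = Tuple[float, float, float, float]


def osmium_extract_command(planet_path: str, output_path: str, bbox: BBox) -> List[str]:
    """Build the argument list for an `osmium extract` bounding-box extract."""
    left, bottom, right, top = bbox
    if not (left < right and bottom < top):
        raise ValueError("bbox must be (left, bottom, right, top) with left < right and bottom < top")
    return [
        "osmium",
        "extract",
        # Using the --option=value form keeps negative longitudes
        # from being mistaken for separate options.
        f"--bbox={left},{bottom},{right},{top}",
        f"--output={output_path}",
        planet_path,
    ]


# Rough, illustrative bounding box around the island of Ireland.
ireland_bbox: BBox = (-10.7, 51.3, -5.3, 55.5)

command = osmium_extract_command(
    "planet-v0.1.osm.pbf",      # the Daylight planet file named in the post
    "daylight-ireland.osm.pbf",  # hypothetical output filename
    ireland_bbox,
)
print(shlex.join(command))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` on a machine where the osmium command-line tool is installed.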

Comment from richlv on 12 March 2020 at 06:21

Finbar1, I would guess that the focus of this project is on problematic changesets/edits, not data quality as such.

Mikel, that’s a great insight, on publishing “what didn’t make it through”. Flagging problems (automatic or semi-automatic) frees up contributor time and allows problems to be fixed sooner.
