Facebook is releasing a complete, downloadable preview of OpenStreetMap data we plan to start using in a number of our public maps.
Download the Daylight Map Distribution right now in OpenStreetMap PBF format: planet-v0.1.osm.pbf (42 GB).
At Facebook, we use maps to let our users find friends, businesses, groups and much more. OpenStreetMap (OSM), the open source wiki map, has a substantial global footprint of map data built and maintained by a dedicated community of mappers around the world, which makes OSM a natural choice for Facebook.
Every day, OSM receives millions of contributions from the community. Some of these contributions contain intentional or unintentional edits that are incompatible with our use cases. Our mapping teams work to scrub these contributions for consistency and quality. In the course of this work, we also build additional tools and technologies on top of OSM.
OSM is a complex data product. Many tools, services, and companies have been created to make it full-featured. We've always developed our OSM-related tools with the hope that our approach to keeping maps current and accurate for our own use cases may also benefit others in the OSM community. To that end, we're pleased to announce the release of the Daylight Map Distribution, one of our internal OSM datasets scrubbed to meet the quality standards of our wide-ranging products.
What's Included in the Daylight Map Distribution:
- This planet file is composed of 100% OSM data, released under the terms of the Open Database License
- We've checked the public OSM changes contained in the distribution and included only those validated to be free of malicious vandalism, so that vandalism is not shown to our users in our display maps
- This is currently a one-time release, and we're looking for feedback from the community to decide on a useful cadence for future releases
- We publish the data in PBF format, a common binary format widely supported by OSM tools and already used for planet-scale file distribution
How we use OSM
We use maps made with OSM across Facebook to show people, events, and places.
Intentional and possibly malicious edits range from headline-grabbing hate speech to outdated names. Some examples include:
- In 2018, New York City's name was very briefly replaced by a racist epithet
- We found and corrected the real name of Ohio's Shelby Hills Early Childhood Center, which previously contained the word "Retarded" from an earlier import from GNIS
Other bad edits include geometry errors and small instances of graffiti. Here are some examples of minor and unintentional bad edits:
- We held back an edit to a portion of the Euphrates River multipolygon that damaged its appearance on the map; at the same time, we fixed the relation in OSM for all users
- Small-scale bad edits we find include cartoon characters drawn into the map data
- Large-scale bad edits include a recent case in which large portions of South America appeared flooded on the Humanitarian map style, caused by a particularly large and complex multipolygon relation
OSM forms a critical part of how our users interact with the world around them, and we hope this release will make it easier for others to benefit from our work to ensure the data is appropriate for display and free of vandalism. Through our use of OSM, we've encountered a variety of issues and inconsistencies, and we've included fixes for them in this release of the Daylight Distribution. We also contribute these fixes back to OSM for the benefit of the larger community.
Working in Open Source
Our approach toward creating the Daylight Map Distribution was inspired by the success of the Linux operating system: starting with a pair of experts-only floppy disks in early 1991, user demand along with a liberal software license led to an explosion of "distros," curated collections of software that could be readily installed by casual users. The first Linux distribution was created less than a year later, at the Manchester Computing Centre in February 1992. Today there are hundreds of distros, including major products like Red Hat and Ubuntu. Distributors optimize for different uses, making it easy and safe to use Linux on servers, laptops, phones, tablets, hardware hacking platforms, virtual machines, distributed systems, and embedded devices.
Consistent with the spirit of OSM, it is our hope that the Daylight Map Distribution (and subsequent iterations) will inspire individuals and companies to release their own datasets under open data licenses as well.
With the Daylight Map Distribution, we also hope to showcase all that is possible with a stable, efficient, community-driven mapping effort. Open source is by its nature inclusive and welcoming to all. No contribution is too small or too large, and we're proud to stand together with every OSM contributor as we work toward the shared goals of improving OSM and mapping the world.
How To Reach The Team
If you have any questions about this data distribution, we have created a #daylightdistro_feedback Slack channel in OSM US. Members of the team will be there periodically to answer questions. You can also email the team at osm@fb.com.
Learn more about the technology behind our process from our engineering team:
- "MaRS: How Facebook keeps maps current and accurate," post on Facebook's Engineering blog, Sep 30, 2019
- "Keepin' it fresh (and good)!" - Continuous Ingestion of OSM Data at Facebook, presentation at the OSM State of the Map US conference, Sep 8, 2019
This release is just a sneak peek preview. We plan to start using this version of the data in our public maps soon, but you can start using it today. Download the Daylight Map Distribution right now in OpenStreetMap PBF format:
- planet-v0.1.osm.pbf (42 GB): complete Daylight v0.1 in OSM PBF format
- planet-2020-03-06_v0.1.osc.bz2 (4.8 GB): changes from the OSM.org planet as of 2020-03-06 to Daylight v0.1
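The .osc.bz2 file is a bzip2-compressed osmChange XML document, in which <create>, <modify> and <delete> blocks wrap the affected nodes, ways and relations. Assuming that standard osmChange layout, a rough sketch like the following (Python standard library only; the function names are hypothetical and not part of any Daylight tooling) streams the diff and tallies how many elements of each type fall under each action, which gives a quick sense of how far Daylight diverges from the upstream planet:

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, Union

ACTIONS = {"create", "modify", "delete"}
ELEMENTS = {"node", "way", "relation"}


def summarize_osc(stream: BinaryIO) -> Counter:
    """Count (action, element type) pairs in an uncompressed osmChange XML stream."""
    counts: Counter = Counter()
    action_elem = None  # the <create>/<modify>/<delete> block currently open
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            if elem.tag in ACTIONS:
                action_elem = elem
            continue
        # "end" events: a complete element has just been parsed
        if elem.tag in ELEMENTS and action_elem is not None:
            counts[(action_elem.tag, elem.tag)] += 1
            # Drop finished elements so memory stays bounded on multi-GB files.
            action_elem.clear()
        elif elem.tag in ACTIONS:
            action_elem = None
    return counts


def summarize_osc_bz2(source: Union[str, BinaryIO]) -> Counter:
    """Decompress a .osc.bz2 file (path or binary file object) on the fly and summarize it."""
    with bz2.open(source, "rb") as f:
        return summarize_osc(f)
```

Calling summarize_osc_bz2("planet-2020-03-06_v0.1.osc.bz2") would return counts keyed by pairs such as ("delete", "way"). Streaming with iterparse avoids decompressing the 4.8 GB file to disk first.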
If you're interested in engineering and other roles working on OpenStreetMap at Facebook, get in touch!
Comment from iandees on 10 March 2020 at 00:16
Thanks for sharing this, Mike! Great to see how Facebook is catching harmful edits to OSM, too.
Comment from M!dgard on 10 March 2020 at 00:26
Good job on achieving compliance with section 4.4 of the ODbL!
Comment from LucGommans on 10 March 2020 at 00:44
This is a convincing argument for the Daylight distribution (Linux distros are definitely a good thing), but the difference is that, for Linux distros, it is clear what the rules are. Debian has clear rules for how to include things; Ubuntu builds on Debian and adds/removes some things they like/don't like; Linux Mint builds again on Ubuntu and removes things like tracking that is (was?) present in Ubuntu (plus some other changes). Other distros can take the code and/or inclusion process from Debian and improve it.
I don't know what this Daylight OSM fork includes or doesn't include. I guess we'll have to look at the diff manually and try to reverse engineer the rules if we want to build on Facebook's work?
Clicking through to the "learn info" posts, they talk about a few high-level things like the LoCha algorithm it uses, but there is no set of rules that it uses. The first one boasts "Enhanced machine-augmented automatic review (Via rules + algorithmic checks + ML)" and 90% automatic approval rates, but does not share much that would help anyone build on this work instead of starting over from scratch.
Comment from migurski on 10 March 2020 at 02:18
Thanks for your feedback, Luc! For the time being, the data is being provided "as-is," since it's a snapshot of a hybrid human/machine process with a lot of individual judgment calls. If we see that there's demand for Daylight, we may decide to release it on a regular basis and further open up its process.
Comment from vtcraghead on 10 March 2020 at 02:41
Is this distro say all processed with an eye toward routing?
Comment from vtcraghead on 10 March 2020 at 02:42
sigh. ". . . AT all processed . . ."
Comment from migurski on 10 March 2020 at 02:43
Hi Bill! We haven't done any routing-specific checks for this distro. Could you say more about what you're looking for?
Comment from richlv on 10 March 2020 at 12:48
In the past, there have been projects that check routing between known destinations - for example, Berlin -> Paris. If the routed distance differs from the expected one by more than some threshold (or even fails completely), it is flagged for review.
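The kind of check described above can be sketched as a small regression test over known origin/destination pairs. This is a minimal sketch, assuming a caller-supplied route_distance_km function (for example, a thin wrapper around a local routing engine) that returns None when no route is found; the names, the 10% default tolerance, and any distances fed to it are illustrative assumptions, not values from an existing project:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RouteCheck:
    origin: str
    destination: str
    expected_km: float  # reference distance, e.g. from a known-good earlier build


def flag_route_regressions(
    checks: Iterable[RouteCheck],
    route_distance_km: Callable[[str, str], Optional[float]],
    tolerance: float = 0.10,
) -> List[Tuple[RouteCheck, str]]:
    """Return the checks whose route is missing or deviates by more than `tolerance`."""
    flagged: List[Tuple[RouteCheck, str]] = []
    for check in checks:
        routed = route_distance_km(check.origin, check.destination)
        if routed is None:
            # Routing failed entirely: likely a broken network connection.
            flagged.append((check, "no route found"))
            continue
        deviation = abs(routed - check.expected_km) / check.expected_km
        if deviation > tolerance:
            flagged.append(
                (check, f"routed {routed:.0f} km vs expected {check.expected_km:.0f} km")
            )
    return flagged
```

With a stub router backed by a dictionary, a pair whose routed distance jumps from 100 km to 180 km is flagged for review, as is a pair with no route at all, while a pair within the tolerance passes.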
Comment from vtcraghead on 10 March 2020 at 15:22
I'm specifically wondering how well this plays with OSRM, and whether there's any pattern to the held-back edits that might result in discontinuities or network disruption.
Comment from Finbar1 on 10 March 2020 at 16:16
I would mostly be interested in the building footprints data, especially in Ireland. I know from using previous downloads of OSM that the accuracy is fairly low. Are there plans to improve the quality of these features?
Comment from mikelmaron on 10 March 2020 at 20:10
Thanks, Mike and Facebook, for doing this. It's great to have this insight out and available. There's a good tradition of downstream data processing and redistribution in the community (you could call them packages, I suppose): from GeoFabrik's regional and country downloads, to OSMQATiles, etc.
In this case (and I focused on this when we spoke), I'm not sure that the most valuable thing to distribute is what made it through Facebook's filters, but rather what didn't make it through and why. That insight is valuable to identify problems that need fixing on a faster basis, notify local communities and other editors, and build up a corpus of understanding of what problematic edits in OSM look like.
The most actionable way to do this distribution will be through OSMCha. Through the OSMCha API, you can flag changesets/features with reasons, and it can be set up so that any reason tagged by Facebook has a "Facebook:" prefix.
This is what Mapbox has set up. The Mapbox Streets Review team looks at edits every day, and problems are flagged and surfaced in OSMCha. You can see all of this with this OSMCha filter. You'll see that the most recent flag is from about 3 days ago; that's the typical time between an OSM edit and review/publishing in Mapbox Streets.
Adding Facebook-flagged problems to OSMCha would provide an even stronger signal of problems, and I hope to explore implementing it with you all.
Comment from migurski on 10 March 2020 at 22:21
That's a great suggestion, Mikel. I'll try to learn more about what we've done and planned with OSMCha so far.
Comment from migurski on 11 March 2020 at 19:23
I wanted to summarize some questions and responses from other channels here.
We are unsure yet about the public release schedule we want to commit to because a lot of it depends on community feedback.
We do correct OSM upstream for the errors that we find. When our human review process catches a map error, we do two things:
Probably the most significant source of that difference is that we completely ignore and exclude tags that aren't relevant for our display map needs.
Not a fork. But friendly, yes. We've observed that many companies using OSM are uneasy about the potential for bad edits. Oversight from the community fixes most issues in time, so Daylight can pick up just the good fix and not the original error.
We're discussing internally how we might go about doing this.
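To illustrate what excluding irrelevant tags can look like, here is a minimal sketch of an allowlist filter over an element's tags. The allowlist below is hypothetical and chosen only for illustration; the set of keys Daylight actually keeps has not been published in this thread:

```python
from typing import AbstractSet, Dict, Mapping, Tuple

# HYPOTHETICAL allowlist, for illustration only: not Daylight's actual tag list.
DISPLAY_KEYS = frozenset(
    {"highway", "building", "waterway", "natural", "landuse",
     "place", "boundary", "admin_level", "name"}
)
DISPLAY_PREFIXES = ("name:",)  # keep localized names such as name:fr


def filter_display_tags(
    tags: Mapping[str, str],
    keys: AbstractSet[str] = DISPLAY_KEYS,
    prefixes: Tuple[str, ...] = DISPLAY_PREFIXES,
) -> Dict[str, str]:
    """Return only the tags whose key is allowlisted or starts with an allowlisted prefix."""
    return {k: v for k, v in tags.items() if k in keys or k.startswith(prefixes)}
```

An allowlist (rather than a blocklist) means any new or unusual tag is dropped by default, which is one reason a filtered planet can be noticeably smaller than the original.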
Not at this time, but the PBF release format makes Daylight compatible with major OSM tools that can be used to generate regional extracts.
Comment from richlv on 12 March 2020 at 06:21
Finbar1, I would guess that the focus of this project is on problematic changesets/edits, not data quality as such.
Mikel, that's a great insight about publishing "what didn't make it through". Flagging problems (automatically or semi-automatically) frees up contributor time and allows problems to be fixed sooner.
Comment from migurski on 16 July 2020 at 18:13
We've published an update to Daylight; more information is in this diary entry.
Comment from Geonick on 13 August 2020 at 11:41
Mikal, I think you missed something in your summary in the comment above from 11 March 2020 at 19:23.
Please let me come back to Mikel's suggestion in comment https://www.openstreetmap.org/user/migurski/diary/392416#comment46772 to feed back detected possible OSM issues as follows: "through the OSMCha API, you can flag changesets/features with reasons, and can be set up so that any reason tag by Facebook has a 'Facebook:' prefix," as Mapbox does, as can be seen with this OSMCha filter: https://osmcha.org/?aoi=083b147b-a72c-4026-9db5-b70761a6795c .
I've opened an issue here: https://github.com/facebookmicrosites/Open-Mapping-At-Facebook/issues/12 . I'd suggest we continue discussing this action over there, if needed :-)
Comment from StefanoCudini on 18 August 2020 at 11:46
hi Michal!
thank you for this accurate article and for the resources made available
would it be possible to have a url where to find the MD5 checksum of the file?
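Once a checksum is published, verifying the download comes down to hashing the file locally and comparing the two values. A minimal sketch, assuming Python's standard hashlib and reading the 42 GB file in 1 MiB chunks so it never has to fit in memory (the function names are hypothetical):

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and surrounding whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

On most Linux systems, the md5sum command-line tool produces the same hex digest for the same file.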
Many thanks, Stefano Cudini, opengeo.tech
Comment from ElliottPlack on 24 November 2020 at 20:14
Esri has a new basemap out based on Daylight. Great to see this collaboration, and more OSM in the GIS world: https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/dawn-of-osm-daylight-in-arcgis/