Recent diary entries
As of June, New York City buildings and addresses have been fully imported to OpenStreetMap. While we are tackling remaining cleanup tasks I wanted to share a full recap of the effort. I am very happy with the overall result. There are lessons to be learned here from what went well but also where we could have done better - read on for the details.
More than 20 people - volunteers and members of the Mapbox team - spent more than 1,500 hours writing proposals, discussing, programming, uploading, processing and reviewing. Between September 2013 and June 2014 we imported 1 million buildings and over 900,000 addresses. We fixed over 5,000 unrelated map issues along the way.
Here are screenshots of the resulting work:
Building coverage on Manhattan island, the southern tip of the Bronx to the northwest and Wards island to the right.
JFK airport buildings in Queens, bordering on the Hamilton Beach neighborhood to the left and South Ozone Park to the north.
Coverage around Battery Park and Wall Street in Manhattan. This is an area that already had many buildings. We filled in the gaps and replaced buildings where the New York City data set was clearly better.
We imported over 900,000 addresses. Here is an example of the Park Slope neighborhood in Brooklyn.
Buildings contain height information and render nicely as seen here on this example of downtown Brooklyn on Fmap.
The import covers all of New York City's five boroughs
This is a full writeup sharing my experience with the New York City import in the hope that there is one or the other valuable lesson, good idea, or line of code for you to walk away with. Note that this post is very specific to the work in New York City. If you're planning to do an import, make sure to check out the Import Guidelines for a more universal checklist of how to go about imports.
If you're looking for the 30-second version, I'd summarize my takeaways like this:
- Importing is a lot of work; make sure you have the time to commit.
- Be prepared to continuously improve your conversion scripts and already uploaded data throughout the import.
- Importing is a skill. It looks easy at first, but everyone involved in uploading will need proper support, advanced knowledge of mapping practices and data validation by peers.
- Involve the community where possible; clear and frequent communication is clutch.
- Invest in your tools.
Read on for the deep dive.
OpenStreetMap as a collaboration space for citizens and government
Using New York City's data for OpenStreetMap became possible thanks to then-mayor Michael Bloomberg's open data policy. Local Law 11 of 2012 releases all New York City government data "without any registration requirement, license requirement or restrictions on their use" (23-502 d). This effectively puts the data in the public domain, making it compatible with OpenStreetMap's contributor terms.
Both address point data and building data fall under this law and are available for download on New York City's open data web site.
The way we used this data in OpenStreetMap is an illustration of how Bloomberg's plan to stimulate the economy with open data is starting to pay off. This data in OpenStreetMap is now benefiting everyone using OpenStreetMap and this includes the New York City based startup Foursquare which is using OpenStreetMap data on its Mapbox powered maps.
But the relationship between OpenStreetMap and New York City should ideally be a two-way street. How can the creator and maintainer of the building and address datasets - New York City's GIS department - benefit directly from their work being imported into OpenStreetMap? The vision of edits in OpenStreetMap directly helping improve a crucial government dataset is very promising. OpenStreetMap is a unique data collaboration platform while datasets like building or address catalogs are incredibly hard to maintain - even for a large municipal government like New York's. How can government become a part of OpenStreetMap?
OpenStreetMap's share-alike license means that OpenStreetMap data can't be taken over directly into New York City public domain datasets, but we can use OpenStreetMap to find out where changes happened. We set up a daily change feed flagging modifications to buildings and addresses to subscribers. Here's a copy of a change notification email as New York City GIS receives it every day:
Daily change notifications from OpenStreetMap, flagging building and address changes to New York City government.
The notification contains a list of relevant changesets from the previous day with a link to each modified building and address. We are right now assessing the utility of these emails. Another way of leveraging OpenStreetMap as a change signal would be to periodically extract all building and address data and identify all changes in a certain time frame at once.
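The periodic extract idea can be sketched in a few lines. This is a minimal illustration, not the code behind the change feed: it assumes each extract has already been reduced to a mapping from OpenStreetMap object id to version number, and the function name `diff_snapshots` is hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two extracts, each a dict of object id -> version number.

    Hypothetical helper: reports which objects were created, modified
    (version changed) or deleted between the two extracts.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "created": sorted(new_ids - old_ids),
        "modified": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
        "deleted": sorted(old_ids - new_ids),
    }


# Example: building 2 was edited, 3 was removed, 4 is new.
january = {1: 1, 2: 1, 3: 2}
february = {1: 1, 2: 2, 4: 1}
changes = diff_snapshots(january, february)
```

Comparing versions rather than full geometries keeps the comparison cheap; the flagged ids can then be looked up individually.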
All code powering the change feed is available as open source on Github. If you'd like to receive the New York City change feed notifications, please let me know. Happy to subscribe you.
To import New York City data we had to convert it to OpenStreetMap format first and cut it into byte size chunks so we could review and import it manually, piece by piece. Once it was imported, a different person than the original importer would validate the data. This means reviewing it for errors and cleaning it up where needed.
Each participant would set up their workspace according to documentation we provided on Github. In the same document we laid out the actual import procedure. Some of the key items of the import procedure were:
- Use a separate import account
- Run full JOSM validation, fix all conflicts with existing data
- But also fix all existing unrelated issues in area
- Spot check data - for instance, do street names line up?
- Merge POIs where appropriate
- In case of duplicate data, keep the best data if there is a clear difference. In case of any doubt, keep the local data.
- Add a note where a local mapper could solve a problem
As we imported, we ran into a series of recurring issues that we shared in a common issues guide - a useful resource for training new mappers and agreeing on fixes for unclear situations.
Community import or not?
From the beginning, the import was planned as a community import. There is no standing definition of this practice, but the rough idea is that uploads to the map would be done predominantly by members of the local community familiar with the areas uploaded. Once we had started the import, we quickly ran into a series of issues.
For Mapbox data team members participating in the import full time it was very easy to outpace local volunteers by a huge factor. In addition, I underestimated the complexity of the actual review and upload work. While not hard, there was a certain learning curve which meant that every new individual joining required significant training and support to get started - which meant plain and simple time that someone had to spend. Add to this that the individual time commitment is huge. I estimate we spent about 1,500 hours among everyone involved - and this is on the conservative side. Assuming 20 people work on the import, each one of them would look at 75 hours on this project. Very few people spend this much time on OpenStreetMap in a year.
The pace of uploads turned out to be a key friction point. At the same time a series of data quality issues arose. This is why, a couple of months into the import, the loosely formed group around the project, including community members and myself, decided to pause the import and, when we restarted a month later, to slow it down and stop billing it as a community import. This would allow everyone to participate better and it would set expectations straight as to who was doing the uploading work. I think this adjustment was a good one. Overall it took us 10 months to get the job done - longer than I thought but still a pace that I was comfortable committing to in order to help finish the job. In the end a vast majority of uploads, validations and programmatic updates were done by the Mapbox team and I'm glad we had the opportunity to contribute.
Still, community involvement was clutch. The incredible input everyone gave, the many reviews, advice and personal time people invested was crucial to make this import a success. Everyone weighing in has helped make the resulting map better.
We dealt with data corruption and conversion script bugs all using Github issues. Over the course of the import, we opened and closed 120 issues flagging suspicious data found in data reviews and sometimes working through protracted problems with New York City's head of GIS directly chiming in and helping interpret data correctly.
Some of the issues we discovered required updates to data we had already imported. Once we were even a couple of days into the import, manually updating existing data was no longer an option. This is where automated edits came in, updating OpenStreetMap data programmatically. We captured all scripts for automated edits in the same code repository as the data conversion scripts. Some examples of programmatic updates are:
- We fixed wrong tagging on school buildings where we tagged
- We added ordinal suffixes like "th" in "4th".
- We expanded abbreviations we had overlooked like "Ft" to "Fort".
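To give an idea of what the street name fixes involve, here is a minimal sketch in Python. It is not the code from the import repository: the abbreviation table and the function names `ordinal_suffix` and `normalize_street` are assumptions made for illustration, and it only handles whole-word numbers in street names.

```python
# Hypothetical abbreviation table; the import's real list is longer.
ABBREVIATIONS = {"Ft": "Fort", "Ave": "Avenue"}


def ordinal_suffix(n):
    """Return the English ordinal suffix for n, e.g. 4 -> 'th', 22 -> 'nd'."""
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def normalize_street(name):
    """Add ordinal suffixes to bare numbers and expand known abbreviations."""
    words = []
    for word in name.split():
        bare = word.rstrip(".")
        if bare.isdigit():
            words.append(bare + ordinal_suffix(int(bare)))
        else:
            # Unknown words pass through unchanged.
            words.append(ABBREVIATIONS.get(bare, word))
    return " ".join(words)
```

For example, `normalize_street("4 Ave")` gives "4th Avenue" and `normalize_street("Ft Hamilton Parkway")` gives "Fort Hamilton Parkway".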
We prepared this import well and we had good peer reviews on the imports list running up to the first uploads. We could head off many issues before we started importing. But in the end, the number of issues we encountered after we started was still an unpleasant surprise. Having gained a lot more experience with this import I am sure the next time we can avoid a series of pitfalls - but being able to programmatically update data after it's been uploaded is crucial for a successful import. You simply cannot plan for all eventualities and you need to be prepared to apply fixes as you go.
From this perspective, the next time I would want us to write data integrity tests from the get go. These tests would assert data quality on data before it is uploaded. This would allow us to be much more agile in updating and refactoring conversion scripts as we go.
Another set of tests would assert data quality of already uploaded data. This would help to identify existing systematic problems and catch data issues due to negligent uploads fast.
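A data integrity test of this kind could be as simple as the following sketch. The tag keys (`building`, `addr:housenumber`, `addr:street`, `height`) are standard OpenStreetMap tags, but the specific rules and the function name `check_feature` are assumptions for illustration, not checks we actually ran.

```python
def check_feature(tags):
    """Return a list of problems found in one feature's tags (empty if clean)."""
    problems = []
    if "building" not in tags and "addr:housenumber" not in tags:
        problems.append("neither a building nor an address")
    if "addr:housenumber" in tags and not tags.get("addr:street"):
        problems.append("house number without street")
    height = tags.get("height")
    if height is not None:
        try:
            if float(height) <= 0:
                problems.append("non-positive height")
        except ValueError:
            problems.append("height is not a number")
    return problems


def check_all(features):
    """Map feature index -> problems, for features that have any."""
    report = {}
    for index, tags in enumerate(features):
        problems = check_feature(tags)
        if problems:
            report[index] = problems
    return report
```

The same function can run on converted files before upload and on data downloaded from OpenStreetMap afterwards, which covers both sets of tests described above.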
So far, we have a rudimentary directory with validation scripts we started to build up during the import. There is a real need across the OpenStreetMap community to further develop and share easy to use tools to test and validate data. What if we could reuse the validators available in JOSM from the command line on arbitrary portions of OpenStreetMap data?
To get source data ready for upload, a conversion script would download the data, split it, convert it and store the resulting files in OSM XML format on Amazon S3. We set up a tasking manager job that would expose each file as a task for people to import. To upload a dataset, a mapper would select a task, download the existing OpenStreetMap data for the area and load the prepared import file. We used the excellent JOSM editor to merge and review data before uploading to OpenStreetMap.
The entire data processing script is captured in a Makefile and can be run from download to upload to Amazon S3 with a single command. In sequence, the processing script would perform the following actions:
- Download and unpack buildings (polygon data in shapefile format)
- Download and unpack addresses (point data in shapefile format)
- Reproject and simplify building geometries
- Reproject addresses
- Split buildings and addresses into byte size chunks
- Merge: Where only a single address is available for a building, merge the address attributes onto the building polygon.
- Convert: Map attributes to OpenStreetMap tags, convert street name formatting and house number formatting and export in osm format
- Put to S3
All code is open source under a permissive BSD license - feel free to lift where convenient.
The conversion script is repeatable with a single command and it is organized in stages: Each significant processing step creates files on disk and can be run separately. All that's needed are the output files of the previous processing stage. Running the entire script would take on the order of several hours on an extra large Amazon EC2 instance. Being able to run steps like the merge stage or the convert stage separately was saving important debugging time. Throughout the import, we wound up reprocessing the data countless times as we fixed issues.
    # Download, convert and push to s3
    make && ./puts3.sh

    # Download and expand all files, reproject
    make download

    # Chunk address and building files by district
    make chunks

    # Generate importable .osm files.
    # This will populate the osm/ directory with one .osm file per
    # NYC election district.
    make osm

    # Clean up all intermediary files:
    make clean

    # Put to s3
    ./puts3.sh

    # For testing it's useful to convert just a single district.
    # For instance, convert election district 65001:
    make merged # Will take a while
    python convert.py merged/buildings-addresses-65001.geojson # Very fast
Reprojecting and simplifying
ogr2ogr -simplify 0.2 -t_srs EPSG:4326 -overwrite buildings/buildings.shp buildings/building_0913.shp
Splitting into byte size chunks
We couldn't upload all data in one go; it had to be cut into byte size chunks for manual review and upload. For splitting up the data we used New York City voting districts. This was an arbitrary choice; it just so happens that New York City voting districts are of a manageable size for manual uploads. There are 5,285 voting districts, and the processing script generated an OSM file for manual upload for each one of them. The script chunk.py uses the great Shapely and Fiona libraries for doing this. It is nicely reusable for any task where you need to split up one geospatial dataset by the polygons of another geospatial dataset.
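The core idea of splitting one dataset by the polygons of another can be sketched without any libraries. This is not chunk.py: it swaps Shapely and Fiona for a plain ray casting test, only handles points and simple polygon rings without holes, and the function names are hypothetical.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray casting: toggle 'inside' for each ring edge a ray to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def chunk_points(points, districts):
    """Group (x, y) points by the id of the first district ring containing them.

    districts maps a district id to a list of (x, y) ring vertices.
    Points outside every district are dropped.
    """
    chunks = defaultdict(list)
    for x, y in points:
        for district_id, ring in districts.items():
            if point_in_polygon(x, y, ring):
                chunks[district_id].append((x, y))
                break
    return dict(chunks)
```

Testing every point against every district is slow for a city-sized dataset; a spatial index such as Rtree is the usual way to narrow down candidates first.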
In OpenStreetMap, addresses tend to be merged onto building polygons where only one address is available for the building. We wanted to follow this convention and thus merged addresses where only one was available onto the corresponding building. The Python script merge.py uses Shapely, Fiona and Rtree to do this. The script also converts data into GeoJSON format - which was extremely useful for debugging as we could inspect the files in any text editor. Here is an example output file of the merge stage.
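The merge rule itself is simple once you know which building each address falls into. Here is a minimal sketch of just that rule, assuming the spatial matching has already happened; it is not merge.py, and the data layout and the function name `merge_single_addresses` are hypothetical.

```python
from collections import defaultdict


def merge_single_addresses(buildings, addresses):
    """Merge an address onto its building only if it is the building's sole address.

    buildings: dict of building id -> tags
    addresses: list of (building id or None, address tags)
    Returns (merged buildings, addresses that stay standalone points).
    """
    by_building = defaultdict(list)
    standalone = []
    for building_id, tags in addresses:
        if building_id in buildings:
            by_building[building_id].append(tags)
        else:
            standalone.append(tags)

    merged = {}
    for building_id, tags in buildings.items():
        candidates = by_building.get(building_id, [])
        if len(candidates) == 1:
            # Exactly one address: copy its tags onto the building polygon.
            merged[building_id] = {**tags, **candidates[0]}
        else:
            # No address or several: leave the building alone, keep points.
            merged[building_id] = dict(tags)
            standalone.extend(candidates)
    return merged, standalone
```

A building with two or more addresses keeps its own tags untouched, and all of its addresses remain separate points.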
Most of our fixes during the import happened on later stages so we could always work off of the merged files, saving about 50% of the total processing time.
This is where most of the actual conversion is happening - this is also the part of the script that was the most significant time investment. It captures the full complexity of the conversion and handles hairy problems like house number conversion, street name conversion, cleanly merging geometries, generating multipolygons and more. The script convert.py uses Shapely and lxml for attribute mapping and exporting data in OSM XML format. OSM XML is directly readable by JOSM, so the resulting files of this stage could be opened and directly uploaded to OpenStreetMap with JOSM.
One tricky problem we're solving on this stage is merging T-intersections. OpenStreetMap's data model is unique in that it allows for sharing vertices between polygons. In the picture below, you see a typical T intersection. The node with the arrow is supposed to be part of the two ways describing the corner of one building but also part of the ways describing the straight walls of the other building.
It took us a while into the import to notice unmerged T-intersections. What makes this issue vexing is that OpenStreetMap's native decimal precision is lower than that of our source data. The result was that data we uploaded to OpenStreetMap looked fine, but once we downloaded it again it came back with truncated precision, moving nodes just far enough to place some within neighboring buildings.
Nodes on T-intersections between buildings need to be part of both buildings.
Our conversion script merges all instances of T-intersections. This requires truncating decimal point precision to OpenStreetMap's native 7 positions and buffering - the technique to test not only whether a point sits on a line, but whether a point is in the close vicinity of a line. Read up on convert.py for details.
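Here is a minimal sketch of the two ingredients, reduced precision and a tolerance test, in plain Python. It is not the logic in convert.py: it rounds rather than truncates, measures distance in degrees on a flat plane, and the tolerance value and function names are assumptions for illustration.

```python
import math

PRECISION = 7  # decimal places OpenStreetMap keeps for coordinates


def snap(point):
    """Reduce a (lon, lat) pair to 7 decimal places (rounding, an assumption)."""
    return (round(point[0], PRECISION), round(point[1], PRECISION))


def distance_to_segment(p, a, b):
    """Planar distance from point p to the segment a-b, in coordinate units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def insert_t_nodes(way, candidates, tolerance=1e-6):
    """Return a copy of way with candidate nodes near a segment spliced into it.

    way: list of (lon, lat) nodes; candidates: nodes of neighboring buildings.
    tolerance is a hypothetical value in degrees.
    """
    result = [way[0]]
    for a, b in zip(way, way[1:]):
        near = [p for p in candidates
                if p not in (a, b) and distance_to_segment(p, a, b) <= tolerance]
        # Keep spliced nodes in order along the segment.
        near.sort(key=lambda p: math.hypot(p[0] - a[0], p[1] - a[1]))
        result.extend(near)
        result.append(b)
    return result
```

The spliced node is reused as is, so the straight wall of one building and the corner of its neighbor end up referring to the same coordinates.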
Pushing to S3 and exposing the data in the tasking manager
For exposing tasks to mappers we used the OSM Tasking Manager - a great tool for coordinating mapping tasks among large groups of individuals. We used a patched version that allows for tasks shaped as arbitrary polygons - instead of the usual squares. Each task polygon pointed to the file we've made available on s3, and the tasking manager exposed two buttons: one for loading OpenStreetMap data into JOSM, the other one for loading the import data into JOSM. We labeled those buttons "JOSM" and ".osm" which doesn't make all too much sense, but hey!
Loading data into JOSM from the tasking manager.
Reusing and the elusive import toolchain
Writing these scripts we avoided overthinking the problem. Creating generalized solutions for these functionalities is hard and we simply didn't have enough data points to do so. Now having gone through this import, I see a couple of opportunities to solidify a toolchain for import:
- Generalize a command line script for splitting data (like a properly abstracted
- Generalize a library for converting Simple Features to the OpenStreetMap data model, including XML export
- Consider using PostGIS - I avoided it intentionally here, but built-in spatial operations and indexing are appealing
- Identify a pattern for reusable validation scripts that can be used to assert data quality before and after uploads
Continuously improving the map
Here is the full timeline of the import:
- July 2013 Started programming the conversion script
- September 2013 Proposed import on imports list
- September 2013 First test import
- October 2013 New York City community import session
- December 2013 Pause import after multiple issues arose
- February 2014 Restarted import after fixing all critical issues, going at slower upload pace after community feedback
- June 2014 Finalized uploads and tasking manager level validation
We are not done yet. While all data has been imported to OpenStreetMap, there are final cleanup tasks we are tackling as we speak. Help us further improve the map: if you find a building or address related issue on the New York City map, please let us know by filing an issue on GitHub. As soon as new data is available from New York City, we will also take a look at updating OpenStreetMap where it makes sense.
Huge thanks to all who have helped make this import happen. Through your work reviewing, coding, organizing mapping parties and doing data uploads you have helped make this import better than it would have been without you: Serge Wroclawski, Liz Barry, Eric Brelsford, Toby Murray, Ian Dees, Paul Norman, Frederik Ramm, Chris MacNally, and many others. A special thanks to Colin Reilly from New York City GIS who has helped on many occasions to fully understand the source data and find the best decision for translating it to OpenStreetMap. A big shout out to my colleagues who've put a ton of work into this endeavour: Ruben Lopez, Edith Quispe, Aaron Lidman, Matt Greene, and Tom MacWright among others. Say hello if you bump into them on the internet, or maybe at one of the next conferences.
Cheers to making the best map in the world.
We've completed work on the San Francisco building footprint dataset. We added or modified over 150,000 buildings in about 5 months of tracing with a team of three. My colleague Ruben just posted stats on the Mapbox blog. Here's an animation of all changes.
We're updating attribution for OpenStreetMap-based Mapbox maps thanks to feedback on attribution conventions here on the diary and on mailing lists. The new convention on Mapbox maps is to expand attribution by default: collapsed attribution should only be used when attribution becomes unusually long, or screen space is limited. Expect us to roll out these changes over the next couple of weeks, but here is a preview right away.
The entire goal of the Mapbox team's work with OpenStreetMap is to help make OpenStreetMap the best map, everywhere in the world. We will only be able to achieve this as a community and with open data. Linking maps back to OpenStreetMap is at the heart of growing OpenStreetMap by helping turn map consumers into map contributors. Our goal with these new attribution conventions is only to further improve the connection of the many million users who view Mapbox maps every day to OpenStreetMap.
Here are the new attribution recommendations for all Mapbox maps that are based on OpenStreetMap data.
While collapsed attribution wrapped in an info - ⓘ - symbol works well on small screens, we now recommend expanding attribution wherever possible. The full attribution line is "© Mapbox © OpenStreetMap" and next to it we recommend an "Improve this map" link leading a user to editing on OpenStreetMap. Another change is that "© OpenStreetMap" now links directly to http://www.openstreetmap.org/copyright, while "© Mapbox" continues to link to http://mapbox.com/about/maps, listing the full roster of map data we're using including OpenStreetMap.
Recommended attribution on Mapbox maps. Click to explore.
Collapsed for small maps
We're recommending this form of attribution for small slippy maps. Here's an example:
Recommended attribution on small slippy Mapbox maps. Click to explore.
Use these attributions now
Until these attribution recommendations are rolled out on Mapbox.com, here are links to code snippets that already work today:
Updated attribution recommendations for Mapbox maps: http://www.openstreetmap.org/user/lxbarth/diary/21847
Showing how OpenStreetMap is a living map, and making it easy to start mapping is the first step to turn someone from passively looking at a map into improving the map. It's part of spreading the word and building our community. At Mapbox we power OpenStreetMap based maps to hundreds of millions of people, and this gives us a unique opportunity to connect them to OpenStreetMap and turn people from being passive map consumers into active map contributors. Driving contributors to OpenStreetMap is a key goal we pursue not only with attribution but also in our aggressive launch communications around prominent new customers.
Our goal is to feature OpenStreetMap to help grow the community - attribution plays a key role in this.
Attributing OpenStreetMap based Mapbox maps
For the web, at Mapbox we recommend the following two variations for attributing OpenStreetMap:
Attribution in collapsible info control
Same attribution as above but expanded
In both cases, "(c) Mapbox (c) OpenStreetMap" links to https://www.mapbox.com/about/maps with a full listing of all sources.
"Improve this map" links to a map feedback page that explains how the map viewed is based on OpenStreetMap and how OpenStreetMap can be improved by anybody. The map feedback page is smart and a) shows the exact map you came from and b) places you into OpenStreetMap exactly where you left the map so you know where to start mapping. It has an option to skip the map feedback page the next time you click "Improve this map" and take you directly to OpenStreetMap.
Map feedback page
Maps made of many sources
Mapbox maps are made up of a multitude of sources. Here are some of our main sources:
- Digital Globe
- NASA MODIS, Landsat, SRTM
- USDA NAIP
- l'Institut national de l'information géographique et forestière
- Canadian government
- The National Land Survey of Finland Topographic Database
- Norwegian Mapping Authority
- Ordnance Survey data
- DHM / Terrain
- The National Dynamic Land Cover Dataset
- Custom data added to map
This list is only growing as the source composition of our maps gets more complex. So the string "(c) Mapbox (c) OpenStreetMap" is crediting the map engine and design (Mapbox) and one of the most prominent data providers (OpenStreetMap), but it is also functioning as a placeholder that basically says "Attribution". This is why we link this string to https://www.mapbox.com/about/maps, which contains the full list of all data. For related reasons, I also typically recommend using the collapsible info control over the expanded string on the map as it allows us in the future to add additional attributions into the map as needed without turning people's maps into NASCARs. This is a good compromise between visibility, legal requirements and the need for screen space to grow.
Mapbox can be used with any kind of map library. So, ultimately we do not have control over a given map's attribution. If you use Mapbox with our recommended libraries, attribution will show up as explained above; otherwise it is up to the developer to ensure appropriate attribution.
We're working to make this even better, and are planning to improve:
- More granular attribution based on the data actually in use on a map (right now it's one size fits all and we show this attribution as soon as you use any of streets/satellite/terrain data). A lot of Mapbox maps do not use OpenStreetMap but still want to associate proper attribution.
- Allow third party users to sign into OpenStreetMap with the account they're using on the map (think of signing into OpenStreetMap with your Foursquare account). We need to make it easier to let communities that start using OpenStreetMap become part of our community. This will have huge network effects. This will also take some work on the OSM.org side.
- Map feedback also has an option to submit feedback as email, and have our team run point on edits, fully respecting privacy.
- Share map feedback where it makes sense.
Here are two typical Mapbox powered maps with attribution (click to explore).
Mapbox Outdoors: OpenStreetMap, Ordnance Survey data, l'Institut national de l'information géographique et forestière, NASA SRTM, The National Dynamic Land Cover Dataset plus more.
InfoAmazonia maps: OpenStreetMap, NASA Modis, Landsat, Digital Globe, IBGE, InfoAmazonia.
This weekend, the quarterly US #editathon takes place in 10 US cities - read all about it on the OpenStreetMap US blog.
The #editathons are not just a great excuse to meet up with other OpenStreetMappers to push on projects, but also an opportunity to learn more about OpenStreetMap. In DC we'll be hosting the #editathon in the Mapbox garage. It's going to be great weather so expect some people to go outside and survey too. Read up on the Mapbox blog on how to find the Mapbox garage. Here's a photo from last year's event there:
Hal Hudson from New Scientist wrote a great article on how OpenStreetMap helps Médecins Sans Frontières (MSF) fight Ebola in Guinea:
WHEN doctors working for Médecins Sans Frontières (MSF) arrived in the West African nation of Guinea last month to combat an outbreak of the deadly Ebola haemorrhagic fever, they found themselves working in an information vacuum.
MSF enlisted the help of the Humanitarian OpenStreetMap team (HOT) and within a few days, a huge number of mappers flocked to OpenStreetMap, putting the affected areas on the map. Where existing Bing imagery was not sufficient, Astrium and DigitalGlobe provided fresh takes.
Even if this crisis is not in all the medias, the contribution from the OSM contributors is fantastic. In 8.5 days, 302 contributors, 1.2 million objects, 114,000 buildings, 5,000 places and 6,100 landuse polygons.
The New Scientist article explains how OpenStreetMap helps fight the virus:
Mathieu Soupart, who leads technical support for MSF operations, says his organisation started using the maps right away to pinpoint where infected people were coming from and work out how the virus, which had killed 95 people in Guinea when New Scientist went to press, is spreading. "Having very detailed maps with most of the buildings is very important, especially when working door to door, house by house," he says. The maps also let MSF chase down rumours of infection in surrounding hamlets, allowing them to find their way through unfamiliar terrain.
Since the response to the Haiti earthquake we have seen time and again how OpenStreetMap is facilitating incredible mapping of badly needed geo data, helping first line emergency responders do their work.
You can't do this with any other map but OpenStreetMap.
This type of massive mapping effort is only possible because OpenStreetMap allows anyone to edit data directly and because OpenStreetMap is available as raw and open data. The former allows anyone to get involved in helping respond to a crisis; the latter gives responding parties full power over how exactly maps should look and access to raw data for analysis. No other map offers this level of openness at a global scale.
Cross posted to talk list
Effective immediately the Mapbox Satellite option in iD and JOSM is 100% open for tracing in OpenStreetMap, including all our high resolution DigitalGlobe imagery. This is full coverage down to zoom level 19 imagery in the US + Western Europe and world wide to zoom level 17.
To use this imagery select "Mapbox Satellite" from the imagery menu in iD on the web or in JOSM. Mapbox Satellite is open for tracing in OpenStreetMap in general and not tied to a specific editor, so if you would like to add Mapbox Satellite to another OpenStreetMap editor you are welcome to do so.
This is a big affirmation of DigitalGlobe's commitment to provide imagery for OpenStreetMap (also Bing imagery contains to a very large degree DigitalGlobe material). Props to Kevin Bullock and our friends at DigitalGlobe - it's fantastic working with good people who see wins of working with OpenStreetMap.
Editing in Washington DC with the Mapbox Satellite layer
PS - on an existing installation of JOSM you'll have to refresh your imagery menu like so: http://cl.ly/image/383O2L0t431s
OpenStreetMap is published under a share-alike license, the so called Open Database License (ODbL). The license says that if raw OpenStreetMap data is mingled with raw third party data, and the result is used publicly, you are required to release the result under the same ODbL. This is, in short, the share-alike principle under which OpenStreetMap data is available today - under certain circumstances, it extends the license of OpenStreetMap data to data sets it's mixed into.
Sounds like a great idea at first, right? You're promoting the idea of opening data by making sure anyone who uses your data opens their data too. Well, there's a big gotcha: we wind up more often with OpenStreetMap not being used rather than with previously closed data opened up. This in turn hurts the project which thrives on increased adoption.
Photo: Alan Levine
Organizations or individuals who want to mix OpenStreetMap data with third party data often can't because they aren't in a position to make licensing decisions on that third party data. The reality is that opening data under a specific license is usually too slow or plain not possible.
Often times confusion about what's allowed and what is not allowed under the ODbL is just as bad. Ever seen advice opening with "I'm not a lawyer, but..."? That's what I'm talking about. Ever tried to get an actual lawyer to provide guidance on the ODbL? That's what I'm talking about. Tried to use the OpenStreetMap Wiki to learn about how the ODbL is interpreted by the licensor, the OpenStreetMap Foundation? That's what I'm talking about.
The result is that OpenStreetMap is not being used in situations where it should be used, which undermines a project whose success depends on increased adoption.
Not only is OpenStreetMap not being used as much as it could, the assumption that share-alike encourages contribution is a myth. I have yet to meet the individual, company, non profit or government agency who contributes because that's what the license calls for. And I have yet to witness the troves of data opened under the ODbL in compliance with the license. OpenStreetMap gains no extra benefit from share-alike. The reality is that OpenStreetMap is only used extensively in situations where the share-alike license does not apply, for instance, map rendering.
Here are examples of what should be possible with OpenStreetMap but is not because of share alike:
The Wheelmap community manages wheelchair accessibility information for over 400,000 places in OpenStreetMap. Ideally Wheelmap would be able to syndicate this data into any other map - think Nokia, Google, Apple. Today they can't because of share-alike limitations of the ODbL. Wouldn't people using this data on Google Maps mean more people with an interest in maintaining and improving it on OpenStreetMap, since they would know that adding data to OpenStreetMap means adding it to all the maps in the world?
Currently, New York City building and address data is being imported into OpenStreetMap (disclaimer: I'm involved). Ideally the government of New York City would just copy changes from OpenStreetMap to help maintain their own datasets - but they can't. Many datasets managed by government behind closed doors today should just be managed by the same maintainers on OpenStreetMap tomorrow - with gains for everyone. Think of the US Census Bureau whose TIGER data we're all benefiting from. This vision of citizens and government collaborating around OpenStreetMap is severely cut short by the ODbL. Governments will never use OpenStreetMap in an extensive way until they can make it part of their workflow, and as long as the ODbL taints any data that touches it, they can't. Look at the United States: many government datasets are public domain, yet government can't use OpenStreetMap directly because the ODbL is not compatible with public domain data.
And what about exchanging data with our big sister project Wikipedia? We should be copying a lot more data back and forth between OpenStreetMap and Wikipedia. OpenStreetMap could be Wikipedia's geocoder and gazetteer. And yes, if it wasn't for Wikipedia's own share-alike license, we could mine Wikipedia for addresses, phone numbers, home pages, and populations without a bad conscience. Wikipedia can't use OpenStreetMap because OpenStreetMap is not truly open, and OpenStreetMap can't use Wikipedia because it is not truly open. What better examples of two successful open data projects are there than Wikipedia and OpenStreetMap - but we are not open enough for our data to touch? This makes no sense.
If we dropped share-alike, nothing would stop players like Google or Apple from mixing OpenStreetMap data extensively into their mobile maps. And this is a good thing. OpenStreetMap's opportunity is not to compete and win against the Google Maps of the world, but to say what's on their maps. With adoption on established mapping platforms OpenStreetMap would instantly reach many millions of users with its data, drastically increasing the project's impact and playing a bigger role than stale backfill. OpenStreetMap's current licensing is stunting our growth - and diminishing the impact of all of the amazing data that we have.
Under the current license, these example cases are either outright impossible, or require time, good lawyers and programmers to keep share-alike from infecting third party data with the ODbL. The ODbL imposes unnecessarily onerous hurdles at no gain for the project. Worst of all, the license's ambiguities alone kill adoption.
If OpenStreetMap is to turn into the data set that makes geo data a true public good we have to drop share-alike. Let's make OpenStreetMap data actually open.
OpenStreetMap is on the verge of being the dataset that powers the world, quite literally. What stands between where we are today and making OpenStreetMap the source for global geographic data is that OpenStreetMap simply can't be used in many applications where it would be the ideal solution. These lost opportunities matter because they are what keeps OpenStreetMap from having the impact it should have. As Serge Wroclawski succinctly argued in his essay on why the world needs OpenStreetMap, OpenStreetMap's purpose is to democratize who decides what's on the map:
Every time I tell someone about OpenStreetMap, they inevitably ask "Why not use Google Maps?" From a practical standpoint, it's a reasonable question, but ultimately this is not just a matter of practicality, but of what kind of society we want to live in.
OpenStreetMap simply won't matter if it doesn't power the applications that millions of individuals use to search, navigate and contextualize each day. The more OpenStreetMap is used, the more impactful each of our work is, and the more incentives we create to join the movement. We should not be afraid of that.
For your reading pleasure: Here are the entire 4,000 words of a license we should be throwing out: ODbL 1.0. I will be speaking about this topic at the State of the Map US conference in Washington DC. Join the conversation here or on Twitter.
Share what you've been working on, or present your vision for OpenStreetMap at this year's State of the Map US in Washington DC April 12 - 13.
You have until February 2nd (this Sunday) to submit your session.
You'll find the submission form here: http://stateofthemap.us/
Looking forward to hearing from you.
Presentations at State of the Map US 2013. Photo: Justin Miller.
This is an update on the ongoing import of New York City buildings and addresses. For background read up on New York City and OpenStreetMap cooperating through Open Data
We have taken the time to closely review existing uploads. Here are some issues we've found that are worth highlighting as we restart the import.
- Make sure every upload to OpenStreetMap completely validates and all critical warnings are resolved before you update.
- Critical warnings are, at a minimum, any warnings or errors that stem from:
  - Buildings overlapping with buildings
  - Buildings overlapping with other features they cannot overlap with, such as roads
- Resolve not only buildings that duplicate existing buildings but also addresses that duplicate existing addresses.
- Merge point of interest information from existing nodes into new buildings when it is clearly building-level, such as schools, fire houses, supermarkets, etc.
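As a rough illustration of the duplicate-address check in the list above, here is a minimal sketch that flags incoming addresses which already exist in OpenStreetMap. This is not the validation the import relies on (that happens in JOSM); the function names and the normalization rule are assumptions made for this example.

```python
def address_key(tags):
    """Return a normalized (housenumber, street) key, or None if incomplete."""
    number = tags.get("addr:housenumber", "").strip().upper()
    # Collapse repeated whitespace and ignore case in the street name.
    street = " ".join(tags.get("addr:street", "").split()).lower()
    if not number or not street:
        return None
    return (number, street)


def find_duplicate_addresses(existing, incoming):
    """Return the incoming tag dicts whose address already exists."""
    seen = {address_key(tags) for tags in existing}
    seen.discard(None)
    return [tags for tags in incoming if address_key(tags) in seen]


# Hypothetical sample data: one existing address, two incoming buildings.
existing = [{"addr:housenumber": "350", "addr:street": "5th Avenue"}]
incoming = [
    {"addr:housenumber": "350", "addr:street": "5th  avenue", "building": "yes"},
    {"addr:housenumber": "352", "addr:street": "5th Avenue", "building": "yes"},
]
duplicates = find_duplicate_addresses(existing, incoming)
print(len(duplicates))
```

Note that the same house number and street name can legitimately occur in different boroughs, so a real check would also have to compare locations before treating a match as a duplicate.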
To get started, head over to the tasking manager, carefully (re)read the instructions and grab a task.
Make your life easier and get these JOSM styles for buildings and addresses by emacsen. They'll allow you to see issues with the data better. Learn how to install them in JOSM docs.
If you have any questions, fire away here on the comment thread.
On the imports list I recently raised the question on whether to tag addresses on buildings ways or not. Specifically, if there is only one address for a given building polygon, should the address tags sit on the building's ways or should the address tags sit on a separate node within the building? Obviously, if there is more than one address per building, there is no other way but mapping them as nodes separate from the building way.
Eric Fischer just ran an analysis to figure out what is actually the current convention in OpenStreetMap. Here's the short answer: addresses are tagged on building ways where possible. By a wide margin.
Read on for the numbers.
Address tagged on building ways (left) is the more common approach in OpenStreetMap versus address tagged on a separate node (right).
The rough numbers break down like this:
- 10 million buildings carry addresses on the way.
- 3 million buildings contain one or more address nodes.
- 4 million address nodes sit within a building.
So the maximum theoretical number of buildings with a single address node is 3 million minus 1, compared with 10 million buildings that carry the address on the way. Even that upper bound assumes one crazy building containing the remaining one million address nodes, and it does not discount redundantly tagged addresses where POI nodes duplicate the address of the building they sit in.
Here are the full numbers (OSM planet, September 25 2013):
- Buildings: 91,917,857
- Buildings with address on way: 9,386,811
- Buildings that contain one or more address nodes: 2,960,363
- Address nodes within a building: 3,858,096
- Address nodes that are not on or within a building: 10,135,036
- Addresses on a node of the building way: 673,975
(An address node is defined here as a node that contains an addr:housenumber tag and is not part of a building way.)
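To make the upper-bound argument concrete, the short sketch below redoes the arithmetic with the exact counts listed above. The variable names are mine; the numbers are the ones from the September 25, 2013 planet analysis.

```python
# Exact counts from the analysis (OSM planet, September 25 2013).
buildings_with_address_on_way = 9_386_811
buildings_with_address_nodes = 2_960_363
address_nodes_in_buildings = 3_858_096

# There are more address nodes than buildings containing them, so at least
# one building must hold several nodes. The most favorable case for the
# "single address node" convention puts all the surplus into one building.
max_single_node_buildings = buildings_with_address_nodes - 1
nodes_in_remaining_building = address_nodes_in_buildings - max_single_node_buildings

# How many way-tagged buildings there are per single-node building, at best.
ratio = buildings_with_address_on_way / max_single_node_buildings

print(max_single_node_buildings, nodes_in_remaining_building, round(ratio, 2))
```

Even under this deliberately unrealistic best case, buildings with the address on the way outnumber single-node buildings by roughly three to one.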
Good or bad?
For now, I'll personally stick to this convention as it's established. For the same reason I also want to stick to it for the ongoing New York City building and address import.
In principle though, I question tagging addresses on building polygons. It's a special case with no benefits, while separate address nodes would work in both cases: where there are multiple addresses per building polygon and where there is only one.
Last Saturday we officially kicked off the NYC building and address import with a community session hosted by OSM-NYC and Public Labs at the Pfizer building in Brooklyn. The goal was to get the local NYC OSM community involved in this large data undertaking and at the same time harden our import process.
Over 20 people attended, and we knocked out 158 of the more than 5,000 sub-tasks in total. Both turnout and tasks accomplished were great and exceeded what I expected for a casual Saturday afternoon event.
Working through this import we're learning very interesting lessons:
- OSM data structure is significantly different from traditional GIS, nailing down conceptual differences when translating to OSM takes time.
- Importing is a high inertia problem, partly due to sheer volume but also due to the lack of a solid tool chain like safe roll back tools or established conversion tools.
- Expect interesting quality issues in your source data. NYC data for instance has inconsistent address formatting in the source.
- Doing a fully automated import is non-trivial. For example, in NYC, buildings often intersect with misaligned TIGER roads. That's one big reason this import is not fully automated.
- Once all data is uploaded, we'll need a QA check on inconsistent data to catch any errors introduced by humans during the upload.
- This all feels a little like heart surgery.
Here are a couple of pictures and screenshots from the Saturday event. If you'd like to get involved drop me a line. Again, the import is on hold until a couple of issues are sorted out, but you're welcome to join.
Because I love it.
Open data is changing the world. OpenStreetMap, as a true open data project of the commons, is proving its viability with continuously growing contributor numbers and expanding adoption. We're well under way to replace what has been historically the realm of governments and proprietary-data companies with amazing open data that is not better because it's cheaper, but actually provides fresher and more detailed data because it's open and community driven. It's exciting to be a part of this in my job as data lead at MapBox, as an individual contributor and as a board member of the OpenStreetMap US chapter.
This last year on the board of OpenStreetMap US has been amazing. Working together with Jim, John, Randy and Martijn on the board and with the great support of community members like Kathleen Danielson, Bonnie Bogle and Ian Dees, we've brought the organization to a new level. We're honing in on our goal to not only promote OpenStreetMap in the United States, but to make it bigger, stronger and more diverse. Here are some of the things we've accomplished:
- Ran the biggest OpenStreetMap conference to date, bringing together almost 400 members of the US and international community in San Francisco.
- Ran quarterly editathons promoting OpenStreetMap on a local level. Twelve cities are participating in the October editathon, and more than 1,000 edits came out of the last one.
- Relaunched OpenStreetMap US to do a better job introducing OpenStreetMap to the world.
- Connected with government agencies to promote the use of OpenStreetMap and explore areas of cooperation.
For the next year I want to stay 100% focused on continuing to grow OpenStreetMap in the US and beyond. I am convinced we can do this only by uniting the many voices of our project and by being as open and inviting as possible to newcomers. This is why it will be so important to nail our conference again. The importance of State of the Map US for the growth of OpenStreetMap cannot be overstated. It is the main tool we have to convene our community, pull in new individuals and institutions and discuss the future of the project. OpenStreetMap brings together interests from very different backgrounds: it's being used and improved by individual mappers, businesses, governments and nonprofits. Individuals work with it as data consumers, data producers, software developers, designers and researchers. This diversity is exactly our strength and is exactly what we need to continue to grow OpenStreetMap.
I would love your vote for pushing on this vision on the board of OpenStreetMap US.
OpenStreetMap US elections are running from October 5th to October 12th. To vote, you simply need to be a member of OpenStreetMap US. Signing up only takes a minute; find out all the details about the election over on the OpenStreetMap.us blog.
OpenStreetMap colored by contributor id
There has been lots of talk about groups on OpenStreetMap.org lately. In early 2013 Mikel called for better social tools, including groups on OpenStreetMap, and lately groups have been mentioned more often as a replacement for our ailing mailing lists. Saman had a version of groups in his blue sky mockups for OpenStreetMap.org. Tom's posted a sketch for groups as a pull request.
I'd like to add a dose of skepticism in this discussion: I don't think we should implement groups on OpenStreetMap.org right now, there are better alternatives to get started with if our goal is to make OpenStreetMap more social and let mappers connect better.
- Most conversations ideally don't require groups.
- It's hard to do social software right, groups in particular.
- Social media platforms are distributing.
(1) Most conversations ideally don't require groups
When you stop to think about it, groups are a crutch. They require you to set up a space with a topic and name (even if it's just a couple of clicks), then people need to find it, subscribe to it and sustain interest in the group. If the group doesn't go well, it bleeds members and lives on as a distracting zombie. Ideally, you'd be able to have conversations ad-hoc around a certain topic or locality. That's one reason why you don't find groups at all or in a dominant role on some of the most successful social networks today.
(2) It's hard to do social software right, groups in particular
What was the last forum or groups software you used that didn't suck? Right. It's hard to do groups right on an interaction design level. I personally haven't seen general group discussion software ever done right, but what I do know is this: whatever we embark on means significant investment - or falling short of expectations. The risk of winding up with another level of noise in our already brittle social space is real.
(3) Social media platforms are distributing
Today OpenStreetMap enthusiasts gather in spaces on mailing lists, Meetup.com, Twitter, Facebook, forums, and Google Groups. Whatever we build competes in this space. Right now, we shouldn't attempt to build the better replacement for all of this, but think of OpenStreetMap.org as a compatible layer, allowing mappers to bring OpenStreetMap into their respective social online environments with ease.
Instead of introducing groups as a large new feature on OpenStreetMap.org I suggest we fix current social functionality on OpenStreetMap.org. This would vastly improve how mappers connect on a local and global level and would allow us to take an iterative approach, giving us real returns at each step, building on firm, well known ground. Here's a first backlog:
- Great opt-out email notifications for edits, diary posts and comments by the people you're following, and for posts you've commented on.
- Make it much easier to see who's mapping in an area
- Introduce public wall-style messaging, allowing conversations in the open.
- Ideally shut down private messaging to avoid abuse (which is happening according to administrators).
- This is small: Rename 'friend' to 'follow' - because that's what it is, no one confirms a friend request on OpenStreetMap.
- Kill the home location feature including the map on the profile
- Replace the useless friend listing and 'in your area' listing on your profile with a list of the latest edits by the people you're following
- Encourage users to link to local groups from their profile (Facebook links, meetup.com links, mailing lists links, wiki links, etc.)
- Possibly: vote up (down?) comments on diary.
Each of the above steps is small compared to implementing groups - still, each one will require dedicated work. Together they are designed to move us forward in a solid fashion from where we are right now. And note: some of the features like notifications could come in handy if we still wanted to introduce groups at some later point.
So what about our mailing lists?
Done right, the above improvements will already take important weight off of our mailing lists; we should iterate from there. With improved notifications and commenting on diaries we'll have much better spaces for meaningful discussions. I assume that much of our outcome-oriented work will continue to move to GitHub. It's also going to be interesting to watch how Map Club will move in this realm. In addition, I suggest evaluating Discourse.org, a promising new discussion forum by Jeff Atwood, a co-founder of Stack Exchange.
What do you think?
Disclaimer: I am offering this simply as food for thought for those who're interested in pushing on social features right now. From a MapBox team perspective, we're not queueing up any immediate work on the social features mentioned here.
Finding out fast who's modified the map is hugely valuable to review changes in areas you care about, to connect to new mappers or to just show how fresh the map is.
Unfortunately, the part of OpenStreetMap.org that's supposed to provide this functionality - the history tab - is functionally broken. I'd like to suggest here a straightforward way to fix it, punting on some of the hard engineering problems that fast browsing of historic changes brings with it.
To recap if you're new to the issue, here's why the history tab doesn't work today: virtually anywhere in the world you'd like to see the latest changes of a particular area on OpenStreetMap, what you'll actually get is large-scale changes whose bounding box happens to intersect with the area on the map you're viewing while not actually impacting any data in that area.
The underlying engineering problem is Hard: changes to OpenStreetMap are organized in changesets, each one of which can contain up to 50,000 edits and whose modifications can be geographically huge. Querying all changesets that actually modified data in an arbitrary bounding box of the world and displaying them in reverse chronological order is computationally expensive while at the same time it should happen in milliseconds to satisfy a web request and allow for fast browsing.
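To illustrate why matching on changeset bounding boxes produces misleading results, here is a minimal sketch with made-up coordinates. The `BBox` class and `bbox_of` helper are hypothetical and only stand in for whatever the server does internally: a single changeset touches New York and Berlin, and a map view over London intersects its bounding box without containing any of its edits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other):
        """True if the two boxes share any area."""
        return (
            self.min_lon <= other.max_lon
            and other.min_lon <= self.max_lon
            and self.min_lat <= other.max_lat
            and other.min_lat <= self.max_lat
        )

    def contains(self, lon, lat):
        """True if the point lies inside the box."""
        return (
            self.min_lon <= lon <= self.max_lon
            and self.min_lat <= lat <= self.max_lat
        )


def bbox_of(points):
    """Smallest box enclosing all (lon, lat) points of a changeset."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return BBox(min(lons), min(lats), max(lons), max(lats))


# One changeset with an edit in New York and an edit in Berlin.
edits = [(-74.0, 40.7), (13.4, 52.5)]
changeset_bbox = bbox_of(edits)

# The area a user is looking at: roughly London.
view = BBox(-0.5, 51.3, 0.3, 51.7)

bbox_match = changeset_bbox.intersects(view)
real_match = any(view.contains(lon, lat) for lon, lat in edits)
print(bbox_match, real_match)
```

The bounding box test reports a match even though no edit falls inside the view, which is the kind of false positive the history tab surfaces. Checking every edit of every candidate changeset avoids that, but it is exactly the expensive part.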
Fixing the history tab
At the Chicago hack weekend Tom and I created a prototype that completely punts on the expensive problem of fast browsing for the entire changeset history. The approach we've taken is essentially to present you with a map and a list of the latest changes on visible elements first, then only reveal the history of an element when you click on it.
Conceptually, this is very straightforward and supported by existing APIs, computationally this is dirt cheap. This is not actually a novel approach in OpenStreetMap (most editors do something like that) but it is viable as an alternative to today's history tab.
The prototype doesn't do any data processing itself and is actually just a simple HTML and JS application hosted on GitHub Pages. It uses the OpenStreetMap API's map call. The latter means it is querying OpenStreetMap in an inefficient way, but this is a comparatively simple problem to fix. It could just as well query a very simple tiled data source.
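The approach can be sketched in a few lines. The example below is not the prototype's code (that is written in JS); it is a Python illustration under a few assumptions: the XML string is a made-up sample in the shape returned by the API's map call (`/api/0.6/map?bbox=left,bottom,right,top`), and `latest_changes` is a hypothetical helper name. It groups the currently visible elements by the changeset that last touched them and lists the most recent changesets first.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a map call response.
SAMPLE = """<osm version="0.6">
  <bounds minlat="40.69" minlon="-74.01" maxlat="40.71" maxlon="-73.99"/>
  <node id="1" lat="40.700" lon="-74.000" version="2" changeset="100"
        user="alice" timestamp="2013-05-01T10:00:00Z"/>
  <node id="2" lat="40.701" lon="-74.001" version="1" changeset="101"
        user="bob" timestamp="2013-05-03T09:30:00Z"/>
  <way id="10" version="5" changeset="101"
       user="bob" timestamp="2013-05-03T09:31:00Z">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
</osm>"""


def latest_changes(xml_text):
    """Group visible elements by changeset, most recent changeset first."""
    root = ET.fromstring(xml_text)
    changes = {}
    for element in root:
        # Skip <bounds> and anything else that is not a map element.
        if element.tag not in ("node", "way", "relation"):
            continue
        changeset = element.get("changeset")
        entry = changes.setdefault(
            changeset,
            {"changeset": changeset, "user": element.get("user"),
             "latest": "", "elements": []},
        )
        entry["elements"].append((element.tag, element.get("id")))
        # Timestamps share one UTC ISO 8601 format, so string comparison
        # orders them chronologically.
        entry["latest"] = max(entry["latest"], element.get("timestamp"))
    return sorted(changes.values(), key=lambda e: e["latest"], reverse=True)


for change in latest_changes(SAMPLE):
    print(change["changeset"], change["user"], change["latest"],
          len(change["elements"]))
```

Because only the current version of each visible element is inspected, older changes to the same element stay hidden until its history is requested separately, which is the trade-off that keeps this cheap.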
Check out the result for yourself. I think it is already a very useful browser for exploring changes on OpenStreetMap. With a few iterations this could be much faster and a viable fix to OpenStreetMap's history tab.
- Prototype: http://osmlab.github.io/latest-changes
- Codebase: https://github.com/osmlab/latest-changes
Related conversation on [dev] list can be found here: http://lists.openstreetmap.org/pipermail/dev/2013-May/026891.html
We just relaunched LearnOSM - the step by step resource for learning OpenStreetMap. LearnOSM was launched in 2011 by the Humanitarian OpenStreetMap team for the workshops they are giving world wide. It has grown into a resource used by OpenStreetMap newcomers and trainers well beyond its roots in humanitarian aid and disaster risk management.
Read up on the new LearnOSM and learn how to contribute to translations and other improvements by Jue Yang, the designer and developer behind the new face of LearnOSM:
Andy Allan recently ported the OpenStreetMap standard style from pure Mapnik XML to CartoCSS and TileMill. This is exciting as it's a huge step towards making contributing to the style more accessible. The port is nearly perfect and kinks are being worked out right now. I took a minute to spot check and pull together a couple of screenshots showing just how close this awesome piece of work is. I hope to see this go up soon on OpenStreetMap.org. Andy's port wouldn't change anything about how tiles are being rendered on OpenStreetMap; all that changes is how the style for those tiles would be created: in the future, we'd use TileMill and generate the Mapnik XML from user-friendly CartoCSS.
Here is San Francisco in the new OSM-carto port, just looks like the existing map:
For spot checking I use the comparison app that Ian and Tom cobbled together using OpenStreetMap US servers and bl.ocks.org. It shows the existing OSM Standard style to the left (I'll just call this style 'OSM' for the remaining post) and the new port to Carto CSS on the right (I'll call it 'OSM-carto' for the remaining post). Note: right now it's slow / down as performance problems are being figured out.
What follows here is a quick log from my review. Others have been busy spot checking, too; head over to the issue queue to find out more.
School labels are different
School labels aren't bold in the new OSM-carto style. This has been fixed in the meantime.
Complex junctions seem to be rendering great with roads correctly rendered on top of each other and labels placed well.
OSM-carto has halo on secondary
OSM has no halo on secondary, while OSM-carto does have one in orange (see Castro street label) #25.
OSM-carto has no halo on tertiary
To the contrary, OSM has a white halo on tertiary highways where OSM-carto has none #24.
Label placement seems to be slightly different
This might be due to slightly stale data and generally doesn't seem to make a difference in terms of cartographic quality. In this difference rendition between the OSM and the OSM-carto style you can see how all labels seem to be offset by a certain factor and some street labels are placed at different positions along the way.
Low zoom levels
Low zoom levels look almost perfect at quick inspection. I do not know where differences in labels (see Sapporo for instance) come from.
Mid zoom levels are almost 100% the same
Differences in landcover order
There are some known differences in the order of land cover. This is being worked out right now in #15. Here is an example where the difference in landcover order surfaces in the visual result.
Stoked about test driving the great work of Tom, Richard, John, Saman and others on the new iD editor. Here's a quick screenshot. You can get started yourself with iD fast, just clone the repo, point your browser to the index.html file in it and get started editing. Note that if you'd like to upload your changes to the test server, you'll need an account on it. Report any bugs you're finding on the issue queue.
We had good turnout for the OpenStreetMap and MapBox workshops in preparation for Desarrollando America Latina.
The videos of the webcasts are up now, all in Spanish:
We're on the bus now heading back from New York City to Washington DC after an eventful week (read about the $575k Knight awarded Development Seed for OpenStreetMap work on the MapBox blog).
Here are some pictures of the OpenStreetMap intro workshop Ian and I held yesterday at foursquare's headquarters. Thanks to the NYC OSM community leads Liz, Serge and Eric for the initiative and promotion.
Thanks to David Blackman of foursquare for kindly offering foursquare's great digs in Manhattan for this workshop. Join the OSM NYC Meetup Group if you'd like to stay in the loop on all things mapping NYC.
A great crowd of about 30 people turned out for the workshop.
Kicking off with examples of the many ways of how OpenStreetMap data can be used.
We quickly went hands-on and started editing and improving data on OpenStreetMap using JOSM.
Ian showing how to build quick OpenStreetMap based maps using Migurski's shapefile extracts and TileMill
Liz Barry explains how DIY aerial imagery can be used in OpenStreetMap
And yeah, the view from foursquare's office is killer