Last time, we talked about how we imported over 1 million buildings in LA. You can also watch the video of our SOTM-US talk. In this post, we'll talk about our ongoing cleanup.
No data is perfect. The quality of what we imported into OpenStreetMap was generally good, but with any data there will always be unexpected cases.
During the import trials, we discovered that the LA City data was split along parcel boundaries, resulting in small polygons that should be part of a larger building (see: #71). We fixed these during the import using the Auto-tools plugin in JOSM, but some cases slipped through.
Detecting split buildings
We detected split buildings by analyzing the size and shape of buildings using OSM QA Tiles, Turf and tile-reduce. A sample output looks like this:
The general idea is that:
- small buildings are more compact (low area, high shape value);
- large buildings are complex and wiggly (high area, low shape value).
Invalid or split buildings break this pattern. For example, a sliver cut off along a parcel line is small but not compact.
After several trials, we settled on an acceptable threshold for detecting split buildings in LA.
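The heuristic above can be sketched in a few lines. This is a minimal illustration, not our actual script: the post does not say which shape measure we used, so this sketch assumes the Polsby-Popper compactness score (4πA/P², which is 1.0 for a circle and near 0 for thin slivers). It also assumes footprints are already projected to a planar system in meters, and the two thresholds are hypothetical placeholders rather than the LA values.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical thresholds for illustration only; not the values used for LA.
MAX_AREA_M2 = 20.0
MAX_COMPACTNESS = 0.4


def ring_area(ring: List[Point]) -> float:
    """Planar area of a polygon ring via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ring_perimeter(ring: List[Point]) -> float:
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


def compactness(ring: List[Point]) -> float:
    """Assumed shape value: Polsby-Popper score 4*pi*A / P^2."""
    p = ring_perimeter(ring)
    return 0.0 if p == 0 else 4 * math.pi * ring_area(ring) / (p * p)


def is_suspected_split(ring: List[Point]) -> bool:
    """Flag footprints that are small but NOT compact, unlike normal small buildings."""
    return ring_area(ring) < MAX_AREA_M2 and compactness(ring) < MAX_COMPACTNESS


sliver = [(0, 0), (10, 0), (10, 0.5), (0, 0.5)]  # 10 m x 0.5 m strip
shed = [(0, 0), (3, 0), (3, 3), (0, 3)]          # 3 m x 3 m square
print(is_suspected_split(sliver))  # True
print(is_suspected_split(shed))    # False
```

Both footprints are small, but only the sliver scores low on compactness (about 0.14 versus about 0.79 for the square), so only it is flagged. In practice the cutoffs would be tuned against real detections, as we did over several trials.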
Here are some examples of valid detection:
Fixing with Maproulette
The script detected ~4K buildings, which are available as a task in Maproulette: http://maproulette.org/map/419/460642
Workflow for JOSM
- Go to http://maproulette.org/map/419/460642 and log in with your OpenStreetMap account.
- Open JOSM and enable Remote Control in its preferences.
- Install the Auto-tools plugin.
- Start fixing by merging each split building into the adjacent larger building using Auto-tools.
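Remote Control works by having JOSM listen for HTTP requests on your machine (by default at 127.0.0.1:8111). The sketch below builds the kind of `load_and_zoom` request a tool can send to open a building's area in JOSM. It is an illustration under assumptions: the way ID and the padding around the bounding box are hypothetical, and we are not claiming this is exactly the request Maproulette sends.

```python
from typing import Optional
from urllib.parse import urlencode

# JOSM Remote Control listens here by default once it is enabled.
JOSM_REMOTE = "http://127.0.0.1:8111"


def load_and_zoom_url(min_lon: float, min_lat: float,
                      max_lon: float, max_lat: float,
                      way_id: Optional[int] = None,
                      pad: float = 0.0005) -> str:
    """Build a JOSM load_and_zoom URL for a bounding box (degrees).

    `pad` (hypothetical default, roughly 50 m) widens the box so that
    neighbouring buildings are downloaded too, which you need for merging.
    """
    params = {
        "left": min_lon - pad,
        "right": max_lon + pad,
        "bottom": min_lat - pad,
        "top": max_lat + pad,
    }
    if way_id is not None:
        # Ask JOSM to select the flagged building after loading.
        params["select"] = f"way{way_id}"
    return f"{JOSM_REMOTE}/load_and_zoom?{urlencode(params)}"


# Hypothetical way ID and coordinates in LA, for illustration only.
url = load_and_zoom_url(-118.2440, 34.0520, -118.2436, 34.0523, way_id=12345)
print(url)
```

Opening the printed URL in a browser (or with `urllib.request.urlopen`) while JOSM is running should download that area and select the way, ready for merging with Auto-tools.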
If you'd like to do this in iD instead, let me know.
We are continuously improving how we detect split buildings. If you have ideas, comment here or directly in the ticket.
Thank you for fixing!