Adding Microsoft Building Footprints To OSM With MapRoulette: Why And How
Posted by mvexel on 23 May 2022 in English. Last updated on 24 May 2022.
This is a cross-post from my blog.
Microsoft released a machine-generated dataset of building footprints for the United States some years ago. The footprints are derived from aerial imagery. This works well most of the time. Problems arise in rural areas, especially where natural features and topography throw the machine learning off. The machine then starts to think all kinds of things are buildings.
This is one of the reasons why blindly importing this data into OSM is a bad idea. MapRoulette comes to the rescue: it can serve these footprints one at a time, so OSM mappers can work together to decide whether each geometry does in fact represent a building that should be added to OSM. Here’s how.
Target Area
First we choose a target area. To keep the MapRoulette Challenge a reasonable size, I suggest working with a county-sized area. For this demo, I chose Wayne County in Utah. This is a great example because it is a rural county with some small towns and lots of interesting topography to throw off machines. The examples above are all taken from Wayne County.
Download The Data
We need three datasets to prepare the buildings as a MapRoulette Challenge:
- The Microsoft Building Footprint data, which you can download as statewide files from GitHub. We downloaded the Utah file, a 306 MB GeoJSON.
- A county boundary file, used to select only those geometries from the Building Footprints file that fall within Wayne County. We downloaded a Utah county boundaries file from the US Census Bureau.
- The existing building footprints for Wayne County in OSM. We downloaded these using an Overpass query.
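The post doesn't show the Overpass query it used. The small Python helper below builds one; it is a hedged sketch, not the original query. It assumes you have looked up the OSM relation ID of the county boundary. Overpass derives an area ID from a relation ID by adding 3600000000. The query also selects relation["building"], so buildings mapped as multipolygon relations are included; as one commenter points out, a way-only query misses them. The function name and the placeholder ID are hypothetical.

```python
# Hypothetical helper: builds an Overpass QL query for existing OSM buildings
# inside a county, given the county boundary's OSM relation ID.
AREA_ID_OFFSET = 3_600_000_000  # Overpass area ID = relation ID + 3600000000


def building_query(county_relation_id: int, timeout: int = 180) -> str:
    """Return Overpass QL selecting building ways AND building relations."""
    area_id = AREA_ID_OFFSET + county_relation_id
    return "\n".join([
        f"[out:xml][timeout:{timeout}];",
        f"area({area_id})->.county;",
        "(",
        '  way["building"](area.county);',
        # Multipolygon buildings are relations, not ways.
        '  relation["building"](area.county);',
        ");",
        # Recurse down so member ways and their nodes are in the output;
        # QGIS needs them to build geometries from the OSM XML.
        "(._;>;);",
        "out body;",
    ])


if __name__ == "__main__":
    # 123456 is a placeholder, NOT Wayne County's real relation ID.
    print(building_query(123456))
```

The printed query can be pasted into Overpass Turbo. The XML it returns can be saved as a .osm file and loaded into QGIS alongside the other two layers.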
Pre-process in QGIS
This section assumes some familiarity with QGIS.
We load all three downloaded datasets into QGIS as separate layers. We select Wayne County in the county boundaries layer, then use the Extract By Location processing function to extract the building footprints that are within Wayne County.
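QGIS does this step interactively. To show what "within" means here, the sketch below is a minimal pure-Python version with several simplifying assumptions. Polygons are lists of (lon, lat) points, and holes are ignored. A footprint counts as within the county when every one of its vertices passes a ray-casting point-in-polygon test. That is an approximation: a footprint could still poke out across a concave stretch of the boundary, which QGIS's exact predicate would catch. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: count how many ring edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def extract_within(footprints: List[Ring], county: Ring) -> List[Ring]:
    """Keep footprints whose vertices all lie inside the county ring (approximation)."""
    return [fp for fp in footprints if all(point_in_ring(p, county) for p in fp)]
```

For example, with a 10 by 10 square as the "county", a small square at (1, 1) is kept. A square straddling the corner at (10, 10) is dropped.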
Lastly, we discard the footprints that overlap any building footprint already in OSM.
We save this result as a GeoJSON file.
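The sketch below is a rough pure-Python equivalent of the overlap filter and the GeoJSON save. It assumes simple polygons given as (lon, lat) rings, with holes ignored. Two polygons are treated as intersecting if their boundaries share any point, or if one lies inside the other. This is the classic segment-intersection test plus a containment check. Note that touching counts as intersecting, so a footprint that merely shares a wall with an OSM building is also dropped. All names, including any output path you pass in, are hypothetical.

```python
import json
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]


def _closed(ring: Ring) -> List[Point]:
    """Return the ring as (lon, lat) tuples with the first point repeated at the end."""
    pts = [tuple(p[:2]) for p in ring]
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    return pts


def _orient(a, b, c) -> float:
    """Cross product: >0 left turn, <0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(p, q, r) -> bool:
    """For collinear p, q, r: does q lie within the bounding box of segment p-r?"""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def _segments_intersect(p1, p2, p3, p4) -> bool:
    """True if segments p1-p2 and p3-p4 share any point (touching included)."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _on_segment(p3, p1, p4)) or (d2 == 0 and _on_segment(p3, p2, p4))
            or (d3 == 0 and _on_segment(p1, p3, p2)) or (d4 == 0 and _on_segment(p1, p4, p2)))


def _point_in_ring(pt, ring) -> bool:
    """Ray casting on a closed ring; reliable only for points off the boundary."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def _bbox(ring):
    xs, ys = [p[0] for p in ring], [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def intersects(a: Ring, b: Ring) -> bool:
    a, b = _closed(a), _closed(b)
    ax0, ay0, ax1, ay1 = _bbox(a)
    bx0, by0, bx1, by1 = _bbox(b)
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return False  # cheap bounding-box rejection
    edges_a, edges_b = list(zip(a[:-1], a[1:])), list(zip(b[:-1], b[1:]))
    if any(_segments_intersect(*ea, *eb) for ea in edges_a for eb in edges_b):
        return True
    # Boundaries are disjoint, so they overlap only if one contains the other,
    # and then any vertex of the inner polygon is strictly inside the outer one.
    return _point_in_ring(a[0], b) or _point_in_ring(b[0], a)


def drop_overlapping(candidates: List[Ring], existing: List[Ring]) -> List[Ring]:
    """Keep candidates that intersect no existing building (brute force, O(n*m))."""
    return [c for c in candidates if not any(intersects(c, e) for e in existing)]


def to_feature_collection(rings: List[Ring]) -> dict:
    return {"type": "FeatureCollection", "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in _closed(r)]]}}
        for r in rings]}


def save_geojson(rings: List[Ring], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_feature_collection(rings), f)
```

The brute-force loop is fine for a demo. A real county-sized dataset is better served by a spatial index, which is one reason to let QGIS do this step.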
Pre-process in JOSM
Next, we need a little bit of processing in JOSM. We load the GeoJSON file from the previous step into JOSM, then use the Find function to select all the ways.
(You can’t just ‘Select All’ in JOSM, because that would select both the ways and all their individual nodes.)
With all buildings selected, we can simply add the building=yes tag to all of them at once. We then save the layer as a .osm file.
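The .osm file JOSM saves here is plain OSM XML. The Python sketch below does a comparable conversion and is meant only to show what that file contains; it is not a replacement for the JOSM step. It assumes the input is a GeoJSON FeatureCollection of simple Polygons, using exterior rings only. It gives new objects negative IDs, the convention JOSM uses for data that has not been uploaded yet, and tags every way building=yes. It also creates separate nodes for each footprint, even where two footprints share a corner. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def geojson_to_osm(fc: dict) -> ET.ElementTree:
    """Turn Polygon features into closed OSM ways tagged building=yes."""
    root = ET.Element("osm", version="0.6", generator="geojson_to_osm sketch")
    next_id = -1  # negative IDs mark new, not-yet-uploaded objects
    for feature in fc["features"]:
        geom = feature["geometry"]
        if geom["type"] != "Polygon":
            continue  # this sketch skips MultiPolygons and other geometry types
        ring = geom["coordinates"][0]  # exterior ring only
        if ring[0] == ring[-1]:
            ring = ring[:-1]  # drop the closing point; the way reuses the first node
        node_ids = []
        for pos in ring:
            lon, lat = pos[0], pos[1]  # ignore any elevation value
            ET.SubElement(root, "node", id=str(next_id),
                          lat=f"{lat:.7f}", lon=f"{lon:.7f}")
            node_ids.append(next_id)
            next_id -= 1
        way = ET.SubElement(root, "way", id=str(next_id))
        next_id -= 1
        for nid in node_ids + [node_ids[0]]:  # close the way on its first node
            ET.SubElement(way, "nd", ref=str(nid))
        ET.SubElement(way, "tag", k="building", v="yes")
    return ET.ElementTree(root)
```

Writing the result is one call, for example geojson_to_osm(fc).write(path, encoding="utf-8", xml_declaration=True).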
Creating the MapRoulette Challenge file
Using the .osm file, we can use the mr-cli tool to convert it into a MapRoulette Cooperative Challenge GeoJSON file. This is described in detail in the MapRoulette documentation. In short, we use the command:
mr coop change --out msbuildings_waynecounty_challenge.geojson msbuildings_waynecounty_notinosm.osm
The resulting GeoJSON file can be read by MapRoulette. MapRoulette will detect that this is a Cooperative Challenge GeoJSON and will create the Challenge accordingly.
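One reader in the discussion reported that MapRoulette rejected the upload with "Polygons and MultiPolygons should follow the right-hand rule". RFC 7946 (GeoJSON) asks for exterior rings to be counterclockwise and holes to be clockwise. As a hedged alternative to the QGIS tool, the sketch below fixes winding in plain Python. It computes each ring's signed area with the shoelace formula and reverses rings that wind the wrong way. It only touches Polygon and MultiPolygon geometries; the function names are hypothetical.

```python
from typing import List

Ring = List[List[float]]


def signed_area(ring: Ring) -> float:
    """Shoelace formula with x=lon, y=lat: positive for counterclockwise rings."""
    total = 0.0
    for (x1, y1, *_), (x2, y2, *_) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def rewind_polygon(rings: List[Ring]) -> List[Ring]:
    """Exterior ring (index 0) counterclockwise, holes clockwise."""
    out = []
    for i, ring in enumerate(rings):
        want_ccw = i == 0
        is_ccw = signed_area(ring) > 0
        out.append(ring if is_ccw == want_ccw else ring[::-1])
    return out


def rewind_feature_collection(fc: dict) -> dict:
    """Fix winding in place for every Polygon/MultiPolygon; returns the same dict."""
    for feature in fc.get("features", []):
        geom = feature.get("geometry") or {}
        if geom.get("type") == "Polygon":
            geom["coordinates"] = rewind_polygon(geom["coordinates"])
        elif geom.get("type") == "MultiPolygon":
            geom["coordinates"] = [rewind_polygon(p) for p in geom["coordinates"]]
    return fc
```

Load the file with json.load, pass it through rewind_feature_collection, and write it back with json.dump before uploading. Reversing a closed ring keeps it closed, so no other fix-up is needed.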
Creating the Challenge in MapRoulette
The final step is to create the MapRoulette Challenge. This is an interactive process on maproulette.org: you provide the generated GeoJSON file and instructions for mappers, and you’re good to go! The MapRoulette documentation has a number of articles and screencasts on creating challenges.
Result
The Challenge created using the steps above is here. (This is an “undiscoverable” Challenge, meaning that it will not show up in the Challenge discovery on maproulette.org, but you can still get to it via a direct link.)
Discussion
Comment from kucai on 24 May 2022 at 02:15
Just some additional info: there’s a menu item in one of JOSM’s plugins (can’t remember which) that lets you deselect all the nodes in whatever you have selected.
BTW, thank you for the great article. Definitely saving it for future reference.
Comment from mvexel on 24 May 2022 at 02:49
I didn’t know that! Learning something about JOSM every day… Glad you enjoyed the post.
Comment from ConnorWong on 24 May 2022 at 12:08
Hi,
Do you know of any MapRoulette challenges where particularly high diligence was observed?
Comment from Andrea Musuruane on 24 May 2022 at 12:16
Really nice post. BTW, the Overpass query doesn’t extract buildings mapped as multipolygon relations (although there seem to be none in the example above).
Comment from b-jazz on 2 June 2022 at 16:58
I see a lot of building errors when going through and cleaning up things that OSM Inspector points out. I’m 95% positive these are from the Microsoft data, but I haven’t been able to conclusively prove it. I also haven’t been able to track anyone down that might be able to correct the area and update the shapefile dump so these errors don’t continue to show up.
Comment from PhillipCarew on 13 June 2022 at 05:44
Thanks for this. I ran through the steps in the post but saw a lot of “GeoJSON error: Polygons and MultiPolygons should follow the right-hand rule” errors when uploading to MapRoulette.
Comment from mvexel on 13 June 2022 at 20:15
PhillipCarew: you may need to use the QGIS “Force Right Hand Rule” tool, or something similar, to enforce the GeoJSON right-hand rule in the data you export.
Comment from PhillipCarew on 14 June 2022 at 02:27
mvexel: Why would MR produce an unusable file when converting a .osm file to a GeoJSON file? Is there something I missed? Frustratingly, the native:forcerhr algorithm isn’t in my algorithm list in QGIS, so I’m not sure how to fix that. This tool from Mapster also didn’t work: Mapster GeoJSON Right-Hand-Rule Rewinder. It produced a JSON file.
Comment from mvexel on 14 June 2022 at 02:51
Hmm, perhaps we can have a look together. If you’re on Slack, send me a message there; a message right here on OSM is also great.