
Using Global Building Data

Posted by coolmule0 on 6 August 2022 in English. Last updated on 8 August 2022.

Microsoft have recently released their “Worldwide building footprints derived from satellite imagery”, a dataset containing the shapes and positions of buildings. It covers many parts of the world and was derived from Bing aerial imagery using deep learning.

Data within JOSM

I find these footprints interesting for numerous reasons. The focus here will be to use the data to aid in mapping. There is further information on the OSM wiki.

In this diary entry I want to explain how I got the data into JOSM. In a following entry I will explain how I use this data to speed up building mapping.

Obtaining the data

The data is freely available on GitHub. It is licensed under the ODbL and is compatible with OpenStreetMap. Each available country has its own zip file. Some of the files are several gigabytes in size, far larger than regular software can comfortably open. Pick a country and download the zip file.
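Before unzipping anything, it can help to peek inside the archive. The following is a minimal Python sketch, using only the standard library, that streams features straight out of the downloaded zip. It assumes the archive holds a single .geojsonl file with one GeoJSON Feature per line; the function name is hypothetical.

```python
import io
import json
import zipfile


def iter_features_from_zip(zip_path, member=None):
    """Yield GeoJSON features from a line-delimited .geojsonl file inside a zip.

    The archive member is decompressed and parsed one line at a time, so it
    is never extracted to disk or loaded into memory in full.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            # Assumption: the archive contains exactly one .geojsonl file.
            candidates = [n for n in archive.namelist() if n.endswith(".geojsonl")]
            if len(candidates) != 1:
                raise ValueError(f"expected one .geojsonl member, found {candidates}")
            member = candidates[0]
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.strip():  # ignore blank lines
                    yield json.loads(line)
```

For example, `sum(1 for _ in iter_features_from_zip("Some Country.zip"))` would count the footprints without extracting the archive (the filename is a placeholder). Decompressing a multi-gigabyte member still takes a while.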

Issues with the data

As with any external dataset, care should be taken to examine and understand the data, and this set is no different.

  • I found that numerous buildings were repeated in the data. This meant that JOSM had duplicated buildings and duplicated nodes after importing.

  • There are various false positives and false negatives in the data. Some things that are buildings are missing from the data. Likewise, the dataset marks some objects as buildings which are not.

  • The rotation of buildings can be off.

  • The data may contain overlapping buildings, such as a garage overlapping with the house next to it.
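Overlaps can be screened for before the data reaches JOSM. The following is a rough Python sketch, not part of the workflow described in this post, that flags candidate overlaps by comparing bounding boxes. It assumes each geometry is a GeoJSON Polygon or MultiPolygon with (longitude, latitude) coordinates; an exact polygon-overlap test would need a geometry library such as Shapely, which GeoPandas builds on. All function names are hypothetical.

```python
from itertools import combinations


def polygon_bbox(geometry):
    """(minx, miny, maxx, maxy) of a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    points = [pt for poly in polygons for ring in poly for pt in ring]
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlap_candidates(features):
    """Index pairs of features whose bounding boxes overlap.

    A coarse pre-filter only: if two boxes do not overlap, the buildings
    cannot either, but overlapping boxes still need checking by eye.
    The pairwise loop is O(n^2): fine for a small extract, not a country.
    """
    boxes = [polygon_bbox(f["geometry"]) for f in features]
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if boxes_overlap(boxes[i], boxes[j])
    ]
```

Expect false alarms: rotated or L-shaped footprints often have overlapping boxes even when the buildings themselves do not overlap.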

Unzip and geoJSONL

The zip file contains a .geojsonl file (notice the ‘l’ at the end of the filename). This file type cannot be directly imported into JOSM. In addition, files on the order of gigabytes will clog up JOSM too much to be usable. (*Edit: see vorpalblade-kaart’s comment below for better information)
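Turning a .geojsonl file into ordinary GeoJSON is mechanically simple: each line is one Feature, and wrapping them all in a FeatureCollection gives a file that JOSM's regular GeoJSON import should accept. A minimal sketch, assuming one Feature per line and using hypothetical function names. Note that it holds every feature in memory, so it does nothing about the size problem and is only sensible for a small file or a region that has already been cut out.

```python
import json


def geojsonl_to_feature_collection(lines):
    """Wrap one-Feature-per-line GeoJSONL text into a single FeatureCollection."""
    features = [json.loads(line) for line in lines if line.strip()]
    return {"type": "FeatureCollection", "features": features}


def convert_file(src_path, dst_path):
    """Read a .geojsonl file and write an ordinary .geojson file."""
    with open(src_path, encoding="utf-8") as src:
        collection = geojsonl_to_feature_collection(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(collection, dst)
```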

I found suggestions about splitting the large file into smaller files using the split command. However, the data is not spatially ordered within the file, so each split chunk has data from all over the country.

To create a small file containing only the data within an area of interest, I turned to Python and GeoPandas.

Python and Geopandas

GeoPandas is a Python library that can read and convert GeoJSON(L) files and can limit features to those within a bounding box. I can specify the coordinates of a box and only the buildings within it will be kept. This allows creation of a manageable file focused on an area of interest. I’m sure the same can be done with GIS software like QGIS, but I’ve never used that type of software before.
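GeoPandas can do the bounding-box step itself, for example through the `bbox` argument of `geopandas.read_file` or the `.cx` coordinate indexer. To show what such a filter does, here is a dependency-free sketch (not the author's code) that keeps every feature whose own bounding box touches the query box. It assumes GeoJSON's (longitude, latitude) coordinate order and Polygon or MultiPolygon geometries; the function names are hypothetical.

```python
def feature_bbox(geometry):
    """(minx, miny, maxx, maxy) of a GeoJSON Polygon or MultiPolygon."""
    polygons = (geometry["coordinates"] if geometry["type"] == "MultiPolygon"
                else [geometry["coordinates"]])
    xs = [pt[0] for poly in polygons for ring in poly for pt in ring]
    ys = [pt[1] for poly in polygons for ring in poly for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def touches_box(a, b):
    """True if (minx, miny, maxx, maxy) boxes a and b touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def clip_to_bbox(features, query):
    """Keep features whose bounding box touches the query box (minx, miny, maxx, maxy)."""
    return [f for f in features if touches_box(feature_bbox(f["geometry"]), query)]
```

A building straddling the edge of the box is kept whole, so the extract can reach slightly beyond the box.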

A script

In the end I wrote a Python script that reads an unzipped .geojsonl file downloaded from the repository, cuts out a selected region, removes duplicate geometries, and saves the resulting geometries to a .geojson file that can be safely imported into JOSM.
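The duplicate-removal step can be sketched in plain Python as below. This is an illustration, not necessarily how the extract_region script does it: it treats two features as duplicates only when their geometries are exactly identical, so the same outline written with a different starting vertex or different rounding would survive. The function names are hypothetical.

```python
import json


def geometry_key(geometry):
    """Canonical text form of a geometry, usable as a set key."""
    return json.dumps(geometry, sort_keys=True, separators=(",", ":"))


def drop_duplicate_geometries(features):
    """Keep the first feature for each distinct geometry, preserving order."""
    seen = set()
    unique = []
    for feature in features:
        key = geometry_key(feature["geometry"])
        if key not in seen:
            seen.add(key)
            unique.append(feature)
    return unique
```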

The script is available on GitHub. It requires Python and the geopandas and numpy libraries, which can be installed with the command pip install geopandas numpy. The script can be called with python extract_region.

Conclusion

The image below shows the data imported into JOSM:

[Image: Import of dataset into JOSM]

I’ve been able to extract geospatially grouped building footprints into JOSM. In testing how large the files can be, I found that around 300 MB is the limit JOSM can handle on my computer, which has 24 GB of RAM.

My next entry will talk about how I use this data to speed up the mapping of buildings in OpenStreetMap and JOSM.

Discussion

Comment from MatthiasMatthias on 7 August 2022 at 06:45

Very nice!

Comment from DoubleA on 7 August 2022 at 15:13

Is there any discussion going on concerning automated imports and tracking the covered area?

Comment from CjMalone on 7 August 2022 at 15:16

Yeah very nice, I had missed the release of this dataset.

Just set it up to work with MapWithAI in JOSM locally. It’s got some great footprints, and some weird ones.

Comment from scruss on 7 August 2022 at 23:41

You must follow the import guidelines or your edits risk being removed.

You must use a dedicated import account for this, not your regular one.

Problem changesets include:

Comment from SimonPoole on 8 August 2022 at 05:49

Besides everything that has been said, there are regions, for example the UK, where if the outlines were generated from Bing imagery, the imagery is obviously no longer available making it very difficult to determine if the outlines are even just half correct.

The other issue is naturally, as has been pointed out many times, that the licensing isn’t ideal and will long term cause issues.

Comment from coolmule0 on 8 August 2022 at 11:14

@DoubleA. I’m not aware of any ongoing discussion on this. The wiki page had no mention of it until I made a small edit. I would very much like to know of any locations of discussion about this.

@scruss. A very good point. I wanted to go into more detail about how I made the changesets in my next post, but would appreciate some feedback on it here beforehand if possible. I thought the way it was used was not an import, as I considered each footprint individually. In a similar way to having a map layer like cadastral parcels as an overlay, it was used as a tool to aid in mapping houses, rather than simply copy-pasting large amounts of data across. Would this be considered an import due to the use of external data?

@SimonPoole. The dataset’s license is the ODbL, the same as OSM’s, so there should be no licensing conflict between them. Is there some further license issue?

Comment from SimonPoole on 8 August 2022 at 13:38

@coolmule0 while the ODbL is nominally compatible with itself, that is not the point. Incorporating ODbL-licensed data makes a future licence change of any kind (even if it is just fixing the couple of minor issues the current version of the ODbL has) completely dependent on the goodwill of the original licensors (if they even still exist at that point in time), or on removing the data in question.

There’s a further issue with all licenses that don’t allow sub-licensing that makes all such sources problematic, but that isn’t an ODbL specific issue.

Comment from vorpalblade-kaart on 8 August 2022 at 14:12

“The zip file contains a .geojsonl file (notice the ‘l’ at the end of the filename). This file type cannot be directly imported into JOSM. In addition, files on the order of gigabytes will clog up JOSM too much to be usable.”

That is roughly correct. JOSM reads delimited geojson files that follow the RFC 8142 proposed standard, which depends upon the RS (0x1e) record separator character. Assuming you have jq installed, you can run the following one-liner to convert the geojsonl file to something JOSM will open:

$ cat 'United Kingdom.geojsonl' | sed -e 's/^{/'$(printf "\x1e")'{/' | jq -c --seq . > 'United Kingdom.geojson'
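For anyone who prefers Python to sed and jq, the same idea can be sketched as below: each feature is written compactly, preceded by the RS character and followed by a line feed, which is the record layout RFC 8142 describes. It is meant to match the one-liner in effect, not to reproduce jq’s output byte for byte, and the function names are hypothetical.

```python
import json

RS = "\x1e"  # ASCII Record Separator, which RFC 8142 places before each record


def seq_record(line):
    """Re-encode one GeoJSONL line as an RFC 8142 record: RS, JSON, LF."""
    # json.loads raises ValueError on a malformed line instead of skipping it.
    compact = json.dumps(json.loads(line), separators=(",", ":"), ensure_ascii=False)
    return RS + compact + "\n"


def geojsonl_to_geojsonseq(src_path, dst_path):
    """Stream-convert a .geojsonl file one line at a time."""
    with open(src_path, encoding="utf-8") as src:
        # newline="\n" stops Windows from writing CRLF line endings.
        with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
            for line in src:
                if line.strip():
                    dst.write(seq_record(line))
```

For example, `geojsonl_to_geojsonseq("United Kingdom.geojsonl", "United Kingdom.geojson")`. The conversion itself reads one line at a time, but JOSM will still need the memory discussed below once it loads the result.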

You’ll still have to deal with the data slowing down JOSM (it tries to draw everything), but that can be fixed by zooming in once everything loads, assuming you have allocated enough memory to JOSM.

With a 6.0 GB file, you are going to want to allocate more memory to JOSM (see JOSM OutOfMemory), since the default memory given for JOSM is typically 4 GB or less.

Yes, we don’t have to keep the whole file in memory, but we do have to keep the data from the whole file in memory, since we aren’t just processing it and forgetting about it (which is where the line-delimited geojson format makes a difference).

Comment from scruss on 9 August 2022 at 00:38

Not just an import, a destructive one at that. You overwrote existing buildings that had metadata with bare outlines, e.g.: osm.org/way/559909735/history

“I would very much like to know of any locations of discussion about this.”

If you plan an import, it’s you who starts the discussion, before you do it.

Comment from coolmule0 on 9 August 2022 at 09:32

@scruss Thank you for pointing this out. This was a manual accident I made. I am still learning how to use JOSM efficiently, and sometimes my attempts at pressing shortcuts don’t do what I expect. Rather than delete the buildings I made, I accidentally deleted everyone else’s! I have endeavoured to fix the mistake I made.

Comment from skquinn on 10 August 2022 at 10:59

How does this differ from the existing data from Microsoft available via MapWithAI?

Comment from Cascafico on 14 August 2022 at 13:28

Please, find my considerations and tests at osm.org/user/Cascafico/diary/399590

Comment from Michi on 24 August 2022 at 21:48

Since the geopandas-based Python script uses huge amounts of memory, which I don’t have, I wrote a small program that does not load the whole file into RAM. Instead it processes the file line by line, feature by feature, so the program is mostly I/O- and CPU-bound. It also outputs GeoJSON directly.

I uploaded the program to GitHub, in case anybody else doesn’t have enough memory and wants to give it a try…
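Michi’s program itself is on GitHub and is not reproduced here. As a hypothetical illustration of the same line-by-line idea, the Python sketch below reads a .geojsonl stream one feature at a time, keeps features whose bounding box touches a query box, and writes a GeoJSON FeatureCollection incrementally, so memory use depends on the largest single feature rather than on the file size. It assumes Polygon or MultiPolygon geometries in (longitude, latitude) order; all names are hypothetical.

```python
import json


def feature_bbox(geometry):
    """(minx, miny, maxx, maxy) of a GeoJSON Polygon or MultiPolygon."""
    polygons = (geometry["coordinates"] if geometry["type"] == "MultiPolygon"
                else [geometry["coordinates"]])
    xs = [pt[0] for poly in polygons for ring in poly for pt in ring]
    ys = [pt[1] for poly in polygons for ring in poly for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def touches_box(a, b):
    """True if (minx, miny, maxx, maxy) boxes a and b touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def stream_extract(src, dst, query):
    """Copy features from GeoJSONL text stream `src` that touch `query`.

    Writes a GeoJSON FeatureCollection to text stream `dst` as it goes and
    returns the number of features written. Only one line is parsed at a time.
    """
    dst.write('{"type":"FeatureCollection","features":[\n')
    written = 0
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        feature = json.loads(line)
        if not touches_box(feature_bbox(feature["geometry"]), query):
            continue
        if written:
            dst.write(",\n")  # separator between features
        dst.write(json.dumps(feature, separators=(",", ":")))
        written += 1
    dst.write("\n]}\n")
    return written
```

Both `src` and `dst` are ordinary text file objects, for example the results of `open("in.geojsonl", encoding="utf-8")` and `open("out.geojson", "w", encoding="utf-8")` with placeholder filenames.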
