daniel-j-h's Diary Comments

RoboSat ❤️ Tanzania over 3 years ago


2- Convert the OSM GeoJSON file to a binary GeoTIFF using the following code: 3- Convert the binary image to tiles (the same as step 1)

The rs rasterize command rasterizes GeoJSON features into Slippy Map tiles.


Once you have the predicted tiles you can serve them over HTTP and point e.g. a Mapbox GL JS map at them. There is an example for the compare maps and the rs serve tool here:

You can serve the predicted tiles as simply as python3 -m http.server 5000
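One thing to watch out for: if the map page lives on a different origin than the tiles, the browser needs CORS headers on the tile responses, which the plain http.server does not send. A minimal stdlib sketch (the "predictions" directory name is an assumption; point it at wherever your predicted tiles live):

```python
# Serve a slippy map tile directory over HTTP with CORS headers, so a
# Mapbox GL JS (or Leaflet) map on another origin can request the tiles.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

class TileHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow cross-origin tile requests from the map page.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

def make_server(directory, port=5000):
    handler = functools.partial(TileHandler, directory=directory)
    return HTTPServer(("", port), handler)

# make_server("predictions", 5000).serve_forever()
```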


You can check out tiles where your predictions are not in sync with OpenStreetMap: where we predict e.g. a building but OpenStreetMap says there should be none. Because OpenStreetMap is not “complete”, your model will

  • either predict a building where there is a building in the aerial imagery but not in OpenStreetMap: in this case your metrics will be lower, since we count this as an error against the OpenStreetMap “ground truth”, or

  • your model predicts a building where there is no building in the aerial imagery or in OpenStreetMap. This can happen e.g. if you never train on images with swimming pools but then predict on images with swimming pools. In this case you can add these tiles into your dataset with an all-background mask, nudging the model in the right direction and teaching it not to predict swimming pools as buildings.
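As a sketch, such an all-background mask can be generated with nothing but the standard library by writing a zero-filled grayscale PNG (pixel value 0 = background). The exact mask format conventions may differ from what rs rasterize emits, so treat this as illustrative:

```python
# Write an all-background (all-zeros) mask tile as an 8-bit grayscale PNG.
import struct
import zlib

def write_background_mask(path, size=512):
    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR: width, height, bit depth 8, color type 0 (grayscale).
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 0, 0, 0, 0)
    # Each scanline: one filter byte followed by `size` zero pixels.
    raw = (b"\x00" * (size + 1)) * size
    png = (b"\x89PNG\r\n\x1a\n"
           + chunk(b"IHDR", ihdr)
           + chunk(b"IDAT", zlib.compress(raw))
           + chunk(b"IEND", b""))
    with open(path, "wb") as f:
        f.write(png)
```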

Hope this helps; sorry for the delay. It’s probably best to join the robosat channel on the osmus Slack for quicker responses; there are quite a few folks there who are happy to help with questions.

Servus, Bayern: The robots are coming! over 3 years ago

If you want to fine-tune pre-trained Robosat models then the weights saved in the .pth file have to exactly match the model architecture we use in Robosat.

Right now the model architecture is an encoder-decoder with skip connections. In addition we use a pre-trained Resnet50 for the encoder. Here are more details and paper references:

It is possible to

  • train this robosat model, save its weights in a .pth checkpoint file, and then load this file back when fine-tuning or for prediction

  • use a different pre-trained encoder, e.g. a Resnet18 or Resnet34 if you want a smaller and faster encoder. Then train the robosat model with this encoder and save the resulting weights into a .pth checkpoint file again

There are also some open pull requests where we experiment with different architectures:

These architecture changes are not compatible with old .pth checkpoint files, which is one of the reasons we haven’t changed the core architecture so far.

Regarding your second question about estimating the dataset size: it depends on your use-case: the zoom level you want to work with, the geographical area (lots of variety across the planet vs. a single city or country), how good your ground truth labels are, whether you want to invest time in manual dataset curation or refinement, and so on.

As a very rough guideline I recommend at least a couple thousand 512x512 px image tiles; then give it a try and see if it works. But as usual, the more data the better.

Hope that helps,

Servus, Bayern: The robots are coming! over 3 years ago

Do your images roughly look like the Bavaria images? Do buildings roughly look the same? Then I would just give it a try and see what you will get out.

There is also an option in the rs train tool to read in a checkpoint and fine-tune it to your specific dataset - maybe that’s an option if you can get your hands on more data (30 buildings sounds a bit low).

If you want to try it make sure to convert your dataset into Slippy Map tiles on zoom level 17 (maybe ±1 zoom level) since the Bavaria model was trained on z17.

There’s a robosat channel on the osmus Slack in case you run into issues :)

RoboSat ❤️ Tanzania over 3 years ago

The numbers in the tile files are slippy map tile x, y, z ids.

See the OSM wiki and the docs on rs cover:
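The conversion between geographic coordinates and those x, y ids is the standard slippy map formula from the OSM wiki; a small sketch:

```python
# Convert lat/lon in degrees to slippy map tile x, y ids at a zoom level.
# x grows eastwards from -180°, y grows southwards from ~85.05° north.
import math

def deg2tile(lat, lon, zoom):
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y
```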

The masks are generated based on the GeoJSON files you give it. It can happen that your rasterized masks and the aerial raster tiles you downloaded are not in sync, e.g. there could be more rasterized masks than you have downloaded raster imagery. In that case simply loop over both datasets and copy over the tiles for which you have both: a mask and an image.
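That loop could look roughly like this; the directory names are assumptions, adapt them to your layout:

```python
# Keep only the tiles present in both datasets: collect the z/x/y keys of
# images/ and labels/, take their intersection, and copy those tiles over.
import shutil
from pathlib import Path

def tile_keys(root):
    root = Path(root)
    return {p.relative_to(root).with_suffix("") for p in root.rglob("*") if p.is_file()}

def copy_common(images, labels, out_images, out_labels):
    common = tile_keys(images) & tile_keys(labels)
    for rel in common:
        for src_root, dst_root in ((images, out_images), (labels, out_labels)):
            src = next(Path(src_root).glob(str(rel) + ".*"))  # any extension
            dst = Path(dst_root) / src.relative_to(src_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst)
    return common
```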

RoboSat ❤️ Tanzania over 3 years ago

Robosat v1.2 comes with batched extraction and batched rasterization. Before, we had to keep all the features and images in memory during extraction and rasterization, which used up quite a lot of memory on larger datasets. We now flush batches to disk every now and then. The batches are somewhat arbitrary and not based on e.g. smaller areas.

Batched extraction

Batched rasterization

And check the v1.2 release notes

RoboSat ❤️ Tanzania over 3 years ago

FYI there are also more recent diary posts; from the last two weeks:

for folks following along this old Robosat on Tanzania drone imagery diary post.

RoboSat ❤️ Tanzania over 3 years ago

Hey, the rs serve tool is mainly for debugging and quick development iteration cycles. You really should use the production-ready rs predict tool for efficient batch prediction.

That said, there are some limitations in rs serve:

  • the zoom level is currently hard-coded here

  • it’s single threaded only and does not do batch prediction (inefficient at scale)

  • it does not handle tile borders like we do in rs predict so you might see artifacts at borders

  • even though the rs serve command takes host and port arguments, the map we serve to the browser right now assumes localhost:5000 for requesting tiles

You have to go in and adapt these manually right now. I’m also happy to take pull requests and can help you along if you want to properly fix these issues and make rs serve more robust and user-friendly.

Hope that helps, Daniel

RoboSat v1.2.0 — state of the art losses, road extraction, batched extraction and rasterization over 3 years ago

The follow-up post running robosat v1.2 on all of Bavaria’s 80 cm aerial imagery is here

Servus, Bayern: The robots are coming! over 3 years ago

Absolutely! For Bavaria there is only 80 cm aerial imagery (openly) available. You can check it out in iD by selecting it as a background, e.g. go to

then go to Background (or press “b”) -> select “Bavaria (80 cm)”

Now pan around and see if you can distinguish what’s a building, a small building, a shed, a car port, a parking lot, or a large car. It can be quite hard even for humans.

I used robosat on drone imagery (see this diary) and on the high-resolution Mapbox aerial imagery in North America (where you could see people walking around) back when I was working for them. In addition to the resolution, there are multiple tricks to get higher quality predictions out of robosat, trading off training time / runtime or requiring manual dataset curation.

The use-cases I see for prediction even on the 80 cm aerial imagery are

  • change detection over the years to see how our cities evolve over time

  • finding unmapped areas or computing a score of how “complete” our map is

  • as a pre-filter / prioritization stage in tools like osmcha: if the robosat model roughly agrees with a changeset adding a building then we can let it go through; otherwise flag it for human inspection

RoboSat v1.2.0 — state of the art losses, road extraction, batched extraction and rasterization over 3 years ago

I implemented polygonization for the parking use-case we had.

It’s implemented in the rs features tool and can be used after you have the model’s predictions (the probabilities you can see above) and have converted the (potentially multiple, for ensembles) probabilities to masks with the rs masks tool.

Check out the robosat readme and the linked diary posts above - they go into a bit more detail on how the pipeline works before and after the prediction stage.

Here is the robosat parking polygonization; buildings are handled more or less the same, and in fact I’m using the parking handler as a building handler. It gets a bit ugly if you want to handle edge cases such as (potentially nested) (multi-)polygons, but oh well.
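To illustrate the core idea behind such a handler, here is a deliberately crude, dependency-free sketch: label connected components in a binary mask with a flood fill and emit each component’s bounding box. The real rs features handlers trace and simplify contours instead; this only shows the shape of the problem.

```python
# Find connected foreground components in a binary mask (list of 0/1 rows)
# and return one (minx, miny, maxx, maxy) bounding box per component.
def components(mask):
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # Stack-based flood fill over 4-connected neighbors.
                stack = [(sy, sx)]
                seen[sy][sx] = True
                minx = maxx = sx
                miny = maxy = sy
                while stack:
                    y, x = stack.pop()
                    minx, maxx = min(minx, x), max(maxx, x)
                    miny, maxy = min(miny, y), max(maxy, y)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((minx, miny, maxx, maxy))
    return boxes
```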

The polygonization can definitely be improved by

Check out their work - it’s quite nice but they also run into edge cases and design trade-offs

Happy to guide you along if you want to work on this in robosat or have ideas.

RoboSat ❤️ Tanzania over 3 years ago

Hey, I just published a new release v1.2.0 - read about it here. The official docker images now work again, too. Here are the docs.

For zoom levels there is an open pull request:

It should Just Work (tm) but I haven’t had the time to test it more thoroughly.

The problem there is that we use some pixel-based thresholds and heuristics, and depending on your zoom level they will (slightly) change. The pull request implements these thresholds based on meters instead of pixels. You can check out the code and help me test it by running it on your dataset, checking if the results look reasonable, and playing around with the thresholds.
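The conversion behind that is straightforward web mercator geometry; a sketch of going from a meter threshold to pixels per zoom level (the constant is the Earth’s equatorial circumference; 256 px is the classic OSM tile size, adjust if your tiles are 512 px):

```python
# Web mercator ground resolution: ~156543 m/px at the equator at z0,
# halving with every zoom level and shrinking with cos(latitude).
import math

def meters_per_pixel(lat, zoom, tile_size=256):
    return 40075016.686 * math.cos(math.radians(lat)) / (tile_size * 2 ** zoom)

def pixels_for_meters(meters, lat, zoom, tile_size=256):
    # Turn a threshold given in meters into the equivalent pixel count.
    return meters / meters_per_pixel(lat, zoom, tile_size)
```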

Ideally we’d also have a building handler (which right now would do the same as the parking lot handler). I just haven’t had the time to implement it properly; myself, I can quickly hack the code the way I need it.

Hope that helps.

RoboSat ❤️ Tanzania over 3 years ago

Segmentation faults are tricky to debug: could be anything from a bad installation to version mismatches to us not handling an edge case in your dataset.

As a first step I recommend using the pre-built Docker images. The official ones are currently not being built automatically; we’re working on fixing that.

In the meantime I just set up automated Docker image builds for my fork which I keep in sync with upstream for the time being. You can run them via

docker run -it --rm -v $PWD:/data --ipc=host danieljh/robosat:latest-cpu
docker run -it --rm -v $PWD:/data --ipc=host danieljh/robosat:latest-gpu

Note for folks coming across this in the future: check the official mapbox/robosat docker images and use them if they are again up to date instead of danieljh/robosat.

RoboSat ❤️ Tanzania over 3 years ago

Multiple zoom levels work out of the box.

They get picked up automatically by the dataset loader if you put them all into the same directory. I would first try e.g. zoom level z together with z-1 and z+1. If there is a bigger difference between zoom levels, the visual features in your images will be vastly different, and it might make sense to build multiple models, one per zoom level, instead.

For the images and labels directories you will simply have multiple z sub-directories, as in:


Make sure the images and labels directories are in sync (for every image there is a label, and for every label there is an image), but otherwise that should be it.
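A quick sanity check for that parity, across however many zoom sub-directories you have (the directory names are assumptions):

```python
# Report tiles that exist in only one of the two directories; two empty
# sets mean images/ and labels/ are in sync.
from pathlib import Path

def out_of_sync(images, labels):
    def keys(root):
        root = Path(root)
        return {p.relative_to(root).with_suffix("") for p in root.rglob("*") if p.is_file()}
    return keys(images) - keys(labels), keys(labels) - keys(images)
```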

I highly recommend training on GPUs; with CPUs you will have to wait a ridiculously long time. Also, manually verify that your images and labels correspond to each other.

RoboSat ❤️ Tanzania about 4 years ago

If possible provide more building tile data; the building IoU is computed for the foreground class (buildings in your case) only.

  • Are you using 256x256 tiles or 512x512 tiles?
  • Are you using the Lovasz loss in the model config?
  • How long do you train?
RoboSat ❤️ Tanzania about 4 years ago

Also what I’m seeing just now:

colors = ['denim', 'denim']

This doesn’t look right. You don’t want to give the background class and the foreground class the same color. Otherwise you will not be able to distinguish them visually.

RoboSat ❤️ Tanzania about 4 years ago

Yeap, that looks pretty bad; you definitely need more negative samples. I’m wondering why you only get it for some tiles, though. Here’s a ticket for the all-background mask:

Hope this helps.

RoboSat ❤️ Tanzania about 4 years ago

Great! Keep me posted how it goes! :) Always happy to hear feedback.

RoboSat ❤️ Tanzania about 4 years ago

WebP or PNG does not matter; we can read all image formats supported by PIL.

RoboSat ❤️ Tanzania about 4 years ago

In your dataset

  • every image needs a corresponding mask
  • every mask needs a corresponding image

That is, for all z, x, y tiles you are interested in, there have to be parallel files

  • dataset/training/images/z/x/y.png
  • dataset/training/labels/z/x/y.png

The same applies to the validation dataset.

Creating this dataset is on you and a bit out of scope here.

RoboSat ❤️ Tanzania over 4 years ago

Maybe visualize where you have GeoTIFF image tiles and where you have mask tiles. It could be that the GeoTIFFs just don’t cover all of the areas you extracted masks for.

Otherwise try to reproduce this with a small GeoTIFF and/or a smaller area.

And maybe try the gdal2tiles approach and see if the output is different.

You will need to debug this a bit; it could be one of multiple problems.