Servus, Bayern! After releasing robosat v1.2 I trained it on 80cm aerial imagery to predict all buildings in Bavaria. tl;dr - here you can find the checkpoint ready to use.

Robosat is an open source, end-to-end pipeline for feature extraction from aerial and satellite imagery. It integrates with OpenStreetMap for automated dataset creation.

I downloaded the 80 cm aerial imagery of Bavaria from 2018, which the Bayerische Vermessungsverwaltung provides under CC-BY-3.0. I then used the open source robosat v1.2 release to automatically create a training and validation dataset and trained a model on my GPU rig. Within days I got reasonable results without any manual work.

For detailed instructions on how to create a dataset and run the robosat pipeline see my previous diary where I explain the process on drone imagery.

Here is a heatmap of all the z17 tiles in Bavaria where there are buildings:


And here are two examples of the segmentation probabilities I get:



In the second image there are buildings in OpenStreetMap but not in the predictions. This is due to the aerial imagery showing construction sites where there are now buildings.

I released the trained checkpoint here which allows you to

  • efficiently predict buildings on your laptop; we provide pre-built Docker images for robosat v1.2 to make this a single command

  • use the checkpoint when training your own models on your GPUs, so that you only have to fine-tune the model and reach solid results sooner and more easily

As usual, I’m happy to hear your feedback; reach me in the comments here, in GitHub tickets, or in the robosat channel on the osmus or thespatialcommunity Slack.

Comment from RobJN on 8 June 2019 at 09:57

Is it a case of the more detailed the imagery the better the result? Have you tried it on any 12.5cm imagery?

Comment from daniel-j-h on 8 June 2019 at 11:55

Absolutely! For Bavaria there is only 80 cm aerial imagery (openly) available. You can check it out in iD by selecting it as a background, e.g. go to

then go to Background (or press “b”) -> select “Bavaria (80 cm)”

Now pan around and see if you can distinguish what’s a building, what’s a small building, a shed, a car port, a parking lot, large cars. It can be quite hard even for humans.
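To put numbers on that, here is a minimal sketch of how many pixels a building covers at a given image resolution. The footprint sizes are hypothetical values picked for illustration; only the 80 cm and 12.5 cm resolutions come from the discussion above.

```python
def footprint_pixels(width_m, depth_m, metres_per_pixel):
    """Approximate number of pixels covering a rectangular footprint."""
    return (width_m / metres_per_pixel) * (depth_m / metres_per_pixel)


# Hypothetical footprints as (width, depth) in metres; not taken from the post.
footprints = {"house": (10.0, 8.0), "car port": (5.0, 3.0), "shed": (3.0, 2.0)}

for resolution in (0.8, 0.125):  # 80 cm vs. 12.5 cm imagery
    for name, (width, depth) in footprints.items():
        pixels = footprint_pixels(width, depth, resolution)
        print(f"{resolution} m/px  {name:8s} ~{pixels:.0f} px")
```

Under these assumptions a 3 m x 2 m shed covers fewer than ten pixels at 80 cm, while a 10 m x 8 m house covers about 125 pixels at 80 cm and 5120 pixels at 12.5 cm.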

I used robosat on drone imagery (see this diary) and on the high-resolution Mapbox aerial imagery in North America (where you could see people walking around) back when I was working for them. Besides resolution, there are multiple tricks to get higher-quality predictions out of robosat; they trade off training time or runtime, or require manual dataset curation.

The use-cases I see for prediction even on the 80 cm aerial imagery are

  • change detection over the years to see how our cities evolve over time

  • finding unmapped areas or computing a score of how “complete” our map is

  • as a pre-filter / prioritization stage in tools like osmcha; if the robosat model roughly agrees with a changeset adding a building then we can let it go through; otherwise flag it for human inspection
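The pre-filter idea can be sketched as an intersection-over-union (IoU) check between the thresholded prediction and a rasterized changeset building. This is only an illustration in plain Python: the function names, the 0.5 probability threshold, and the 0.5 IoU cut-off are assumptions, not part of robosat or osmcha.

```python
def binarize(probabilities, threshold=0.5):
    """Turn per-pixel building probabilities into a boolean mask."""
    return [[p >= threshold for p in row] for row in probabilities]


def iou(mask_a, mask_b):
    """Intersection over union of two equally sized 2D masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    # Two empty masks agree perfectly.
    return intersection / union if union else 1.0


def needs_review(predicted_probs, changeset_mask, min_iou=0.5):
    """Flag a changeset when the prediction and the mapped building disagree."""
    return iou(binarize(predicted_probs), changeset_mask) < min_iou


# Toy 2x3 tile: the model sees a building on the left, the changeset maps one there.
probs = [[0.9, 0.8, 0.1],
         [0.7, 0.2, 0.0]]
mapped = [[1, 1, 0],
          [1, 1, 0]]
print(iou(binarize(probs), mapped), needs_review(probs, mapped))
```

A real pipeline would also have to rasterize the changeset geometry onto the same tile grid as the prediction, which is left out here.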

Comment from karkal6123 on 6 July 2019 at 20:53

Hello Daniel!

Many thanks for this post, your outcomes are really impressive!

As I am currently fascinated with trying RoboSat out, tell me please, is it possible to take the trained checkpoint you released [bavaria-dop80-release-1] and use it on my own UAV imagery? I have a ~2 cm orthophoto for a small area (roughly 30 buildings, however most of them are newly constructed) and no larger adjacent area is covered with orthophotos to train the model with…

So, the question is: can I take the pretrained model from Bavaria and run predictions with it on my own small-scale orthophoto located in another European country?

Huge thank you in advance for your kind help!

Best, Karoline

Comment from daniel-j-h on 7 July 2019 at 06:09

Do your images roughly look like the Bavaria images? Do buildings roughly look the same? Then I would just give it a try and see what you will get out.

There is also an option in the rs train tool to read in a checkpoint and fine-tune it to your specific dataset - maybe that’s an option if you can get your hands on more data (30 buildings sounds a bit low).

If you want to try it make sure to convert your dataset into Slippy Map tiles on zoom level 17 (maybe +-1 zoom level) since the Bavaria model was trained on z17.
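For orientation, here is a small sketch of the standard Slippy Map (Web Mercator) tile math: which z17 tile contains a coordinate, and roughly how many metres one tile pixel covers. The 256 px tile size and the example coordinates are assumptions for illustration; robosat's own tools are not used here.

```python
import math

# Equatorial circumference of the Web Mercator sphere in metres.
EARTH_CIRCUMFERENCE_M = 40075016.686


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) Slippy Map tile index containing a WGS84 coordinate."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y


def metres_per_pixel(lat_deg, zoom, tile_size=256):
    """Approximate ground resolution of a tile pixel at a given latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


# Approximate coordinates somewhere in Bavaria (assumed for illustration).
lon, lat = 11.58, 48.5
for zoom in (16, 17, 18):
    print(zoom, lonlat_to_tile(lon, lat, zoom), round(metres_per_pixel(lat, zoom), 3))
```

At a latitude of 48.5°, a 256 px tile at z17 comes out at a bit under 0.8 m per pixel, so a ~2 cm orthophoto would be downsampled by a factor of roughly 40 when tiled at that zoom.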

There’s a robosat channel on the osmus Slack in case you run into issues :)

Comment from karkal6123 on 8 July 2019 at 19:52

Daniel, that is brilliant, thanks for the support! It works; however, I used the bavaria-dop80-checkpoint.pth directly when calling the “rs predict” command to analyse my own dataset/orthophoto. The quality is pretty low. Yet, it does work! I will experiment a little bit and also add some negative samples according to Maning’s tutorial.

Could you tell me please, does RoboSat use any other pre-trained model format? You released the .pth file. Can I use a pre-trained model with a different file extension?

I also wonder, and that could help other users too, I believe: do you have a rough estimate of how many buildings the aerial imagery should contain in order to train a model reliably and eventually get great prediction results? Thank you Daniel. No more questions.

Best wishes, cheers, Karoline

Comment from daniel-j-h on 9 July 2019 at 21:37

If you want to fine-tune pre-trained Robosat models then the weights saved in the .pth file have to exactly match the model architecture we use in Robosat.

Right now the model architecture is an encoder-decoder with skip connections. In addition we use a pre-trained Resnet50 for the encoder. Here are more details and paper references:

It is possible to

  • train this robosat model, save its weights in a .pth checkpoint file, and then load this file back when fine-tuning or for prediction

  • use a different pre-trained encoder e.g. use a Resnet18 or Resnet34 if you want a smaller and faster encoder. Then train the robosat model with this encoder and save the resulting weights into a .pth checkpoint file again

There are also some open pull requests where we experiment with different architectures:

These architecture changes are not compatible with old .pth checkpoint files, which is one of the reasons we haven’t changed the core architecture so far.
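Why a checkpoint and a changed architecture do not fit together can be sketched by comparing parameter names and shapes, which is essentially what PyTorch checks when loading a state dict into a model. The snippet below is a plain-Python stand-in with made-up parameter names and shapes; it does not read real .pth files.

```python
def checkpoint_mismatches(model_shapes, checkpoint_shapes):
    """List the reasons why a checkpoint does not fit a model.

    Both arguments map parameter names to shape tuples.
    An empty result means every parameter lines up.
    """
    problems = []
    for name in sorted(model_shapes.keys() - checkpoint_shapes.keys()):
        problems.append(f"missing in checkpoint: {name}")
    for name in sorted(checkpoint_shapes.keys() - model_shapes.keys()):
        problems.append(f"unexpected in checkpoint: {name}")
    for name in sorted(model_shapes.keys() & checkpoint_shapes.keys()):
        if model_shapes[name] != checkpoint_shapes[name]:
            problems.append(
                f"shape mismatch for {name}: "
                f"model {model_shapes[name]}, checkpoint {checkpoint_shapes[name]}"
            )
    return problems


# Made-up parameter names and shapes, standing in for two different encoders.
large_encoder = {"encoder.block.conv1": (64, 64, 1, 1), "encoder.block.conv3": (256, 64, 1, 1)}
small_encoder = {"encoder.block.conv1": (64, 64, 3, 3)}

# A checkpoint saved from the small encoder does not fit the large one.
for problem in checkpoint_mismatches(large_encoder, small_encoder):
    print(problem)
```

This is why swapping the encoder means training again and saving a fresh checkpoint rather than reusing an old one.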

Regarding your second question about estimating the dataset size: it depends on your use-case, e.g. the zoom level you want to work with, the geographical area (lots of variety across the planet vs. a single city or country), how good your ground truth labels are, and whether you want to invest time in manual dataset curation or refinement.

As a very rough guideline I recommend a couple thousand 512x512 px image tiles at least; then give it a try and see if it works. But as usual the more data the better.

Hope that helps,
