Building detection at Villa Imelda, MacArthur.

For the last couple of weekends I’ve been tinkering with RoboSat to detect features from aerial imagery. At its core, RoboSat uses state-of-the-art fully convolutional neural network architectures for semantic segmentation.

Daniel posted an excellent walk-through to run the RoboSat pipeline on openly available drone imagery in Tanzania.

This post follows Daniel’s guide to detect buildings in drone imagery in the Philippines. The goal of this exercise is for me to understand the basics of the pipeline and to find ways to use the tool for identifying remote settlements in high-resolution imagery (e.g. from drones). I’m not aiming for pixel-perfect detection (i.e. the precise geometry of each building). My main question is whether it can help direct a human mapper to focus on specific areas of the imagery to map in OpenStreetMap.

Daniel already outlined the step-by-step process, so this post focuses primarily on data preparation. To build a robust model for feature extraction, it is important to create a clean dataset as training input.

The data

Dan Joseph from the American Red Cross collected a lot of drone imagery in the Visayas, PH, in the summer of last year. Thanks Dan and ARC! 🙇 I chose these images to start my experiment because of the amazing resolution (up to 2 to 3 cm) and because Dan and his team used this imagery to trace buildings, which I can use for training the model.

The imagery is available in OpenAerialMap as individual TMS endpoints or GeoTIFFs; no composite seems to be available. To collect enough samples, I needed to get images and masks from several of these GeoTIFFs or their corresponding TMS endpoints.

Data preparation

I downloaded several of the GeoTIFFs and clipped the OSM buildings to the bounding box of each image. These buildings will be used as training labels. But this is not enough: “presence” data (positive samples) only gives so much reliability, so we also need to provide the model with “absence” data (negative samples). Here’s an example of the data I traced in QGIS, in addition to the buildings from OSM, to bootstrap my negative samples.

imagery bounds > imagery > building > other cover types.

Cleaning the “presence” data

As illustrated above, drone imagery usually doesn’t have a squarish boundary, so using bounding boxes to clip data from OSM is not optimal. You will get black tiles with buildings, which can affect your model. In my initial data prep I got images like this in my training samples.

No imagery, but with buildings in OSM.

There are two options to clean the data. You can:

  • run rs cover, rs download and rs rasterize for all the images, then delete these tiles, or
  • delete the buildings outside the actual boundary of the imagery in QGIS (you can do this in JOSM too, but make sure you don’t upload 😉).

I chose the former approach because it’s faster. There may also be cases where a building polygon is not aligned with the imagery, particularly when it was traced from another imagery source. Fortunately, this is very minimal in my sampling areas, and these smaller misalignments won’t be a problem during training as long as there is no large constant bias in the dataset.

Adding “absence” data

For negative samples, I traced several polygons representing common landcover types such as roads, water/riverbeds, orchards, farmland (wet and dry) and bare areas. I added various landcover types so that the model can learn that these are not the features it should detect. To avoid overlapping with the building tiles, I overlaid a z21 tile grid in QGIS and only traced landcover polygons that do not overlap with a building polygon.
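The overlap check I did by eye in QGIS can also be approximated in code. Below is a minimal sketch, assuming the standard slippy map (Web Mercator) tiling scheme. The function names and the coordinates are hypothetical and only for illustration, and the check only looks at individual points (e.g. polygon vertices), so a large polygon spanning several tiles would need every tile it covers.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the slippy map tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_points(points, zoom=21):
    """Collect the set of tiles touched by a list of (lon, lat) points."""
    return {lonlat_to_tile(lon, lat, zoom) for lon, lat in points}


# Hypothetical coordinates, not taken from the actual dataset.
building_tiles = tiles_for_points([(124.95, 10.79)])
landcover_tiles = tiles_for_points([(124.96, 10.80)])

# An empty intersection means no negative sample shares a tile with a building.
print(building_tiles & landcover_tiles)
```

If the intersection is not empty, the landcover polygon touching those tiles should be moved or dropped before rasterizing.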

Now that I have both positive and negative polygons, I need to run rs cover, rs download and rs rasterize to create a slippy map directory with images and rasterized buildings. Since I have to do this for every individual image, I created a script that loops through each imagery list from OpenAerialMap. Run the script for your positive and negative samples, then combine the samples into a single directory using rsync.

# Combine negative and positive samples into one slippy map directory

mkdir combine-images
rsync -vrh positive-images/21 combine-images/
rsync -vrh negative-images/21 combine-images/

# do the same for the rasterize output
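Before merging, it may be worth checking that no tile exists in both directories, since rsync would silently overwrite one copy with the other. Here is a small sketch, assuming both directories use the z/x/y.ext slippy map layout; the helper names are my own and not part of RoboSat.

```python
from pathlib import Path


def tile_ids(slippy_dir):
    """Return the set of (z, x, y) tiles found under a z/x/y.ext directory tree."""
    ids = set()
    for path in Path(slippy_dir).glob("*/*/*.*"):
        if path.is_file():
            z, x = path.parts[-3], path.parts[-2]
            ids.add((z, x, path.stem))
    return ids


def overlapping_tiles(dir_a, dir_b):
    """List tiles present in both directories, e.g. positives and negatives."""
    return sorted(tile_ids(dir_a) & tile_ids(dir_b))


if __name__ == "__main__":
    # Directory names follow the rsync example above.
    for tile in overlapping_tiles("positive-images", "negative-images"):
        print("tile in both sets:", "/".join(tile))
```

An empty result means the two sets can be merged without one sample replacing another.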

In total I have 15,237 samples at zoom level 21: 7,104 (47%) with buildings present (positives) and 8,133 (53%) with buildings absent (negatives).

Training the model

Once the data is ready, I can start training the model. I need to split the imagery and mask tiles into training, validation and final evaluation datasets using rs subset. For my samples I used 80% for training, 10% for validation and 10% for evaluation.
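One way to get the 80/10/10 split is to shuffle the rows of the tile list and cut it into three parts, then pass each part to rs subset. This is a minimal sketch and not the exact script I used. The function name, the fixed seed and the example rows are assumptions, and the rows are assumed to be in the "x,y,z" form of a cover CSV.

```python
import random


def split_tiles(tiles, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle tile rows and split them into training, validation and evaluation lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    evaluation = shuffled[n_train + n_valid:]  # remainder, so no tile is dropped
    return training, validation, evaluation


# Hypothetical tile rows, for illustration only.
rows = [f"{x},{y},21" for x in range(100) for y in range(10)]
training, validation, evaluation = split_tiles(rows)
print(len(training), len(validation), len(evaluation))
```

Shuffling before cutting matters here because the tile list is usually ordered by location, and an unshuffled split would put whole neighbourhoods into a single subset.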

Then run rs weights to calculate the class distribution of the masks in your dataset. Save the result of rs weights in your RoboSat dataset configuration file (dataset-building.toml).
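To make "class distribution" concrete: it is the share of mask pixels that belongs to each class. The sketch below only illustrates that idea on two tiny made-up masks. It is not RoboSat's implementation, and how rs weights turns the distribution into the values it prints is not reproduced here.

```python
def class_distribution(masks, num_classes=2):
    """Return the fraction of pixels belonging to each class across all masks."""
    counts = [0] * num_classes
    for mask in masks:
        for row in mask:
            for value in row:
                counts[value] += 1
    total = sum(counts)
    return [count / total for count in counts]


# Two tiny hypothetical masks: 0 = background, 1 = building.
masks = [
    [[0, 0], [0, 1]],
    [[0, 0], [0, 0]],
]
print(class_distribution(masks))
```

With many all-background negative tiles in the dataset, the building class ends up as a small share of all pixels, which is why the distribution is worth measuring before training.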

Finally, start training the model. I used a GPU-capable machine (AWS p2.xlarge) to run the training.

./rs train --model config/model-unet.toml --dataset config/dataset-building.toml

In my training, I chose a batch size of 4 and 50 epochs. A checkpoint is saved after each epoch for evaluation. Once the training is done, the results are visualized in the saved plot files, like the one below.

The main indicator for choosing which checkpoint to use for prediction is the validation mean IoU. In my trained model the highest validation mean IoU (~0.82) was at epoch 40.
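For readers new to the metric, IoU (intersection over union) is the number of pixels where prediction and label agree on "building", divided by the number of pixels where either of them says "building". The sketch below shows the idea on a single pair of binary masks and then picks the epoch with the highest value. It is an illustration, not RoboSat's metric code, and every number in the dictionary except 0.82 at epoch 40 is made up.

```python
def iou(predicted, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree completely, so treat that case as a perfect score.
    return intersection / union if union else 1.0


def best_epoch(mean_iou_by_epoch):
    """Return the epoch with the highest validation mean IoU."""
    return max(mean_iou_by_epoch, key=mean_iou_by_epoch.get)


print(iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))

# Hypothetical per-epoch values; only epoch 40 reflects the post.
print(best_epoch({38: 0.80, 40: 0.82, 42: 0.81}))
```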

Visualizing the result

Before running the trained model on other imagery, I visually inspected the results using my evaluation dataset. This is a good dataset to use since it was never “seen” by the model during the training process.

# Run prediction on the evaluation data
./rs predict --tile_size 256 --model config/model-unet-maning.toml --dataset config/dataset-building.toml --checkpoint tmp/pth/checkpoint-00040-of-00050.pth dataset/evaluation/images/ evaluation-segmentation/

# Get masks from segmentation probabilities
./rs masks evaluation-masks/ evaluation-segmentation/

# Compare images, segmentation and masks
./rs compare evaluation-compare/ dataset/evaluation/images/ evaluation-segmentation/ evaluation-masks/

Here are some results of the prediction (imagery-left, segmentation probability-middle, mask-right).

Good detection

Not so good

In most cases the prediction looks pretty good, except for landcover types that look very similar to building rooftops, such as paved highways and vehicles, or when a structure is covered by trees. But overall I’m satisfied with the initial result.

Putting it all together

Now that I have a trained model, I used it to predict buildings in another image. Instead of downloading the tiles from OAM, I used Daniel’s tiler script to create tiles from a GeoTIFF.

python3 --zoom 20 660c5321-0334-471f-bca5-829d85fb1d40.tif 660c5321-images/

Delete completely blank tiles to prevent running the prediction on them (these are mostly found on the edges of the imagery boundary). This speeds up your prediction time.

find . -name "*.webp" -size -190c -delete #the filesize will depend on your imagery, make sure to delete only the blank tiles.
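In the find command, -size -190c matches files smaller than 190 bytes (the c suffix means bytes). Since the right threshold depends on your imagery, it can help to list the candidates first and delete only after checking a few of them. Here is a sketch of that dry run; the function name and default values are my own choices.

```python
from pathlib import Path


def small_tiles(directory, max_bytes=190, pattern="*.webp"):
    """Return tiles smaller than max_bytes, mirroring find's `-size -190c` test."""
    return sorted(
        path
        for path in Path(directory).rglob(pattern)
        if path.is_file() and path.stat().st_size < max_bytes
    )


if __name__ == "__main__":
    # Dry run: print the candidates instead of deleting them.
    for path in small_tiles("660c5321-images"):
        print(path.stat().st_size, path)
```

Once the list contains only blank tiles, each entry can be removed with path.unlink(), or you can run the find command with the same threshold.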

Run the prediction.

./rs predict --tile_size 256 --model config/model-unet-maning.toml --dataset config/dataset-building.toml --checkpoint tmp/pth/checkpoint-00040-of-00050.pth 660c5321-images 660c5321-segmentation

To visualize the result on a map, I created a Leaflet side-by-side web map to compare the predicted segmentation probabilities and the imagery. See the result here.

Of course, I had to load the data into an OSM editor! So I blended the imagery and segmentation together using ImageMagick.

Give it a try! Go to OSM and paste the TMS endpoint below into iD’s custom background imagery setting.

{z}/{x}/{y}.png


  • The process is straightforward, thanks to Daniel’s and Bhargav’s excellent README and diary posts. Part of this experiment is to confirm that anyone can do it without any knowledge of machine learning and I confirm it is possible. I have started reading some ML basics though to know more. This fastai lesson 0 video is a great introduction to the concept.
  • We can leverage open data and open source to build ML-based detection models. All the data (buildings from OSM, imagery from OpenAerialMap) and tools (RoboSat, QGIS) are free and openly accessible. This experiment would not be possible without access to these data and tools.
  • It is critical that the data you feed to the model is clean (GIGO). In my initial iterations the results were really bad because I did not include enough negative training samples. The outcome of your model highly depends on the training data you provide.
  • A lot of the data preparation can be done on a modest laptop. I only used a GPU-capable machine for model training. Training that would take weeks on a CPU takes only hours with a GPU. In my case I deployed a p2.xlarge instance on AWS. This costs money: an on-demand p2.xlarge costs USD 0.9/hour. My current bill is around USD 50 after 4 iterations of model training.
  • I had fun! At first running a full machine learning stack seems daunting but RoboSat makes it so simple.

What’s next

The past few weekends were really fun. I plan to continue iterating on the training models and share back results (more visualizations in this repo). I had discussions with several friends who do drone capture, and they think automated detection like this can help in the initial assessment of an area, say after a crisis. Next time, I want to explore:

  • adding more training samples to my current model using various drone imagery;
  • creating models for other feature types;
  • converting the results into vectors (rs features) and feeding them directly to the editor or a tasking tool;
  • publishing my model for anyone to use. Let me know in the comments 👋 if you want to test my models.

Give RoboSat a try! The devs are more than happy to help you along the way!

Location: Danao, Javier, 5th District, Leyte, Eastern Visayas, 6511, Philippines


Comment from Tomas Straupis on 22 July 2018 at 19:27

Thanks for sharing! Is there a difference in the trained model’s prediction results between epochs 20 and 50? The training IoU looks similar across them.

Comment from Dalkeith on 24 July 2018 at 07:32

Thank you for sharing, very interesting.

Comment from NewSource on 26 July 2018 at 10:52


To reproduce your experiment, can you please share the source 660c5321-0334-471f-bca5-829d85fb1d40.tif? By the way, what method did you use to assemble several TIFFs together?

Comment from maning on 26 July 2018 at 11:26

@tomas straupis,

Is there a difference in trained model prediction results from epochs 20-50?

Good idea. I have not tested the other checkpoints with high IoU; I’ll report back here with the results.


can you please share the source 660c5321-0334-471f-bca5-829d85fb1d40.tif ?

Sure, here

by the way what the way used to assemble several tiffs together ?

I mentioned it here:

Since I have to do this in all individual imagery, I created a script that loop through each imagery list from OpenAerialMap.

Comment from NewSource on 26 July 2018 at 13:33

Thanks! Did you use the rs serve command? If yes, I didn’t get exactly how to set up the (

Comment from tonytonyissaissa on 23 November 2018 at 12:39

Thanks for sharing!

I have a problem producing an all-background mask for my negative samples. How can I obtain it?

Best regards, Tony

Comment from kayoD on 16 May 2019 at 12:53

Thanks for the great article. Can you please share your best checkpoint file for the Philippines? I want to do some comparison.

Comment from fevzidass on 10 June 2019 at 11:21

Thanks for your article, Maning. I have a question. In the line below, what is the meaning of “190c”? I cannot find any info about it. find . -name "*.webp" -size -190c -delete #the filesize will depend on your imagery, make sure to delete only the blank tiles.

Best regards.

Comment from LogicalViolinist on 16 July 2019 at 13:42

How did you prepare the subset with rs subset to set it up for 80/10/10?

Comment from maning on 16 July 2019 at 14:28

How did you prepare the subset with rs subset to set it up for 80/10/10?

I found something on the net that splits a text file by line based on %, so basically you input your cover csv file to this script:

Comment from iboates on 22 March 2021 at 13:32

When you say you added “several” absence polygons, about what order of magnitude did you mean? (i.e. 10s, 100s, 1000s?)
