Building detection at Villa Imelda, MacArthur.
For the last couple of weekends I’ve been tinkering with RoboSat to detect features from aerial imagery. At its core, RoboSat uses state-of-the-art fully convolutional neural network architectures for semantic segmentation.
Daniel posted an excellent walk-through to run the RoboSat pipeline on openly available drone imagery in Tanzania.
This post follows Daniel’s guide for detecting buildings in drone imagery in the Philippines. The goal of this exercise is to understand the basics of the pipeline and find ways to use the tool to identify remote settlements in high-resolution imagery (i.e., drone imagery). I’m not aiming for pixel-perfect detection (i.e., the precise geometry of each building). My main question is whether it can help a human mapper focus on specific areas of the imagery to map in OpenStreetMap.
Daniel already outlined the step-by-step process, so this post focuses primarily on data preparation. To build a robust model for feature extraction, it is important to create a clean dataset as input to training.
Dan Joseph from the American Red Cross collected a lot of drone imagery in the Visayas, Philippines, in the summer of last year. Thanks Dan and ARC! 🙇. I chose these images to start my experiment because of their amazing resolution (up to 2–3 cm) and because Dan and his team used them to trace buildings, which I can use for training the model.
The imagery is available in OpenAerialMap as individual TMS endpoints or GeoTIFFs; no composite seems to be available. To collect enough samples, I needed to get images and masks from several of these GeoTIFFs or their corresponding TMS endpoints.
I downloaded several of the GeoTIFFs and clipped the OSM buildings to the bounding box of each image. These buildings are used as training labels. But this is not enough: “presence” (positive samples) data will only get you so far; we also need to provide the model with “absence” (negative samples) data. Here’s an example of the data I traced in QGIS, in addition to the buildings from OSM, to bootstrap my negative samples.
imagery bounds > imagery > building > other cover types.
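Clipping the OSM buildings to an imagery footprint can be sketched in plain Python. The actual clipping was done in QGIS, so this is only an illustration; it assumes buildings arrive as GeoJSON Polygon features (the sample feature and bbox values below are hypothetical).

```python
# Sketch: keep only building features whose bounding box falls fully inside
# the imagery bounding box (west, south, east, north). Assumes GeoJSON
# Polygon geometries; the real workflow used QGIS on an OSM extract.

def feature_bbox(feature):
    """Bounding box (west, south, east, north) of a Polygon's outer ring."""
    ring = feature["geometry"]["coordinates"][0]
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)

def clip_to_imagery(features, bbox):
    """Return only the features fully inside the imagery bbox."""
    west, south, east, north = bbox
    kept = []
    for feat in features:
        fw, fs, fe, fn = feature_bbox(feat)
        if fw >= west and fs >= south and fe <= east and fn <= north:
            kept.append(feat)
    return kept
```

A stricter workflow would clip against the actual (non-rectangular) imagery footprint rather than its bounding box, which is exactly the cleanup issue described below.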
Cleaning the “presence” data
As illustrated above, drone imagery usually doesn’t have a squarish boundary, so using bounding boxes to clip data from OSM is not optimal: you will get black tiles that still contain buildings, which can confuse your model. In my initial data prep I got images like this in my training samples.
No imagery, but with buildings in OSM.
There are two options to clean the data. You can:
- run rs rasterize for all the images and then delete these tiles, or
- delete the buildings outside the actual boundary of the imagery in QGIS (you can do this in JOSM too, but make sure you don’t upload 😉).
I chose the former approach because it’s faster. There may also be cases where a building polygon is not aligned to the imagery, particularly when it was traced from another imagery source. Fortunately, this is very minimal in my sampling areas, and such small misalignments won’t matter during training as long as there is no large constant bias in the dataset.
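The first option can be scripted. A minimal sketch, assuming .webp image tiles and .png mask tiles in parallel slippy-map directories, and that blank tiles compress below a small size threshold (the 190-byte cutoff is a guess; inspect a few of your own blank tiles first):

```python
import os

def delete_blank_pairs(images_dir, masks_dir, max_bytes=190):
    """Remove image tiles smaller than max_bytes (likely all-black) together
    with the matching rasterized mask tile.
    The size threshold is imagery-dependent; tune it before trusting it."""
    removed = []
    for root, _, files in os.walk(images_dir):
        for name in files:
            img = os.path.join(root, name)
            if os.path.getsize(img) < max_bytes:
                rel = os.path.relpath(img, images_dir)
                mask = os.path.join(masks_dir, os.path.splitext(rel)[0] + ".png")
                os.remove(img)
                if os.path.exists(mask):
                    os.remove(mask)
                removed.append(rel)
    return removed
```

This keeps the images and masks directories in sync, so no mask tile is left pointing at a deleted image.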
Adding “absence” data
For the negative samples, I traced several polygons representing common landcover types such as roads, water/riverbeds, orchards, farmland (wet and dry), and bare areas. I added various landcover types so that the model can learn that these are not the features it should detect. To avoid overlapping with the building tiles, I overlaid a z21 tile boundary in QGIS and only traced landcover that does not overlap with a building polygon.
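For reference, the z21 tile that any traced point falls in follows the standard slippy-map formula, which QGIS tile-grid plugins and RoboSat both rely on:

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Standard OSM slippy-map tile index (x, y) for a lon/lat at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

Comparing the tile indices of a candidate negative polygon against those of the building tiles is one way to check that the two sets don’t share any z21 tiles.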
Now that I have both positive and negative polygons, I need to run rs download and rs rasterize to create a slippy-map directory with images and rasterized buildings. Since I have to do this for each individual image, I created a script that loops through the imagery list from OpenAerialMap. Run the script for your positive and negative samples, then combine the samples into a single directory:
# Combine negative and positive samples into one slippymap directory
mkdir combine-images
rsync -vrh positive-images/21 combine-images/
rsync -vrh negative-images/21 combine-images/
# do the same for the rasterize output
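If rsync isn’t available, the same merge can be sketched in Python; this simply copies every z/x/y tile from one slippy-map directory into another, preserving the layout:

```python
import os
import shutil

def merge_slippy(src, dst):
    """Copy every tile from src into dst, keeping the z/x/y directory layout.
    Equivalent in effect to the rsync calls above."""
    for root, _, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out = os.path.join(dst, rel)
        os.makedirs(out, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(out, name))
```

Run it once for the positive tiles and once for the negative tiles, for both the images and the rasterized masks.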
In total I have 15,237 samples at zoom level 21: 7,104 (46%) with buildings present (positives) and 8,133 (54%) without (negatives).
Training the model
Once the data is ready, I can start training the model. First, split the imagery and mask tiles into training, validation, and final evaluation datasets using rs subset. For my sample I split them into 80% training, 10% validation, and 10% evaluation.
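The split itself is just a shuffled partition of the tile list; a sketch (rs subset handles the actual file copying, this only decides which tile goes where):

```python
import random

def split_tiles(tiles, seed=42):
    """Partition tile paths into an 80/10/10 train/validation/evaluation split.
    The seed keeps the split reproducible across runs."""
    tiles = list(tiles)
    random.Random(seed).shuffle(tiles)
    n = len(tiles)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (tiles[:n_train],
            tiles[n_train:n_train + n_val],
            tiles[n_train + n_val:])
```

Shuffling before splitting matters: tiles from the same flight are spatially correlated, and a random split reduces the chance that validation tiles are near-duplicates of training tiles.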
Next, run rs weights to calculate the class distribution of the masks in your dataset, and save the result in your RoboSat dataset configuration file (e.g. config/dataset-building.toml).
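For intuition, class weighting boils down to upweighting the rarer class so the loss isn’t dominated by background pixels. A rough inverse-frequency sketch (rs weights computes the real values for you, and the exact formula it uses may differ):

```python
def inverse_frequency_weights(pixel_counts):
    """Illustrative class weights from per-class pixel counts,
    e.g. [background_pixels, building_pixels].
    Rarer classes get proportionally larger weights."""
    total = sum(pixel_counts)
    return [total / (len(pixel_counts) * count) for count in pixel_counts]
```

With a 75/25 background/building pixel split this yields weights of about 0.67 and 2.0, so building pixels count roughly three times as much in the loss.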
Finally, start training the model. I used a GPU-capable machine (an AWS p2.xlarge) to run the training.
./rs train --model config/model-unet.toml --dataset config/dataset-building.toml
In my training, I chose a batch size of 4 and 50 epochs. A checkpoint is saved after each epoch for evaluation. Once the training is done, the results are visualized in the saved plot files like the one below.
The main indicator for choosing which checkpoint to use for prediction is the validation mean IoU. In my trained model, the highest validation mean IoU (~0.82) was at epoch 40.
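For reference, IoU on a pair of binary masks is just intersection over union; a minimal sketch on flat 0/1 lists:

```python
def iou(pred, truth):
    """Intersection over union for two same-sized binary masks,
    given as flat lists of 0/1 pixel labels. This is the per-tile
    quantity behind the validation mean IoU."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0
```

A mean IoU of ~0.82 therefore means the predicted building pixels and the labeled building pixels overlap by about 82% of their combined area, averaged over the validation tiles.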
Visualizing the result
Before running the trained model on other imagery, I visually inspected the results using my evaluation dataset. This is a good dataset to use since the model never “saw” it during training.
# Run prediction on the evaluation data
./rs predict --tile_size 256 --model config/model-unet-maning.toml --dataset config/dataset-building.toml --checkpoint tmp/pth/checkpoint-00040-of-00050.pth dataset/evaluation/images/ evaluation-segmentation/
# Get masks from segmentation probabilities
./rs masks evaluation-masks/ evaluation-segmentation
# Compare images, segmentation and masks
./rs compare evaluation-compare/ dataset/evaluation/images/ evaluation-segmentation/ evaluation-masks/
Here are some results of the prediction (imagery on the left, segmentation probability in the middle, mask on the right).
Not so good
In most cases the prediction looks pretty good, except for landcover that is very similar to building rooftops, like paved highways and vehicles, or when the structure is covered by trees. But overall I’m satisfied with the initial result.
Putting it all together
Now that I have a trained model, I used it to detect buildings in another image. Instead of downloading the tiles from OAM, I used Daniel’s tiler script to create tiles from a GeoTIFF.
python3 tiler.py --zoom 20 660c5321-0334-471f-bca5-829d85fb1d40.tif 660c5321-images/
Delete completely blank tiles to avoid running the prediction on them (these are mostly found at the edges of the imagery boundary). This speeds up prediction.
# The file size threshold depends on your imagery; make sure to delete only the blank tiles.
find . -name "*.webp" -size -190c -delete
Run the prediction.
./rs predict --tile_size 256 --model config/model-unet-maning.toml --dataset config/dataset-building.toml --checkpoint tmp/pth/checkpoint-00040-of-00050.pth 660c5321-images 660c5321-segmentation
To visualize it on a map, I created a Leaflet side-by-side web map to compare the predicted segmentation probabilities with the imagery. See the result here.
Of course, I had to load the data into an OSM editor! So I blended the imagery and segmentation together using ImageMagick.
Give it a try! Go to OSM and copy the TMS endpoint below into iD’s custom background imagery.
- The process is straightforward, thanks to Daniel’s and Bhargav’s excellent README and diary posts. Part of this experiment was to confirm that anyone can do it without any machine-learning knowledge, and I can confirm it is possible. I have started reading some ML basics, though, to learn more; this fastai lesson 0 video is a great introduction to the concepts.
- We can leverage open data and open source to build ML-based detection models. All the data (buildings from OSM, imagery from OpenAerialMap) and tools (RoboSat, QGIS) are free and openly accessible. This experiment wouldn’t be possible without access to them.
- It is critical to feed the model clean data (garbage in, garbage out). In my initial iterations the results were really bad because I did not include enough negative training samples. The outcome of your model depends heavily on the training data you provide.
- A lot of the data preparation can be done on a modest laptop; I only used a GPU-capable machine for model training. What would take weeks on a CPU takes only hours on a GPU. In my case I deployed a p2.xlarge instance from AWS. This costs money: an on-demand p2.xlarge on AWS costs USD 0.9/hour, and my current bill is around USD 50 after 4 iterations of model training.
- I had fun! At first, running a full machine-learning stack seemed daunting, but RoboSat makes it simple.
The past few weekends were really fun. I plan to continue iterating on the training models and share the results (more visualizations in this repo). I have discussed this with several friends doing drone capture, and they think automated detection like this can help in the initial assessment of an area, say after a crisis. Next, I want to explore:
- adding more training samples to my current model using various drone imagery;
- creating models for other feature types;
- converting the result into vectors (rs features) and feeding them directly to an editor or a tasking tool;
- publishing my model for anyone to use; let me know in the comments 👋 here if you want to test my models.
Give RoboSat a try! The devs are more than happy to help you along the way!