RoboSat — robots at the edge of space!

Posted by daniel-j-h on 12 June 2018 in English (English)

At Mapbox we are happy to open source RoboSat, our production-ready, end-to-end pipeline for feature extraction from aerial and satellite imagery. In the following, I describe the technical details, how it will change the way we make use of aerial and satellite imagery, and how OpenStreetMap can benefit from this project.

Berlin aerial imagery, segmentation mask, building outlines, simplified GeoJSON polygons

Live on-demand segmentation tile server for debugging purposes

Here is how RoboSat works.

The prediction core is a segmentation model — a fully convolutional neural net which we train on pairs of images and masks. The aerial imagery we download from our Mapbox Maps API in all its beauty. The masks we extract from OpenStreetMap geometries and rasterize into image tiles. These geometries might sometimes be coarsely mapped, but automatically extracting masks allows us to quickly bootstrap a dataset for training.
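Rasterizing OpenStreetMap geometries into mask tiles boils down to the standard spherical-Mercator tiling math: every longitude/latitude vertex maps to a z/x/y tile and a pixel inside it. Here is a minimal illustrative sketch of that projection (not RoboSat's actual code; the function name is mine):

```python
import math

def lonlat_to_tile_pixel(lon, lat, zoom, tile_size=256):
    """Project a WGS84 lon/lat pair into a spherical-Mercator tile (tx, ty)
    and the pixel position (px, py) inside that tile."""
    n = 2 ** zoom
    # Fractional tile coordinates across the whole world at this zoom.
    xf = (lon + 180.0) / 360.0 * n
    yf = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    tx, ty = int(xf), int(yf)
    px = int((xf - tx) * tile_size)
    py = int((yf - ty) * tile_size)
    return tx, ty, px, py

# Null Island (0, 0) sits exactly at the corner of the four zoom-1 tiles.
print(lonlat_to_tile_pixel(0.0, 0.0, 1))  # → (1, 1, 0, 0)
```

Projecting every vertex of an OSM building polygon this way yields pixel coordinates that a polygon-fill routine can then burn into a 512x512 mask tile.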

We then have two Slippy Map directory structures with images and corresponding masks. The Slippy Map directory structure helps us preserve a tile's geo-reference, which later allows us to go back from pixels to coordinates. It is RoboSat's main abstraction, and most pipeline steps transform one Slippy Map directory into another.
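The layout itself is simple: one directory per zoom level, one sub-directory per tile column, one file per tile row, so the path alone encodes the geo-reference. A small sketch of building and parsing such paths (names are illustrative, not RoboSat's API):

```python
import os

def tile_path(root, z, x, y, ext="png"):
    """Build the Slippy Map path root/z/x/y.ext for a tile."""
    return os.path.join(root, str(z), str(x), "{}.{}".format(y, ext))

def parse_tile_path(path):
    """Recover (z, x, y) from a Slippy Map path, preserving the geo-reference."""
    z, x, name = path.split(os.sep)[-3:]
    return int(z), int(x), int(os.path.splitext(name)[0])

path = tile_path("masks", 18, 140005, 87925)
print(path)                   # masks/18/140005/87925.png (on POSIX)
print(parse_tile_path(path))  # (18, 140005, 87925)
```

Because every pipeline step reads and writes this same structure, steps compose freely: predict into one directory, post-process it into the next.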

We then train our segmentation models on (potentially multiple) GPUs and save their best checkpoints. We implemented our model architectures in PyTorch and run them on GPUs: AWS p2/p3 instances, plus an NVIDIA GTX 1080 Ti that keeps our Berlin office warm during winter.

When we use the checkpoints for prediction on a Slippy Map directory with aerial imagery we get a Slippy Map directory with probabilities for every pixel in image tiles:

Parking lot prediction; probability scales saturation (S) in HSV colorspace

We then turn these probabilities into segmentation masks, handling model ensembles and tile borders along the way:

Smooth predictions across tile boundaries. Do you see tile boundaries here? No? Great!

Serializing the probabilities in quantized form and only storing binary model outputs allows us to save results as single-channel PNG files, to which we can attach continuous color palettes for visualization. We do the same for masks and then make use of PNG compression to save disk space when scaling this project up, e.g. across all of North America.
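The quantization step is just a linear mapping of a probability in [0, 1] onto the 256 levels a single-channel PNG can hold, and back. A minimal sketch of the idea (pure Python for clarity; the real pipeline works on whole image arrays):

```python
def quantize(p):
    """Map a probability in [0, 1] to an 8-bit PNG value in [0, 255]."""
    return int(round(p * 255))

def dequantize(q):
    """Recover an approximate probability from the 8-bit value."""
    return q / 255.0

probs = [0.0, 0.25, 0.5, 0.99, 1.0]
restored = [dequantize(quantize(p)) for p in probs]
# The round-trip error is bounded by half a quantization level, 0.5 / 255.
assert all(abs(a - b) <= 0.5 / 255 for a, b in zip(probs, restored))
```

An error of at most half a level is far below what matters for thresholding into masks, which is why the PNG representation loses nothing of practical value while compressing well.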

Based on the segmentation masks we then do post-processing to remove noise, fill in small holes, find contours, handle (potentially nested) (multi-)polygons, and simplify the shapes with Douglas-Peucker:

Segmentation masks, noise removal, restoring connectivity, finding contours
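The Douglas-Peucker simplification mentioned above recursively drops points that deviate less than a tolerance from the line between their neighbours. A self-contained sketch of the classic algorithm (illustrative only, not the pipeline's own implementation):

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def simplify(points, epsilon):
    """Douglas-Peucker: keep only points deviating more than epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints.
    dists = [_point_line_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    index, dmax = max(enumerate(dists, start=1), key=lambda t: t[1])
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves.
    left = simplify(points[:index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right

noisy = [(0, 0), (1, 0.01), (2, -0.01), (3, 0.02), (4, 0)]
print(simplify(noisy, 0.1))  # → [(0, 0), (4, 0)]
```

The tolerance trades fidelity for vertex count: small epsilon keeps jagged pixel staircases, large epsilon rounds off real building corners, so it has to be tuned per zoom level.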

We then transform pixels in Slippy Map tiles into world coordinates: GeoJSON features. In addition, we handle tile borders and de-duplicate against OpenStreetMap to filter out predictions that are already mapped.
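Going from a pixel in a z/x/y tile back to WGS84 coordinates is the inverse of the Mercator tiling math, which is exactly why preserving the tile reference throughout the pipeline matters. A minimal sketch (illustrative, not RoboSat's actual code):

```python
import math

def tile_pixel_to_lonlat(z, x, y, px, py, tile_size=256):
    """Inverse spherical Mercator: pixel (px, py) in tile z/x/y -> lon/lat."""
    n = 2 ** z
    xf = (x * tile_size + px) / (n * tile_size)
    yf = (y * tile_size + py) / (n * tile_size)
    lon = xf * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * yf))))
    return lon, lat

# Pixel (0, 0) of tile 1/1/1 is the origin (0° E, 0° N).
print(tile_pixel_to_lonlat(1, 1, 1, 0, 0))  # → (0.0, 0.0)
```

Applying this to every vertex of a simplified contour turns pixel polygons into GeoJSON coordinates ready for de-duplication against OpenStreetMap.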

The end result is a GeoJSON file with simplified (multi-)polygon recommendations. Thanks robots!

Here is an example visualizing the prediction pipeline:

Aerial imagery, segmentation probabilities, masks, extracted features, merging features across tile boundaries

I see RoboSat as a building block for multiple use-cases and projects:

  • RoboSat can "look" at every edit in OpenStreetMap in real time to flag suspicious changesets. At the same time it can help let good-looking changesets go through without manual verification.
  • RoboSat can tell you how complete the map is in a specific area for a specific feature. For example: "Buildings in Seattle are 90% mapped". And then it can show you unmapped buildings and polygon recommendations for them.
  • RoboSat can be integrated into imagery platforms like OpenAerialMap or toolchains like OpenDroneMap to generate a better understanding of the area minutes after flying your drone.

And while the possibilities are endless, I want to emphasize that RoboSat is neither meant for fully automated mapping nor capable of it. We will use RoboSat as a supporting tool but not for automated imports.

In the coming months we will invest in RoboSat, expanding it to more feature types like buildings and roads (which we already have internally; see the images at the top) and better handling variations in geography, imagery quality, and zoom levels — all while keeping the pipeline generic and scalable.

If you want to give RoboSat or related projects a go, check out Pratik's note about using Mapbox imagery for machine learning.

Happy to hear your feedback; and feel free to open issues in the RoboSat repository for feature requests, ideas, questions, or bug reports :)

Comment from rorym on 12 June 2018 at 07:27

This looks very interesting. Is a GPU necessary? Can I run it on a regular laptop? I know it would be much slower. I'll have to try using this.

Comment from imagico on 12 June 2018 at 08:09

Glad to see you are following the first of my list of suggestions here - i hope you will also work on the other points.

I am also glad to read that you consider the main application of this not to be generating geometries for mapping (for which the examples you showed also quite clearly would not be the most suitable use cases). So i would scratch the "polygon recommendations for them" part of your scenarios. In particular for buildings this will likely fail miserably when done based on AI methods alone in many cases - in particular when you train not for exactly the same image (same viewing angle and same sun position) as you run it on. And in practical mapping fixing a bad geometry is often more time consuming than drawing a correct one from scratch.

I also have my doubts that post-mapping QA and remote sensing data assessment use cases as you described can much profit from this kind of method because you might end up with mostly evaluating your algorithm and its shortcomings rather than the data you want to evaluate. But this will remain to be seen.

What i can imagine to be a suitable application is some needle-in-a-haystack problems we have in mapping. Like: We have a city with 250 amenity=parking mapped - find the five that are missing and the five others that have significantly changed in size. This is a type of problem that will become increasingly important as OSM matures and reaches a high level of completeness in some aspects.

Comment from daniel-j-h on 12 June 2018 at 13:29

@Rory: here are some rough numbers: with a single GPU you can expect training on ~20k images to take on the order of multiple days. Prediction on a GPU takes between 350 ms and 750 ms per 512x512 image tile, depending on how much overlap you buffer per tile to handle tile boundaries. Doing the same on CPUs, and especially on your laptop, will probably slow things down by a factor of 10x to 50x at least.

What you can do is prepare all the datasets and get ready for training. Then spin up an AWS p2 instance (install the NVIDIA drivers, CUDA, and cuDNN) and train on there for a few days. Save the model, shut down the instance, and predict on CPUs. You could also look into getting a GTX 1080 Ti, but it's only really worth it if you want to do this sort of thing at a large scale for a couple of months at a time, running 24/7.

Comment from Dalkeith on 12 June 2018 at 14:42

Excellent work I expect this to continue to improve at a steady pace.
