jremillard's Diary

AI With Satellite Images for OpenStreetMap

Posted by jremillard on 10 February 2018 in English.

The AI/neural network/deep learning/machine learning wave is ready to touch OpenStreetMap. Modern machine learning algorithms require a lot of data, and we have lots of data. OSM is going to be a natural place for open source machine learning algorithm development to happen.

Exciting and useful machine learning projects are possible today, using only the OSM database. For example, statistically based editor presets, changeset anomaly detection, import changeset detection, and smart auto tag/value suggestions are possible with just the OSM database.
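As a rough illustration of the "smart tag/value suggestions" idea, here is a minimal sketch that uses nothing but tag co-occurrence counts. Everything in it is an assumption made for illustration: the function names, the 0.5 confidence threshold, and the tiny hand-written sample standing in for features read from a real OSM extract. A real editor integration would need far more data and care.

```python
from collections import Counter, defaultdict

def build_cooccurrence(features):
    """Count how often each key=value tag appears, alone and together with other tags."""
    tag_counts = Counter()
    pair_counts = defaultdict(Counter)
    for tags in features:
        items = [f"{k}={v}" for k, v in tags.items()]
        for a in items:
            tag_counts[a] += 1
            for b in items:
                if a != b:
                    pair_counts[a][b] += 1
    return tag_counts, pair_counts

def suggest_tags(existing, tag_counts, pair_counts, min_confidence=0.5):
    """Suggest tags seen alongside an existing tag in at least min_confidence of cases."""
    present_keys = set(existing)
    suggestions = {}
    for k, v in existing.items():
        anchor = f"{k}={v}"
        total = tag_counts.get(anchor, 0)
        if total == 0:
            continue  # this tag never appears in the sample data
        for other, count in pair_counts[anchor].items():
            key = other.split("=", 1)[0]
            confidence = count / total
            # Only suggest keys the feature does not already carry.
            if key not in present_keys and confidence >= min_confidence:
                suggestions[other] = max(confidence, suggestions.get(other, 0.0))
    return sorted(suggestions.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample; real input would come from an OSM extract.
SAMPLE = [
    {"leisure": "pitch", "sport": "tennis", "surface": "clay"},
    {"leisure": "pitch", "sport": "tennis", "surface": "asphalt"},
    {"leisure": "pitch", "sport": "soccer", "surface": "grass"},
    {"leisure": "pitch", "sport": "tennis"},
]

tag_counts, pair_counts = build_cooccurrence(SAMPLE)
print(suggest_tags({"leisure": "pitch"}, tag_counts, pair_counts))
```

In this toy sample three of the four pitches are tennis courts, so `sport=tennis` is the only suggestion that clears the threshold. The point is only that plain counting over existing tags already yields usable hints, before any imagery is involved.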

To open things up, we need additional data sources like overhead satellite images. However, satellites and airplanes are quite expensive. Microsoft/Bing and Mapbox purchase licenses to use other companies’ images, such as DigitalGlobe’s. They are restricted by the upstream license in what they can offer to the OSM community. For example, when Facebook wanted to use machine learning to map all the roads in Thailand, they had to purchase a license for the images. The license did not allow them to share the images with anybody. The DeepOSM project handled the image issue by using public NAIP images. However, the NAIP images are not ideal for OSM: the resolution is only 1 meter, and coverage is just the US.

Over the past two months, both Microsoft/Bing and Mapbox have completed reviews of their satellite image licensing terms and determined that they are able to offer their image layers for nonprofit machine learning projects whose goal is to improve OSM. The great news is that if they want to support machine learning for OSM, they can.

Microsoft/Bing has gone ahead and made it official with an email to the talk-us list.

Through one-on-one communications with Mapbox, they have asserted that their standard terms of service allow this use case as well. However, they are worried about the load on the servers, so for now, they would like to grant permission on a case-by-case basis. This is a reasonable request given how data-intensive the algorithms are.

Today, there is no technical reason preventing the volunteer OSM community from utilizing machine learning to accelerate the project. Basically, anything that is visible in a satellite image is now going to be able to be identifiable via software at the same level of accuracy as an “armchair” mapper: baseball fields, tennis courts, basketball courts, soccer fields, football fields, bridges, solar panel farms, roads, driveways, parking lots, buildings, lakes, rivers, wetlands, railroads, water tanks, gas stations, running tracks, vineyards, fields, forests, sand, jetties, lighthouses, airports, playgrounds, fences, wind turbines, pools, ski lifts, road lanes, traffic lights, graveyards, power lines, etc.

Machine learning algorithms will obviously be used more often in future imports and automated edits. However, there are other high-value places where machine learning algorithms could be utilized by the project.

For example:

Maproulette tasks could be generated that highlight where older OSM data doesn’t match newer satellite images.
A changeset monitor could be written that compares real-time edits to satellite images and adds changeset comments for edits that look unusual.
OSM editors could suggest tags based on satellite images.
OSM editors could suggest/snap geometries based on the satellite images.
OSM editors’ QA tools could integrate satellite images into the validation checks.
Satellite offsets could be determined automatically by using GPS traces.
Using previous DWG and community reverts and redactions, wildly bad changesets could be quickly noticed and reviewed by the community.
Overpass queries could include features extracted from satellite images.
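To make one item from the list above concrete, here is a minimal sketch of estimating an imagery offset from GPS traces. It assumes a lot: that road centerline points have already been extracted from the imagery by some detector (not shown, and hypothetical here), that both point sets are in a local metric coordinate system, and that the offset is a pure translation of a few meters. It uses a brute-force grid search over candidate shifts, which is a deliberately simple stand-in and not how a production tool would necessarily do it.

```python
import math

def mean_nearest_distance(trace, road_points, dx, dy):
    """Mean distance from each GPS point to its nearest road point after shifting the road points by (dx, dy)."""
    total = 0.0
    for gx, gy in trace:
        total += min(
            math.hypot(gx - (rx + dx), gy - (ry + dy)) for rx, ry in road_points
        )
    return total / len(trace)

def estimate_offset(trace, road_points, search_radius=10, step=1):
    """Grid-search the (dx, dy) shift in meters that best aligns the road points with the trace."""
    best = (0, 0)
    best_score = mean_nearest_distance(trace, road_points, 0, 0)
    candidates = range(-search_radius, search_radius + 1, step)
    for dx in candidates:
        for dy in candidates:
            score = mean_nearest_distance(trace, road_points, dx, dy)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best, best_score

# Hypothetical L-shaped road as seen in the imagery (local meters).
road_points = [(x, 0) for x in range(21)] + [(20, y) for y in range(1, 21)]
# Synthetic GPS trace: the same road, displaced 3 m east and 2 m south.
trace = [(x + 3, y - 2) for x, y in road_points]

offset, score = estimate_offset(trace, road_points)
print(offset, score)
```

The L-shaped road matters: a single straight road would leave the shift along the road direction undetermined, so traces covering roads in several directions are needed. Real GPS traces are also noisy, so the best score would not reach zero as it does with this synthetic data.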

It will take some time for all of this to get implemented, but I am sure that it will happen eventually. If you are a developer and this kind of thing interests you, the field is wide open!

Discussion

Comment from imagico on 10 February 2018 at 21:14

Basically, anything that is visible in a satellite image is now going to be able to be identifiable via software at the same level of accuracy as an “armchair” mapper.

Nothing could be further from the truth.

At the same time this statement is fairly demeaning for OSM mappers who - in the majority - map with a high level of competence and consideration. Comparing this to the crude mechanically trained monkey style algorithms commonly advertised as ‘AI’ these days demonstrates either a severe misunderstanding of the work of mappers in OSM or a similarly wrong understanding of how these algorithms work.

Religious belief in AI as the solution to any and all problems of mapping will not make these algorithms work any better or make them more useful for mapping in OSM.

Comment from escada on 10 February 2018 at 21:30

Do you know about the problems with the current state of the AI applied by Facebook, where many line-like features were recognized as roads (https://forum.openstreetmap.org/viewtopic.php?pid=679826#p679826)?

I think armchair mapping is still giving better results at this moment.

So probably in the future AI can be better than humans, but we are not there yet.

Comment from yvecai on 10 February 2018 at 21:35

I’m so happy to live in an area very well mapped where I can go for a hike under the trees to correct paths mistakenly mapped by a fellow human mapper. Where is the added value in computers mapping from aerial imagery??

Comment from Alan Bragg on 10 February 2018 at 23:14

Brilliant. I’d like to hear more at the SOTM US 2017. Couldn’t agree more.

Comment from MichaelK on 11 February 2018 at 05:06

This scares me: even as an experienced mapper it is sometimes difficult to trace aerial imagery correctly. Sorry, but it is difficult to imagine that AI should be better. And looking at imports as something which could be considered a similar approach: at first glance it seemed to be a good idea. But it turned out to be bad for building the community (example: US). And a lot of imports had been made in a bad way which left a lot of bad data in our DB that has/had to be fixed by a lot of additional manual work (examples: US, Austria) => that’s why an import guideline (osm.wiki/Import/Guidelines) was invented. Something very similar should be invented and enforced for AI generated data.

Just my 2 cents.

Comment from Omnific on 11 February 2018 at 06:24

I will disagree vehemently with most of the comments. This is a great thing, and the best use case is building tracing, especially in the US. This is frankly never going to get completed by hand. We are building new buildings faster than buildings are being added in OSM.

A lot of European contributors are used to a single national GIS program that opens up all data, so this doesn’t matter to them, so they will of course be against it. But in the US, every single county and city has its own GIS system, so that would mean contacting tens of thousands of governments individually, and hoping that they will allow open use of data. Frankly, I’d be surprised to get more than 20% of them. Maybe a country level allowance is required. That way, areas in France/Germany/etc that are already mostly mapped can avoid the effects of AI tracing.

I also strongly disagree that imports hurt the community. This is especially the case in large cities with building footprints. Users are often discouraged from mapping POI and the really useful local knowledge when there aren’t any building outlines. In my experience, activity improves with more building outlines, because the work that has to be accomplished doesn’t feel overwhelming. Just take a look at some big cities in the US that are complete deserts in OSM.

I’m really excited to see what AI (in collaboration with real mappers for verification) can do.

Comment from GOwin on 11 February 2018 at 09:47

I’m excited to hear more about this, and how it can help improve AI for tracing imagery!

Eventually, I see AI being just as capable as human mappers with mapping certain features.

Comment from SimonPoole on 11 February 2018 at 12:15

@Omnific lots of the buildings in Europe have been traced by hand (essentially France and the Netherlands are the exceptions, not the rule), if not the majority, and in most European countries the GIS departments are organized on a state/municipality base just as in the states.

And new building development is a common theme all over the place.

Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️‍⚧️ on 11 February 2018 at 14:23

I’m not inherently opposed to using more computer vision, but it should be done well. But it’s bad to have big tech companies like Facebook not releasing their code or their data. Mapbox could use BitTorrent to distribute the satellite imagery to save their server load, but then they’d have to give up control.

Rather than everyone adding to the wiki map, there’s a trend for a priestly class of tech workers in Silicon Valley (i.e. white male Americans) who use their massive server farms to generate the AI results. It’s a great way to exclude everyone who can’t rustle up millions in VC money.

Comment from mikelmaron on 11 February 2018 at 15:22

To recap

“There’s some pretty cool things which maybe could be done with machine learning and mapping”

“Humans are better than machines and you’re offensive and I’m scared”

“Well maybe it’s still interesting, someday it will be better”

“Only if you are white and male and rich”

It’s hard to have a calm discussion about this topic anywhere – developments (real and imagined) in artificial intelligence tech are overwhelming human society.

The reality is for mapping – there’s lots of potential, a few early successes, but a long way to go. Yet it’s moving fast. Humans are not being removed from the mapping equation. Machine learning is a fancy way of describing a certain class of statistical models run by computers. They can be helpful aids for human mappers, to help us map faster and bring our attention to things we might have missed. We already use machine learning a lot in OSM – anyone ever map from Mapillary detections? Improving these workflows and applying them to other parts of the mapping process, while keeping our wits, can only help us make the map better.

Comment from MichaelK on 11 February 2018 at 20:57

@Omnific: “A lot of European contributors are used to a single national GIS program that opens up all data”

I don’t want to spill oil into the fire. But you are quite wrong with this statement, please do not use fake news but the truth. Looking at the situation in Germany: almost no data was donated from governments / GIS programs, almost all data has been “craft mapped” or manually traced from aerial imagery. And if you look at the quality of the map data in Germany (already a lot of years ago) and the German community, you would notice that Germany has one of the best data coverages and a very good community. An example of this is Munich: we do not get house outlines from the local government, just a list to compare (but nothing to take over into the map => all buildings are hand traced by a very active community). It is even difficult to get aerial images from local sources. Please investigate (e.g. in the OSM wiki) before you raise false arguments…

Just my 2 cents, Michael (more than 10 years with OSM in Germany)

PS: we have quite some examples where “rich companies” or projects just dump their data into OSM and the community has to take care of the leftovers or the damage that is caused (e.g. Pokemon Go) => we need to be careful..

Comment from Rovastar on 12 February 2018 at 04:17

It is an exciting potential for OSM with AI.

Again, like any use of technology (imports, imagery, etc.) there is massive pushback from some parts of the OSM community. The term Luddites springs to mind.

So many negative comments and the post really was about using AI to assist mappers. True in the past there have been mistakes with it. And these types of AI learn with time and get better as they “learn”.

Like the anti-import brigade they fail to see any benefit. They bemoan things like the TIGER import in the US, but without it there would be huge areas of the country without any map data at all. And without that, other data consumers would, I suspect, not use OSM data or would delay bringing it onboard. Things like Maps.me, Mapbox, PokemonGo, etc. would not be as successful as they are for OSM. And then they create a snowball effect, and with that the massive increase in editors in the past couple of years: about double the monthly users, mostly down to maps.me and Pokemon Go.

I am a firm believer that having more stuff mapped makes OSM more popular, and then more people use it, and that means more people map, and the cycle continues. Having these anti-technology policies harms OSM progress.

Caution is sensible but we should help guide the use of technology in OSM, not call for blanket banning of it (or make it as near to banning as possible). They don’t have to be perfect (and plenty of heavy mappers are far from perfect), just good enough. If they are 1% off in errors, that should not be a case to dismiss/ban/stigmatize these uses of technology.

@MichaelK

I’m fine with spilling oil, as I must call you out on your false arguments due to lack of investigation. The makers of PokemonGo didn’t dump their data into OSM. In fact they didn’t add anything. And it has been by far the biggest source of increase in mappers over the past year or so for the OSM project, because loads of new users (and not the company “dumping data”) added stuff. For example, in South Korea when it was launched there, there were about 100 times the normal daily mappers (about 12 daily before, then up to 1250) solely because of this game.

Sure, like any new users (and I believe there are many in the OSM community that do not seem to want new users at all) there will be some that make mistakes and commit vandalism. No need to shame them; we should welcome the massive good things new blood brings to OSM.

Comment from jremillard on 12 February 2018 at 15:13

@roym, my post clearly explains why Mapbox and Microsoft can’t put the sat images into BitTorrent. Also, Facebook has released the AI part of the road detection code here https://github.com/facebookresearch/Detectron. Nothing is stopping you from getting involved.

On the Facebook road import, as someone who works on this kind of stuff, the code they wrote is really not that interesting. What we (OSM) should have asked for is the hand drawn training images. They are the most valuable part of what Facebook created. Looking at the training images and noting how many they had would have given the community a pretty good idea of how good the final data was eventually going to look.

Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️‍⚧️ on 17 February 2018 at 11:44

my post clearly explains why Mapbox and Microsoft can’t put the sat images into BitTorrent

The way it was written implied that Mapbox (unlike Microsoft) had purely technical problems (“server load”), so I suggested a technical solution. My apologies if I misread you.

Facebook has released the AI part of the road detection code

Cool! I hadn’t seen that. It’s only new (less than a month old). About a year ago Facebook said they weren’t releasing the code, which is why I thought that.

Comment from mindedie on 17 February 2018 at 12:14

My problem with object recognition and detection software (a.k.a. AI) and mindless imports (importing for the sake of importing): fixing them takes the same or more time and effort than drawing from scratch. Defending data dump imports and mindless bot work (sad, but all business defence sounds like corporate shill talk to me) and saying “the map will be empty”… kinda hurts. People like to fill an empty (local) area, and new mappers (some from Pokemon Go) come to do that, not to fix imports. I donate my free time and work, not for someone/something dumping stuff on/around it (to dump - take a —-). Until so-called AI or bots start helping instead of screwing around and messing up data, which is publicly and openly available to USE, can we keep it gated somewhere? Please?

So many imports left to fix and so much left to map in the world. Cheers

Comment from Omnific on 17 February 2018 at 14:28

Michael & SimonPoole: Regardless of your views in Europe, it’s hard to say that it’s more difficult to get open data there than in the US. Spain and France have open cadastral data, Czech Republic mappers have Tracer 2 and cadastral maps. Go talk to the local government in Doddridge County, WV and see what kind of response you get to a request for open data.

This decision should be left up to the local mappers, not people with no skin in the game. The intent of OSM is to empower local mappers to map their surroundings, and if mappers in the US are more receptive to AI mapping than those in Europe, then we should be allowed to do so in the US.

Comment from Dalkeith on 5 March 2018 at 12:42

I really think this is brilliant: the only way to get on top of the amount of data we are talking about.

Specifically, it could do great things for helping with moderation.
