The AI/neural network/deep learning/machine learning wave is about to reach OpenStreetMap. Modern machine learning algorithms require a lot of data, and we have lots of data. OSM is going to be a natural place for open source machine learning development to happen.
Exciting and useful machine learning projects are possible today, using only the OSM database. For example, statistically based editor presets, changeset anomaly detection, import changeset detection, and smart tag/value auto-suggestions are all possible with just the OSM database.
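To make the tag suggestion idea a bit more concrete, here is a minimal sketch of how co-occurrence statistics from the database could drive suggestions. Everything in it is an assumption for illustration: the function names `build_cooccurrence` and `suggest_tags`, the `min_confidence` threshold, and the tiny `SAMPLE` list, which stands in for tag dictionaries that would really come from a planet extract. It is plain counting, not a trained model.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for tag dictionaries read from an OSM extract.
SAMPLE = [
    {"leisure": "pitch", "sport": "tennis", "surface": "clay"},
    {"leisure": "pitch", "sport": "tennis"},
    {"leisure": "pitch", "sport": "soccer", "surface": "grass"},
    {"amenity": "fuel", "brand": "Shell"},
]


def build_cooccurrence(features):
    """Count each key=value tag and how often it appears next to other tags."""
    tag_counts = Counter()
    pair_counts = defaultdict(Counter)
    for tags in features:
        items = [f"{key}={value}" for key, value in tags.items()]
        for a in items:
            tag_counts[a] += 1
            for b in items:
                if a != b:
                    pair_counts[a][b] += 1
    return tag_counts, pair_counts


def suggest_tags(tags, tag_counts, pair_counts, min_confidence=0.5):
    """Suggest tags that usually accompany the tags already on a feature.

    The score is the share of features carrying an existing tag that also
    carry the suggested one. Keys already on the feature are skipped.
    """
    present_keys = set(tags)
    scores = {}
    for key, value in tags.items():
        a = f"{key}={value}"
        total = tag_counts.get(a, 0)
        if total == 0:
            continue  # tag never seen, nothing to learn from
        for b, count in pair_counts.get(a, {}).items():
            if b.split("=", 1)[0] in present_keys:
                continue
            confidence = count / total
            if confidence >= min_confidence:
                scores[b] = max(scores.get(b, 0.0), confidence)
    # Highest confidence first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    counts, pairs = build_cooccurrence(SAMPLE)
    print(suggest_tags({"sport": "tennis"}, counts, pairs))
```

An editor could show the top few suggestions next to the tag form and leave the decision to the mapper. A real version would need far more data and probably some regional awareness, since tagging habits differ between countries.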
To open things up, we need additional data sources like overhead satellite images. However, satellites and airplanes are quite expensive. Microsoft/Bing and Mapbox purchase licenses to use other companies' images, such as DigitalGlobe's. They are restricted by the upstream license in what they can offer to the OSM community. For example, when Facebook wanted to use machine learning to map all the roads in Thailand, they had to purchase a license for the images. The license did not allow them to share the images with anybody. The DeepOSM project handled the image issue by using public NAIP images. However, the NAIP images are not ideal for OSM: the resolution is only 1 meter, and coverage is limited to the US.
Over the past two months, both Microsoft/Bing and Mapbox have completed reviews of their satellite image licensing terms and determined that they are able to offer their image layers to nonprofit machine learning projects whose goal is to improve OSM. The great news is that if they want to support machine learning for OSM, they can.
Microsoft/Bing has gone ahead and made it official with an email to the talk-us list.
Through one-on-one communications, Mapbox has asserted that its standard terms of service allow this use case as well. However, they are worried about the load on their servers, so for now they would like to grant permission on a case-by-case basis. This is a reasonable request given how data-intensive the algorithms are.
Today, there is no technical reason preventing the volunteer OSM community from utilizing machine learning to accelerate the project. Basically, anything that is visible in a satellite image is going to be identifiable by software at the same level of accuracy as an “armchair” mapper: baseball fields, tennis courts, basketball courts, soccer fields, football fields, bridges, solar panel farms, roads, driveways, parking lots, buildings, lakes, rivers, wetlands, railroads, water tanks, gas stations, running tracks, vineyards, fields, forests, sand, jetties, lighthouses, airports, playgrounds, fences, wind turbines, pools, ski lifts, road lanes, traffic lights, graveyards, power lines, etc.
Machine learning algorithms will obviously be used more often in future imports and automated edits. However, there are other high-value places where the project could put machine learning algorithms to use.
For example:
- MapRoulette tasks could be generated that highlight where older OSM data doesn’t match newer satellite images.
- A changeset monitor could be written that compares real-time edits to satellite images and adds changeset comments for edits that look unusual.
- OSM editors could suggest tags based on satellite images.
- OSM editors could suggest/snap geometries based on the satellite images.
- QA tools in OSM editors could integrate satellite images into their validation checks.
- Satellite offsets could be determined automatically by using GPS traces.
- Using previous DWG and community reverts and redactions, wildly bad changesets could be quickly noticed and reviewed by the community.
- Overpass queries could include features extracted from satellite images.
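As one small illustration from the list above, here is a rough sketch of the offset idea. It assumes the hard part is already done: some detector has found road centerline points in the imagery and each one has been matched to a nearby GPS trace point, with both expressed in local meters (east, north). The function name `estimate_offset` and the sample coordinates are made up for the example. The median is used so that a few bad matches do not drag the estimate around.

```python
from statistics import median


def estimate_offset(gps_points, imagery_points):
    """Estimate the (east, north) shift that aligns imagery with GPS traces.

    Both arguments are equal-length lists of (x, y) pairs in meters, where
    imagery_points[i] is the imagery position matched to gps_points[i].
    Returns the median per-axis difference, gps minus imagery.
    """
    if not gps_points or len(gps_points) != len(imagery_points):
        raise ValueError("need two non-empty lists of the same length")
    dx = [g[0] - i[0] for g, i in zip(gps_points, imagery_points)]
    dy = [g[1] - i[1] for g, i in zip(gps_points, imagery_points)]
    return median(dx), median(dy)


if __name__ == "__main__":
    # Made-up data: imagery sits 3 m east and 2 m south of the traces,
    # and the last pair is a deliberately bad match.
    gps = [(0, 0), (10, 0), (20, 0), (30, 0), (40, 0)]
    imagery = [(3, -2), (13, -2), (23, -2), (33, -2), (90, 50)]
    print(estimate_offset(gps, imagery))
```

In practice the matching step would need far more care than this, and a single constant shift only holds over a small area, so treat it as a starting point rather than a recipe.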
It will take some time for all of this to get implemented, but I am sure that it will happen eventually. If you are a developer and this kind of thing interests you, the field is wide open!
Links