OpenStreetMap

When AI is (not) needed

Posted by TrickyFoxy on 17 March 2024 in English.

Looking at the OSM map on the university's website:

Looking at a photo:


I look at my yard in MapComplete, and there are some stalls on the road:

I look out the window (well, on yandex.panoramas, so that you can see it too):

I open Rapid and calm down: this is what the AI recognized. Everything is fine with OSM.

But the mapmakers took a wrong turn. Why do you need AI buildings in cities that are mapped by cartographers?!

But the mapmakers are happy with that. Google showed the same thing with auto-recognition a long time ago.

Discussion

Comment from Minh Nguyen on 24 March 2024 at 17:40

I’m guessing this is the Microsoft building dataset, which applies computer vision to aerial imagery. Some data consumers like Mapbox and Overture Maps are using this dataset to backfill areas where OSM building coverage is lacking or nonexistent. From their perspective, the increase in coverage in places with fewer OSM mappers probably outweighs individual bloopers like this, and I guess from our perspective, we’d rather not face a bulk automated import of this dataset due to these bloopers.

Another thing that commonly occurs is that a building has been demolished, so we’ve deleted the building from OSM. But a data consumer working off outdated aerial imagery can’t distinguish that from a never-before-mapped building, so it restores the building from the Microsoft dataset. Of course, a human mapper could make the same mistake if they happen to be using the same outdated imagery with no local knowledge.

To address both cases, I’ve gotten into the habit of retagging buildings as demolished:building=*, at least until the local default imagery layer gets updated. These data consumers will omit any Microsoft building that intersects a building in OSM, so I hope they’ll do something similar with demolished:building in the future. This key also has the benefit of serving as a to-do list for OpenHistoricalMap and as a pre-cleanup step for any building import planned for the area.
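As a rough illustration of the backfill rule described above, here is a minimal Python sketch. It is not how Mapbox or Overture Maps actually implement conflation: the `Footprint` type, the `backfill` function, and the `blocking_keys` parameter are all hypothetical names, and axis-aligned bounding boxes stand in for a real polygon intersection test. The default mirrors the behavior described (only `building` blocks a machine-detected footprint); passing `demolished:building` as well shows the hoped-for extension.

```python
from dataclasses import dataclass, field

# Simplification: each footprint is an axis-aligned box (min_x, min_y, max_x, max_y).
# A real pipeline would test the actual polygons for intersection.
Box = tuple[float, float, float, float]


@dataclass
class Footprint:
    box: Box
    tags: dict[str, str] = field(default_factory=dict)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share some interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def backfill(osm_features, ml_buildings, blocking_keys=("building",)):
    """Keep only the machine-detected buildings that do not overlap any
    OSM feature carrying one of the blocking keys (hypothetical rule)."""
    blockers = [
        f for f in osm_features if any(key in f.tags for key in blocking_keys)
    ]
    return [
        b for b in ml_buildings
        if not any(boxes_overlap(b.box, o.box) for o in blockers)
    ]


osm = [
    Footprint((0, 0, 10, 10), {"building": "yes"}),
    Footprint((20, 0, 30, 10), {"demolished:building": "yes"}),
]
detected = [
    Footprint((5, 5, 12, 12)),    # overlaps a mapped building
    Footprint((22, 2, 28, 8)),    # overlaps a demolished building
    Footprint((40, 0, 50, 10)),   # nothing mapped here
]

current = backfill(osm, detected)
hoped_for = backfill(osm, detected, ("building", "demolished:building"))
```

With the default keys, the footprint over the demolished building is restored; with `demolished:building` added to the blocking keys, only the footprint in the unmapped area survives.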

In theory, we could go around mapping no:building=yes for the buildings on wheels you spotted, but my hands are full already without worrying about something that Microsoft could fix by tuning their noise filter.

Comment from v_martyanov on 24 March 2024 at 19:18

I have a feeling that OSM data contains thousands of buildings that are AI or optical recognition artifacts. For example, this building https://www.openstreetmap.org/way/549392839 has a corner mapped as a separate building. And I have seen hundreds of them…

Comment from Minh Nguyen on 24 March 2024 at 20:54

Based on the source tag, that building probably came from the French cadaster import. Many government building datasets have errors of this sort because the data collection is based on remote sensing technologies like LiDAR. Cleaning up these errors is the very reason why imports are more difficult than simply loading the data and uploading.

Whether an external building dataset comes from computer vision, machine learning classification, LiDAR, or other automated techniques, data consumers tend to prefer OSM data wherever it’s present because we’ve typically paid more individual attention and performed quality control on it. If you use an automated dataset in your product, you need to filter out low-confidence features or else you wind up with an impressive statistic but lots of junk.
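The filtering step mentioned above might look something like this minimal Python sketch. The `confidence` field name and the 0.8 threshold are assumptions made for illustration, not properties of any particular dataset; features without a score are treated as low-confidence and dropped.

```python
def filter_by_confidence(features, min_confidence=0.8):
    """Drop features whose (hypothetical) confidence score is below the
    threshold. A missing score counts as 0.0, so the feature is dropped."""
    return [f for f in features if f.get("confidence", 0.0) >= min_confidence]


detections = [
    {"id": 1, "confidence": 0.97},  # probably a real building
    {"id": 2, "confidence": 0.41},  # could be a truck or a market stall
    {"id": 3},                      # no score at all
]

kept = filter_by_confidence(detections)
coverage_lost = len(detections) - len(kept)
```

Raising the threshold shrinks the headline building count but removes more of the junk, which is the trade-off described above.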

It’s not just buildings. Every now and then, a navigation software vendor gets the bright idea to detect one-way streets automatically based on whether they have telemetry of people mostly going in one direction along the street but hardly in the other. Great – finally solved the problem of routing people the wrong way down a street! Invariably, they have to back away from this approach, because it turns out that many one-way streets don’t have the traffic volume needed to make a confident prediction about the traffic direction. Instead, they get complaints about having to circle around the entire city just to turn right. This data still makes for a great QA tool with a human in the loop, but it’s only a matter of time before someone sees that QA tool and gets a bright idea…
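To make the traffic-volume problem concrete, here is a small Python sketch of a direction classifier that refuses to guess when it has too few traces. The function name, the thresholds (`min_traces`, `min_share`), and the return labels are all hypothetical; no vendor's actual method is implied.

```python
def infer_oneway(forward, backward, min_traces=200, min_share=0.98):
    """Classify a street from telemetry counts in each direction.

    Returns "unknown" when there are too few traces to decide, so that the
    street can be left for a human to review instead of being guessed.
    """
    total = forward + backward
    if total < min_traces:
        return "unknown"
    if forward / total >= min_share:
        return "forward"
    if backward / total >= min_share:
        return "backward"
    return "two-way"


quiet_street = infer_oneway(9, 0)      # all traces agree, but far too few
busy_oneway = infer_oneway(990, 10)    # enough traces, 99% in one direction
busy_twoway = infer_oneway(600, 400)
```

Without the `min_traces` guard, the quiet street with nine traces in one direction would be marked one-way, which is the kind of false positive that sends people circling around the city.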
