mvexel's Diary

OpenStreetMap “mappedness” based on POI density

Posted by mvexel on 10 April 2019 in English.

How well mapped is a town in OSM? There are many ways to answer that question depending on how you define “well mapped-ness” (or just mappedness). One simple approach is to look at the number of points of interest (POI) mapped. My assumption is that POI are a higher order feature: mappers establish a base layer of roads first, then move on to enriching the map with POI. So having more POI in an area could be a proxy for mappedness. A benefit is that it’s easy to measure. All you need is the POI data from OSM, and town / city boundaries. I took a stab at it. Here’s a preview:

Prepare the data

Getting POI

We need POI from OSM as the quality indicator. As a proxy for POI, I use nodes that have a name tag. This lets me sidestep the curious OSM attribute hierarchy that would require selecting subsets of tourism, shop, amenity and a few other attribute keys from the data. There’s two ways to go about this. For smaller areas you could use Overpass, using a query like this one:

area[name="Salt Lake City"]->.a; node[name](area.a); out meta;

For larger areas, it’s better to work from an OSM PBF data file, which is the route I took. Download one from your favorite source, like BBBike or Geofabrik. Then use osmium to filter:

osmium tags-filter utah-latest.osm.pbf n/name -o utah-namednodes.osm.pbf

Regardless whether you took the Overpass or PBF route, you now have an OSM file with named nodes.

Place boundaries

How you can get place / city boundaries will depend on where you are. In some areas, OSM itself may be a great source, check Wambacher’s boundaries site. In the U.S., official place boundaries are easily available from Census, which is where I got them.

Loading the data

To do the needed spatial operations, you could use QGIS, but I like PostGIS. So I load the OSM and place data into a spatial datbase.

POI

First I prepare an OSM ‘snapshot’ database:

createdb osm && psql -d osm -c create extension hstore && psql -d osm -c create extension postgis && psql -d osm -f pgsnapshot_schema_0.6.sql

The pgsnapshot_schema_0.6.sql file is part of the osmosis distribution and contains the database schema. Use osmosis to load the data:

osmosis --rb utah-namednodes.osm.pbf --wp database=osm

Places

The TIGER place boundaries come in shapefile format so I use shp2pgsql to load those into a separate table:

shp2pgsql -s 4326 -D -I tl_2018_49_place.shp > places.sql && psql -d osm -f places.sql

Okay, data ready. Now for the processing.

Querying the data

What we want is a measure of POI density. We need two things for each place boundary:

The number of POI contained in the place boundary
The area size

Let’s add two columns to the place boundary table to hold that information, and a third column to hold the density result:

alter table tl_2018_49_place add column poicount integer; alter table tl_2018_49_place add column areakm2 float; alter table tl_2018_49_place add column poidensity float;

With a smarter query, you could skip the two intermediary tables.. Now let’s fill the columns with the data we need:

update tl_2018_49_place set poicount = (select count(o.*) from nodes o, tl_2018_49_place p where st_within(o.geom, p.geom) and p.name = tl_2018_49_place.name); update tl_2018_49_place t set areakm2 = st_area(st_transform(t.geom, 2850))/1000000; update tl_2018_49_place t set poidensity = poicount / areakm2;

I use a meter-based projection appropriate for the area so I can get the area size in km^2, but strictly that is not necessary because we’re looking for relative results.

Visualization

For good measure, I loaded both tables into QGIS, split the density values into quantiles, and used those buckets for visualization of the results, see above. As expected, the cities and towns on the Wasatch Front, where we have a fair number of active mappers, score well in mappedness, whereas smaller towns in rural areas generally score poorly. My QGIS project file is here.

In a follow up post, I will go into how the result could be improved. I would appreciate your feedback as input for that post.

Discussion

Comment from amapanda ᚛ᚐᚋᚐᚅᚇᚐ᚜ 🏳️‍🌈 on 11 April 2019 at 08:06

Nice. “Just use things with a name” is a good rule of thumb. I’d wonder about PoIs mapped as ways, which can often be a sign of more detailed mapped (replacing a shop mapped as a point with an area), but then you have to filter out roads.

For km² calculations I like PostGIS’s geography type. ST_Area(t.geom::geography) gets you m² areas on a sphere.

Comment from mvexel on 11 April 2019 at 16:25

Thanks for the tip re the geography type cast, that is clever.

POIs mapped as ways would be good to include, but would make things a little more complicated. Fortunately you don’t need the actual way geometries which reduces osmosis load time in PG. You would need to filter by closed ways only, I don’t know if osmium lets you do this?

Comment from Carnildo on 11 April 2019 at 22:19

Straight density doesn’t strike me as being an effective measure of “mappedness”: in a perfectly-mapped area, it should closely reflect population density.

Estimating the expected number of POIs should give a more accurate measure. For example, the ratio of the number of POIs to the length of roads, or the ratio of POI density to the above-mentioned population density.

Comment from Jochen Topf on 20 April 2019 at 16:05

Osmium can filter by area (closed ways plus multipolygon/boundary relations): osmium tags-filter ... an/name ... should work.

Comment from Discostu36 on 20 April 2019 at 18:00

I’d be interested in statistics that can’t be generated with this method: Different locations with high level of detail. Where the difference is if trees, streetlights and building levels are mapped or not.