Some Notes from Analyzing OSM US History Data

Posted by dalek2point3 on 16 June 2014 in English. Last updated on 28 June 2014.

The Data

Today, I’ve finally gotten a chance to play with data from the OSM History extract that I created using the parser that I wrote about last time.

This is what the data contains:

Every node that contains either an amenity, addr:housenumber or place tag.
For each node, I record basic metadata, “name” and gnis* tags
Every way that contains either an amenity, highway, building or parking tag.
For each node, I also record basic metadata and the following tags:
- name
- tiger:cfcc, tiger:county, tiger:reviewed
- access
- oneway
- maxspeed
- lanes

The resulting flatfiles are large, partly because I’m parsing the “history” data, so I’m including every past version of every node and way, in addition to the most current version. There are about 6.3 million node entries and about 48.7 million way entries. For each of these nodes and ways, I ran them through my point-in-poly program to code the county and the MSA that each way / node lies in.

The next big step was to drop imported data. I really dont care about this – obviously this includes data from the TIGER import but also many other major edits in the US. Interpret the numbers below as the contribution of OSM editors – but major national level imports. I’ve not removed smaller county level imports, because I see them as being relevant to my analyses, but also because they’re harder to pin down. So the data includes any way, node with the relevant tag touched by a non-TIGER (and some other importer accounts).

Some Highlights

Way data

How many items do we have for each of the 4 types? * Highway = 15.9 million versions, 7.9 million uniques * Building = 6 million versions, 4.9 uniques * Amenity = 460k versions, 331 uniques * Parking = 52.6k uniques, 66k versions, * All data = 22.3 million versions, 13.1 million uniques

Imgur

Other notes: * About 5.8 million of the 15.9 million way versions have a tigercfcc / tigercounty tag.

Node data (pending)

NOTE: this post is under construction

Location: MIT, Cambridge, Middlesex County, Massachusetts, United States

OpenStreetMap

dalek2point3's Diary