Today, I’ve finally gotten a chance to play with data from the OSM History extract that I created using the parser that I wrote about last time.
This is what the data contains:
- Every node that contains either an amenity, addr:housenumber or place tag.
- For each node, I record basic metadata, “name” and gnis* tags
- Every way that contains either an amenity, highway, building or parking tag.
- For each node, I also record basic metadata and the following tags:
- tiger:cfcc, tiger:county, tiger:reviewed
The resulting flatfiles are large, partly because I’m parsing the “history” data, so I’m including every past version of every node and way, in addition to the most current version. There are about 6.3 million node entries and about 48.7 million way entries. For each of these nodes and ways, I ran them through my point-in-poly program to code the county and the MSA that each way / node lies in.
The next big step was to drop imported data. I really dont care about this – obviously this includes data from the TIGER import but also many other major edits in the US. Interpret the numbers below as the contribution of OSM editors – but major national level imports. I’ve not removed smaller county level imports, because I see them as being relevant to my analyses, but also because they’re harder to pin down. So the data includes any way, node with the relevant tag touched by a non-TIGER (and some other importer accounts).
How many items do we have for each of the 4 types? * Highway = 15.9 million versions, 7.9 million uniques * Building = 6 million versions, 4.9 uniques * Amenity = 460k versions, 331 uniques * Parking = 52.6k uniques, 66k versions, * All data = 22.3 million versions, 13.1 million uniques
Other notes: * About 5.8 million of the 15.9 million way versions have a tigercfcc / tigercounty tag.
Node data (pending)
NOTE: this post is under construction