
rivermont's Diary
Recent diary entries
The Size of TIGER
Posted by rivermont on 26 October 2018 in English. Last updated on 29 October 2019.
There is a LOT of TIGER data, most of it still not even glanced at. And each TIGER road comes with a bunch of metadata tags.
Taginfo has the following statistics on common TIGER tags (as of Oct 6 2018):
- 13,078,000 `tiger:cfcc`
- 12,871,000 `tiger:county`
- 11,874,000 `tiger:reviewed` (98% `no`)
- 8,021,000 `tiger:name_base`
- 6,880,000 `tiger:name_type`
- 4,700,000 `tiger:tlid`
- 4,700,000 `tiger:source`
- 4,020,000 `tiger:upload_uuid`
- 4,000,000 `tiger:zip_left`
- 3,600,000 `tiger:zip_right`
- 3,250,000 `tiger:separated` (99% `no`)
- 1,275,000 `tiger:name_direction_prefix`
- 1,127,140 `tiger:name_base_1`
- 450,000 `tiger:name_direction_suffix`
- 370,000 `tiger:name_type_1`
- ~1,020,000 other tags with >20,000 usage
In the OSM XML format, each tag is structured like so:
<tag k="KEY" v="VALUE"/>
where `KEY` and `VALUE` are the key/value pair for the tag, of course. For example, a simple highway tagged with `highway=residential` + `name=Cole Mill Road` + `surface=asphalt` is:
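```
<way>
  <tag k="highway" v="residential"/>
  <tag k="name" v="Cole Mill Road"/>
  <tag k="surface" v="asphalt"/>
</way>
```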
Excluding all other metadata and node references needed to make an actual way, this comes out to 119 bytes, with 34, 34, and 30 bytes for the three tags respectively.
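The byte counts can be checked with a few lines of Python. This is only a sketch: the helper names `tag_xml` and `byte_len` are made up for illustration, values are not XML-escaped, and the 119-byte total assumes a bare `<way>` wrapper with two-space indentation and no trailing newline.

```python
def tag_xml(key: str, value: str) -> str:
    """Serialize one tag in the <tag k="KEY" v="VALUE"/> form (no XML escaping)."""
    return f'<tag k="{key}" v="{value}"/>'


def byte_len(text: str) -> int:
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


tags = [
    ("highway", "residential"),
    ("name", "Cole Mill Road"),
    ("surface", "asphalt"),
]

tag_lines = [tag_xml(k, v) for k, v in tags]
tag_sizes = [byte_len(line) for line in tag_lines]

# Assumed layout: bare <way> wrapper, two-space indentation,
# newline-separated lines, no trailing newline.
way_xml = "\n".join(["<way>"] + ["  " + line for line in tag_lines] + ["</way>"])

print(tag_sizes)          # [34, 34, 30]
print(byte_len(way_xml))  # 119
```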
## Size calculation
Applying this to all of the above TIGER tags, total sizes come out as follows.
- `tiger:cfcc` (3-byte value): 380 MB
- `tiger:county` (assuming an average value of 12 bytes): 514 MB
- `tiger:reviewed=no`: 380 MB
- `tiger:name_base` (assume avg. val. of 12 bytes): 345 MB
- `tiger:name_type` (2-byte value): 227 MB
- `tiger:tlid` (around 200 byte values): 1.13 GB
- `tiger:source` (30-byte value): 272 MB
- `tiger:upload_uuid` (51-byte value): 334 MB
- `tiger:zip_left` (5-byte value): 140 MB
- `tiger:zip_right` (5-byte value): 126 MB
- `tiger:separated=no`: 104 MB
- `tiger:name_direction_prefix` (1-byte value): 56 MB
- `tiger:name_base_1` (assume avg. val. of 13 bytes): 53 MB
- `tiger:name_direction_suffix` (assume avg. val. of 2 bytes): 20 MB
- `tiger:name_type_1` (2-byte value): 13 MB
- ~40 MB other tags
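Here is a rough sketch of how estimates like these can be reproduced: multiply each tag's usage count by the size of one `<tag k="…" v="…"/>` element. The counts and value lengths are the ones listed in this post; the names `TAG_OVERHEAD`, `TIGER_TAGS` and `tag_bytes` are made up for illustration, and 1 MB is assumed to be 10^6 bytes. The output lands close to the figures above but not always exactly on them, since the exact value lengths used for the original estimates are not known.

```python
# Fixed bytes around the key and the value in <tag k="" v=""/>.
TAG_OVERHEAD = len('<tag k="" v=""/>')  # 16

# (key, usage count, assumed value length in bytes), taken from this post.
TIGER_TAGS = [
    ("tiger:cfcc", 13_078_000, 3),
    ("tiger:county", 12_871_000, 12),
    ("tiger:reviewed", 11_874_000, 2),
    ("tiger:name_base", 8_021_000, 12),
    ("tiger:name_type", 6_880_000, 2),
    ("tiger:tlid", 4_700_000, 200),
    ("tiger:source", 4_700_000, 30),
    ("tiger:upload_uuid", 4_020_000, 51),
    ("tiger:zip_left", 4_000_000, 5),
    ("tiger:zip_right", 3_600_000, 5),
    ("tiger:separated", 3_250_000, 2),
    ("tiger:name_direction_prefix", 1_275_000, 1),
    ("tiger:name_base_1", 1_127_140, 13),
    ("tiger:name_direction_suffix", 450_000, 2),
    ("tiger:name_type_1", 370_000, 2),
]


def tag_bytes(key: str, value_len: int) -> int:
    """Bytes for one tag element with an ASCII key and a value of value_len bytes."""
    return TAG_OVERHEAD + len(key) + value_len


sizes = {key: count * tag_bytes(key, value_len)
         for key, count, value_len in TIGER_TAGS}

for key, size in sizes.items():
    print(f"{key:30} {size / 1e6:8.0f} MB")
```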
## Conclusion
That all adds up to over 4 GB of data, just from extraneous import tags (I’ll bet NHD imports are even larger … oh boy).
With the current `planet.osm` (uncompressed) sitting at around 960 GB, all these TIGER tags make up a whopping 0.42% of all OpenStreetMap data! Wow such large.
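As a quick sanity check, the per-tag estimates from the size calculation can be summed and compared against the planet size. This is just a sketch, assuming 1 GB = 1000 MB and taking the 960 GB figure at face value; the variable names are made up for illustration. Summing the listed values gives a little over 4.1 GB, or about 0.43%, in the same ballpark as the 0.42% quoted.

```python
# Per-tag estimates in MB, copied from the size calculation list;
# 1130 is the 1.13 GB tiger:tlid entry, the final 40 is "~40 MB other tags".
estimates_mb = [380, 514, 380, 345, 227, 1130, 272, 334,
                140, 126, 104, 56, 53, 20, 13, 40]

planet_gb = 960  # uncompressed planet.osm size quoted in this post

total_mb = sum(estimates_mb)
share_percent = total_mb / (planet_gb * 1000) * 100

print(f"{total_mb / 1000:.2f} GB, {share_percent:.2f}% of planet.osm")
# 4.13 GB, 0.43% of planet.osm
```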
This doesn’t really conclude much, but it was a fun experiment. I had expected the number to be much larger, but even the vastness of TIGER doesn’t compare to the rest of the world.
Still, most TIGER data is misaligned, low-resolution, incorrectly classified, inconsistent and straight up wrong.
Help us cut down on bad TIGER data!
TIGER Gore
The City of Raleigh, NC has also released data on ArcGIS. http://data-ral.opendata.arcgis.com/datasets
I recently discovered the Durham County Open Data site. Licensed under ODbL 1.0, it is a large collection of data sources covering everything from election results to bike racks. Some of the datasets are a few years old, but most of them seem up to date.
Hopefully someone will be able to use this; enjoy!
Link to the front page
OSM Links
An open list of useful resources relating to OpenStreetMap.
About
This repository aims to be a community-maintained list of resources regarding OpenStreetMap. The list can contain anything from materials for contributors to guides for newcomers.