rivermont's Diary

The Size of TIGER

Posted by rivermont on 26 October 2018 in English. Last updated on 29 October 2019.

There is a LOT of TIGER data, most of it still not even glanced at. And each TIGER road comes with a bunch of metadata tags.

way 16543325

Taginfo has the following statistics on common TIGER tags (as of Oct 6 2018):

  • 13,078,000 tiger:cfcc
  • 12,871,000 tiger:county
  • 11,874,000 tiger:reviewed (98% no)
  • 8,021,000 tiger:name_base
  • 6,880,000 tiger:name_type
  • 4,700,000 tiger:tlid
  • 4,700,000 tiger:source
  • 4,020,000 tiger:upload_uuid
  • 4,000,000 tiger:zip_left
  • 3,600,000 tiger:zip_right
  • 3,250,000 tiger:separated (99% no)
  • 1,275,000 tiger:name_direction_prefix
  • 1,127,140 tiger:name_base_1
  • 450,000 tiger:name_direction_suffix
  • 370,000 tiger:name_type_1
  • ~1,020,000 other tags with >20,000 usage

In the OSM XML format, each tag is structured like so:

<tag k="KEY" v="VALUE"/>

where KEY and VALUE are the tag's key and value. For example, a simple highway tagged with highway=residential + name=Cole Mill Road + surface=asphalt is:

<way>
  <tag k="highway" v="residential"/>
  <tag k="name" v="Cole Mill Road"/>
  <tag k="surface" v="asphalt"/>
</way>

Excluding all other metadata and the node references needed to make an actual way, this comes out to 119 bytes, with 34, 34, and 30 bytes for the three tags respectively.
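For anyone who wants to check the byte counts, here is a minimal Python sketch. The helper name `tag_xml` is made up for illustration, and the layout of the enclosing element (a bare `<way>` wrapper, two-space indent, no trailing newline) is an assumption that happens to reproduce the 119-byte figure. It does no XML escaping, which these values don't need.

```python
def tag_xml(key: str, value: str) -> str:
    """Serialize one tag the way it appears in OSM XML (no escaping)."""
    return f'<tag k="{key}" v="{value}"/>'


tags = [
    ("highway", "residential"),
    ("name", "Cole Mill Road"),
    ("surface", "asphalt"),
]

tag_lines = [tag_xml(k, v) for k, v in tags]
tag_sizes = [len(line.encode("utf-8")) for line in tag_lines]

# Assumed layout: bare <way> wrapper, two-space indent, no trailing newline.
way_xml = "\n".join(["<way>"] + ["  " + line for line in tag_lines] + ["</way>"])
way_size = len(way_xml.encode("utf-8"))

print(tag_sizes)  # [34, 34, 30]
print(way_size)   # 119
```

The three tags alone add up to 98 bytes; the other 21 come from the wrapper, indentation, and newlines under the layout assumed here.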

Size calculation

Applying this to all of the above TIGER tags, total sizes come out as follows.

  • tiger:cfcc (3-byte value): 380 MB
  • tiger:county (assuming an average value of 12 bytes): 514 MB
  • tiger:reviewed=no: 380 MB
  • tiger:name_base (assuming an average value of 12 bytes): 345 MB
  • tiger:name_type (2-byte value): 227 MB
  • tiger:tlid (values of around 200 bytes): 1.13 GB
  • tiger:source (30-byte value): 272 MB
  • tiger:upload_uuid (51-byte value): 334 MB
  • tiger:zip_left (5-byte value): 140 MB
  • tiger:zip_right (5-byte value): 126 MB
  • tiger:separated=no: 104 MB
  • tiger:name_direction_prefix (1-byte value): 56 MB
  • tiger:name_base_1 (assuming an average value of 13 bytes): 53 MB
  • tiger:name_direction_suffix (assuming an average value of 2 bytes): 20 MB
  • tiger:name_type_1 (2-byte value): 13 MB
  • ~40 MB other tags
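The per-tag estimates can be roughly reproduced with a short Python sketch: bytes per tag (16 bytes of fixed XML around the key and value, plus the key and an assumed value length) times the Taginfo usage count. The names `tag_bytes`, `estimate_mb`, and `TIGER_TAGS` are made up for this example, the value lengths are the ones assumed in the list, and 1 MB is taken as 1,000,000 bytes.

```python
# Fixed bytes around the key and value in <tag k="KEY" v="VALUE"/>.
TAG_OVERHEAD = len('<tag k="" v=""/>')  # 16

# (key, usage count from the Taginfo list, assumed value length in bytes)
TIGER_TAGS = [
    ("tiger:cfcc", 13_078_000, 3),
    ("tiger:county", 12_871_000, 12),
    ("tiger:reviewed", 11_874_000, 2),   # value "no"
    ("tiger:name_base", 8_021_000, 12),
    ("tiger:name_type", 6_880_000, 2),
    ("tiger:tlid", 4_700_000, 200),
    ("tiger:source", 4_700_000, 30),
    ("tiger:upload_uuid", 4_020_000, 51),
    ("tiger:zip_left", 4_000_000, 5),
    ("tiger:zip_right", 3_600_000, 5),
    ("tiger:separated", 3_250_000, 2),   # value "no"
    ("tiger:name_direction_prefix", 1_275_000, 1),
    ("tiger:name_base_1", 1_127_140, 13),
    ("tiger:name_direction_suffix", 450_000, 2),
    ("tiger:name_type_1", 370_000, 2),
]


def tag_bytes(key: str, value_len: int) -> int:
    """Size in bytes of one serialized tag, ignoring whitespace."""
    return TAG_OVERHEAD + len(key) + value_len


def estimate_mb(key: str, count: int, value_len: int) -> float:
    """Estimated total size of all uses of a tag, in MB (10**6 bytes)."""
    return tag_bytes(key, value_len) * count / 1_000_000


for key, count, value_len in TIGER_TAGS:
    print(f"{key}: {estimate_mb(key, count, value_len):,.0f} MB")
```

Under these assumptions some results come out a few percent away from the figures listed above (tiger:tlid in particular is sensitive to the "around 200 bytes" guess), so treat both sets of numbers as ballpark estimates.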

Conclusion

That all adds up to over 4 GB of data, just from extraneous import tags (I’ll bet NHD imports are even larger … oh boy).
With the current planet.osm (uncompressed) sitting at around 960 GB, all these TIGER tags make up a whopping 0.42% of all OpenStreetMap data! Wow such large.
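As a quick sanity check on the total and the percentage, here is a small Python sketch that sums the per-tag figures from the size list and compares them to the roughly 960 GB planet file. It assumes 1 GB = 1000 MB; the variable names are just for illustration.

```python
# Per-tag sizes from the list above, in MB (tiger:tlid's 1.13 GB entered as 1130).
listed_mb = [380, 514, 380, 345, 227, 1130, 272, 334,
             140, 126, 104, 56, 53, 20, 13, 40]

planet_gb = 960  # approximate uncompressed planet.osm size quoted in the post

total_gb = sum(listed_mb) / 1000           # assuming 1 GB = 1000 MB
share_pct = total_gb / planet_gb * 100     # share using the summed total
rounded_share_pct = 4 / planet_gb * 100    # share using a flat 4 GB

print(f"total: {total_gb:.2f} GB")
print(f"share: {share_pct:.2f}% (or {rounded_share_pct:.2f}% with a flat 4 GB)")
```

The summed list gives a little over 4.1 GB, so the share lands at 0.42% or 0.43% depending on whether the total is rounded down to 4 GB first.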

This doesn’t really conclude much, but it was a fun experiment. I had expected the number to be much larger, but even the vastness of TIGER doesn’t compare to the rest of the world.

Still, most TIGER data is misaligned, low-resolution, incorrectly classified, inconsistent, or just straight-up wrong.
Help us cut down on bad TIGER data!


[Image: bad TIGER roads 1]

[Image: bad TIGER roads 2]

Durham Open Data!

Posted by rivermont on 5 June 2018 in English.

I recently discovered the Durham County Open Data site. Licensed under ODbL 1.0, it is a large collection of datasets covering everything from election results to bike racks. Some of the datasets are a few years old, but most of them seem up to date.
Hopefully someone will be able to use this; enjoy!
Link to the front page

Location: American Tobacco Historic District, 318, Downtown Durham, Durham, Durham County, North Carolina, 27701, United States