TIGER road clumps

Posted by Matt_ on 19 October 2023 in English.

I pulled together some info I wanted to share: what is the longest connected group of untouched TIGER-imported roadways in the U.S.? The answer is this 1,258.6 km group of ways in Brooks County, Texas. The rest of what I found is below. I haven’t done anything with it; I just wanted to share.

Note: the Overpass queries I link to directly are all relatively quick, since they’re just lists of way IDs. For some of them I link to Pastebin instead, because they were too long to pass in a URL.

Results from Brooks County, Texas (Overpass Turbo)


The full results are here as a spreadsheet, but first I want to explain how I got them.

I was looking at the results of the query for unedited ways and nodes in an area near me and noticed there were far more untouched roads than I expected.

Example from a city not actually near me

All the roads I’ve looked at in my area were pretty detailed and accurate to my eye. But then I noticed the unmodified ways were mostly separate from each other, only connected by roads that HAD been edited in the last 15 years.

So can a large number of unedited roads in an area just mean they were good to begin with, at least as long as they aren’t connected to each other?

Then, where is the other extreme?


So what is the farthest distance you can travel only on mapped roads that haven’t been modified since about 2008? The answer depends on the state you’re in.

Basically, I ran the “TIGER unmodified ways and nodes” query for each state and wrote a program to go through the results and group together ways that are connected by shared nodes. There is probably an easier and smarter way to do it, but I couldn’t find it, so I got to have fun creating my own. I can share the code if anyone cares, but it isn’t very clean.
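The author’s code isn’t shown, so the following is not it. It is a minimal sketch of one standard technique for this kind of grouping, a union-find (disjoint-set) structure: each way is merged with any earlier way that shares one of its nodes, so every way and node is visited once instead of comparing every pair of ways. The input format (a dict mapping way ID to its list of node IDs) and the names DisjointSet and group_ways are hypothetical.

```python
from collections import defaultdict


class DisjointSet:
    """Hypothetical helper: union-find over way IDs (path halving, union by size)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Path halving: point each visited entry at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def group_ways(ways):
    """Hypothetical helper: group way IDs connected through shared nodes.

    ways maps way ID -> list of node IDs (an assumed input format).
    Only groups of two or more ways are returned, matching the post's
    exclusion of isolated single ways.
    """
    ds = DisjointSet()
    first_way_at_node = {}  # node ID -> first way seen that uses it
    for way_id, node_ids in ways.items():
        ds.add(way_id)
        for node_id in node_ids:
            other = first_way_at_node.setdefault(node_id, way_id)
            if other != way_id:
                ds.union(way_id, other)
    groups = defaultdict(set)
    for way_id in ways:
        groups[ds.find(way_id)].add(way_id)
    return [g for g in groups.values() if len(g) >= 2]
```

For example, group_ways({1: [10, 11, 12], 2: [12, 13], 3: [13, 14], 4: [20, 21]}) returns [{1, 2, 3}]: ways 1 to 3 chain together through nodes 12 and 13, while way 4 is isolated and dropped.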

I also re-ran the same query with one change, replacing the last line with:

.result2 out geom;

This returns only the nodes that connect two or more ways. I used it to ignore all the nodes belonging to a single way, which greatly sped up my horribly inefficient program that matches tens to hundreds of thousands of roads to each other by common nodes.
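If you already have each way’s full node list, the same junction-only filtering can be sketched locally without a second query. This is an illustration, assuming the same hypothetical input format (way ID mapped to node IDs) and using the hypothetical names junction_nodes and strip_to_junctions: count how many distinct ways reference each node, keep the nodes used by two or more, and drop the rest from every way. Connectivity between ways is unchanged, because only shared nodes can join two ways.

```python
from collections import Counter


def junction_nodes(ways):
    """Hypothetical helper: node IDs that appear in two or more distinct ways.

    ways maps way ID -> list of node IDs (an assumed input format). Each
    way's nodes are de-duplicated first, because a closed way repeats its
    first node at the end and should not count as two ways.
    """
    counts = Counter()
    for node_ids in ways.values():
        counts.update(set(node_ids))
    return {node_id for node_id, n in counts.items() if n >= 2}


def strip_to_junctions(ways):
    """Hypothetical helper: keep only junction nodes in every way's node list."""
    keep = junction_nodes(ways)
    return {
        way_id: [n for n in node_ids if n in keep]
        for way_id, node_ids in ways.items()
    }
```

The stripped lists are much shorter than the originals and can be handed to whatever grouping step you use.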

Once the ways were grouped, I calculated each group’s total length from the latitude and longitude of its nodes.
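The post doesn’t say which distance formula was used. A minimal sketch, assuming the haversine great-circle formula on a sphere with the mean Earth radius of 6,371 km: sum the distances between consecutive nodes of each way, then sum the ways in a group. The spherical model differs slightly from ellipsoidal distances, which shouldn’t matter for ranking groups. Coordinates are assumed to be (lat, lon) pairs in degrees, and the function names are hypothetical. This needs every node of each way (the full out geom geometry), not just junction nodes, or curved roads would come out too short.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: great-circle distance in km between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(coords):
    """Hypothetical helper: length of one way from its ordered (lat, lon) pairs."""
    return sum(haversine_km(*p, *q) for p, q in zip(coords, coords[1:]))


def group_length_km(group, way_coords):
    """Hypothetical helper: total length of a group, given way ID -> (lat, lon) list."""
    return sum(way_length_km(way_coords[w]) for w in group)
```

As a sanity check, one degree of latitude along a meridian, haversine_km(0, 0, 1, 0), comes out at about 111.19 km.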

Another note: this data does not include any individual ways, only connected groups of at least two. I also made no attempt to find connections across state lines.

Finally, the data

Here’s everything I found in spreadsheet form (the file is about 5 MB). The columns are: state, number of connected ways in the group, total length of the ways, and a comma-separated list of way IDs, which you can view by pasting them into the following Overpass query:

out geom;
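The query above only shows its output line. Assuming the way IDs are meant to go into an Overpass QL id filter on the line before it (a statement like way(id:123,456); selects several ways at once), a small hypothetical helper can turn one spreadsheet cell into a complete query to paste into Overpass Turbo. The [out:json] and [timeout:...] settings are my additions for this sketch, not from the post.

```python
def overpass_query_for_ways(id_cell, timeout=60):
    """Hypothetical helper: build an Overpass QL query for a list of way IDs.

    id_cell is a comma-separated string of way IDs, e.g. one cell from the
    spreadsheet's last column. Blank entries are skipped; anything that is
    not an integer raises ValueError.
    """
    ids = [int(part) for part in id_cell.split(",") if part.strip()]
    if not ids:
        raise ValueError("no way IDs given")
    id_list = ",".join(str(way_id) for way_id in ids)
    # [out:json] and [timeout:...] are optional settings assumed for this sketch.
    return f"[out:json][timeout:{timeout}];\nway(id:{id_list});\nout geom;"
```

Parsing the IDs as integers first also catches stray characters before the query reaches the server.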

There were no results for RI, PA, MA, HI, or DC. If you want to replicate this, don’t query all of MO, OK, TX, or VA in one go, since they return too many results; you’d probably need to split up AL, KS, CA, and NM too. I assumed built-in limits would stop me from hurting anything by running too big a query. If that’s wrong, I’m very sorry for taxing the Overpass server when I ran all of this.
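One way to split up a large state, sketched here as an assumption rather than what the author did, is to divide its bounding box into a grid of smaller tiles and run one query per tile. Tiles are given as (south, west, north, east) in degrees, the order used by Overpass QL bbox filters. A way that crosses a tile edge is returned by every tile it touches, so the per-tile results should be merged by way ID before grouping; assuming each tile returns complete ways, groups that span several tiles are still found. The names split_bbox and merge_tiles are hypothetical.

```python
def split_bbox(south, west, north, east, rows, cols):
    """Hypothetical helper: split a bounding box into rows x cols tiles.

    Returns (south, west, north, east) tuples in degrees. Neighboring tiles
    share edges, and the outer edges are exactly the original box's edges.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    lats = [south + (north - south) * i / rows for i in range(rows)] + [north]
    lons = [west + (east - west) * j / cols for j in range(cols)] + [east]
    return [
        (lats[i], lons[j], lats[i + 1], lons[j + 1])
        for i in range(rows)
        for j in range(cols)
    ]


def merge_tiles(tile_results):
    """Hypothetical helper: merge per-tile {way_id: node_ids} dicts by way ID."""
    merged = {}
    for ways in tile_results:
        merged.update(ways)  # a way seen in several tiles is kept once
    return merged
```

Smaller tiles mean more requests but lighter ones, so the grid size is a trade-off to tune per state.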

Some of the results visualized:

The 10 largest (longest) groups in each state:

Query text on Pastebin // Higher res Lower 48 top 10 groups per state

Every group longer than 100 km:

Query text on Pastebin // Higher res Lower 48 everything longer than 100 km

Between 10 and 100 km:

This query was too long even for Pastebin… Higher res Lower 48 everything longer than 10 km but shorter than 100

Everything 10 km and up in blue and 100 km and up in orange, using mapshaper:

Higher resolution Lower 48 everything 10 km and up

One 535 km group in Rawlins County, Kansas, is clearly visible on satellite imagery and at a quick glance looks OK, but I haven’t checked its tags:

Results from Rawlins County, Kansas (Overpass Turbo)

A 61 km group in Greenbrier County, West Virginia, in a forested area that could still be spruced up? (Pun intended, but I don’t actually know whether this is an example of needing work or not.)

Results from Greenbrier County, West Virginia (Overpass Turbo)

I hope some of this was helpful. Thank you for reading!


Comment from n76 on 19 October 2023 at 16:05

Nice work!

I think this will give some mappers a reason to focus on those areas.

Comment from Matt_ on 20 October 2023 at 01:17

Thank you!

Comment from SK53 on 20 October 2023 at 17:11

If you want to avoid coding, it is possible to achieve similar results with a DBSCAN clustering function. Unfortunately, QGIS only offers one that works with points. A (not so*) quick method is as follows:

  1. Download GeoJSON from Overpass Turbo.
  2. Upload the GeoJSON into PostGIS (I go via QGIS).
  3. Run something like the following query in the QGIS DB tool (the parameters are the distance within which objects are clustered, for which I used 0 so that only touching objects are grouped, and the minimum number of objects in a cluster, for which I used 1):

    SELECT ST_ClusterDBSCAN(way,0,1) over() cluster_id, osm_id, highway, name, ref, way FROM raw_tiger_roads

  4. Visualise the clusters by categorising them on cluster_id modulo some number (I used cluster_id % 29).

Raw TIGER roads, Pennsylvania

Raw TIGER roads, showing a few clusters

You can do lots more processing on the geometries in QGIS (collect, sum lengths, counts, buffer, hulls etc), and once you have found what is useful these can be pushed upstream into the source SQL query.
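The “sum lengths, counts” step can also be done outside the database. A minimal sketch, assuming the clustered rows have been exported as (cluster_id, osm_id, length_km) tuples; the export format, the length_km value (which would have to be computed separately, since the query above doesn’t select a length) and the name summarize_clusters are hypothetical. It yields the same columns as the spreadsheet in the post: way count, total length, and way IDs.

```python
from collections import defaultdict


def summarize_clusters(rows, min_ways=2):
    """Hypothetical helper: per-cluster way count, total length and way IDs.

    rows is an iterable of (cluster_id, osm_id, length_km) tuples (an assumed
    export format). Returns (cluster_id, way_count, total_km, way_ids)
    tuples, longest cluster first. Clusters with fewer than min_ways ways are
    dropped, mirroring the post's exclusion of isolated single ways.
    """
    ways = defaultdict(list)
    total = defaultdict(float)
    for cluster_id, osm_id, length_km in rows:
        ways[cluster_id].append(osm_id)
        total[cluster_id] += length_km
    summary = [
        (cid, len(ids), total[cid], sorted(ids))
        for cid, ids in ways.items()
        if len(ids) >= min_ways
    ]
    summary.sort(key=lambda item: item[2], reverse=True)
    return summary
```

Sorting by total length puts the largest clusters first, which is the ranking the original post is after.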

* Not so quick because I’m running low on disk space.

Comment from watmildon on 20 October 2023 at 17:48

Wonderful! It reminds me of the very engaging water basins map that Amanda has. Except, in this case, the urge is to break these big blobs up!

Comment from Matt_ on 20 October 2023 at 20:31

Thank you all, I’m learning a lot!

Comment from SK53 on 21 October 2023 at 08:51

@watmildon: DBSCAN clustering works just as well for waterways, and for a host of other interesting problems with OSM data! I really need to write up some of the other uses.

Comment from stevea on 29 October 2023 at 16:08

Really awesome work to reduce TIGER noise, Matt, thank you! I’m glad to see comments and suggestions from others, too.

It might sound simplistic, but one strategy I’ve found helpful is to have (or make) a county-wide wiki page (which can be linked from one of the 50 state wikis we already have) and on it provide a link to an Overpass Turbo (OT) query that identifies all tiger:reviewed=no nodes and ways in the county, like my California county’s wiki page does (see its “Work to be done in the County” section, or text-search the page for “Overpass”).

OT’s {{geocodeArea:County Name}}->.searchArea; directive makes specifying “where” pretty easy: that countywide OT search is only 10 lines long and has a generous timeout of 60 seconds, which is never reached. True, it does take people (usually local OSM volunteers) to do the work once the data are identified, but that’s always the case for TIGER cleanup.

I’ve estimated it might take until the late 2030s or 2040s to fully clean up our TIGER noise, but over the last 15 years, it’s clear we are doing a steady job of bringing it in for a landing. Smarter strategies like those identified here will only help!

Comment from supersellout6907 on 30 October 2023 at 16:35

I know that the areas I’ve been working on have a lot of TIGER noise.
