OpenStreetMap

Fixing the rural US

Posted by Richard on 28 January 2015 in English (English)

Folks, I’ve made an exciting discovery. I’ve stumbled across a country on the map which has 46 million inhabitants but is barely mapped. There appears to have been an import several years ago of a poor-quality dataset, and since then it’s languished untouched. There’s no indigenous mapping community. Can we help this poor beleaguered country to get a decent map?

Ok, you may have figured where I’m talking about. It’s the rural US: 72% of the landmass, 15% of the population. And it needs your help.

The problem

For the past couple of months I’ve been using idle moments to address a particular and very widespread issue in OSM’s coverage of the rural US - the highway=residential problem.

TIGER data, which forms the bedrock of OSM data in the US, classes roads by CFCC - ‘Census Feature Class Code’. By far the most prevalent is A41 - “Local, neighborhood, and rural road, city street, unseparated” - and the TIGER import translated this to highway=residential. This was a good fit in urban areas but covers a multitude of sins in rural areas - everything from good, fast state highways to rutted forest tracks or worse.

The effect is that our map of the rural US shows pretty much everything, save the biggest roads, as a residential road. Tarmac road with sweeping curves and a painted centreline? highway=residential. Gravel road? highway=residential. Forest track? highway=residential. Vague two-foot clearing through the woods where someone perhaps rode 50 years ago? highway=residential. Ploughed field? Etc, etc.

Rural crossroads

This is a typical example from an agricultural area. A good-quality road with centreline, running east-west: a smaller access road, running north: and nothing at all running south. In OSM, this is mapped as a crossroads with highway=residential roads in all four directions. Having the edge of a ploughed field marked as highway=residential doesn’t make for a great map, nor does it make for good routing results. “In 100m, turn left.” “BUT THAT’S A FRICKING FIELD YOU ACCURSED MACHINE.” Sigh.


But if you really want some fun, find a less cultivated area - this forest, for example. Look at all those lovely residential roads, tagged exactly the same as a paved city street. Except none of these are paved. A few might be gravel. Many don’t really appear to exist at all.

Most of this remains unchanged. In some areas a dedicated user has cleared it up and there’ve been a few energetic nationwide editors, but it’s a massive job. It’s pretty much endemic - even just a few miles from San Francisco, a hotbed of OSM activity, you’ll find examples.

Perhaps we shouldn’t have imported TIGER in these rural areas, but just let the map grow at its own pace. That way, the important roads would have been surveyed, traced, or imported one-by-one, and the thickets of near-impenetrable tracks would probably have never made it in. But we are where we are, and though I’m generally sceptical of “armchairing” far-flung data, we have a big heap of flat-out wrong data and no other strategy to deal with it.

A framework for fixup

So I’ve been fixing up roughly along these lines, though obviously adjusting for local sensitivities and network considerations:

  • highway=tertiary - paved 2-lane road with painted centreline
  • highway=unclassified - other paved road
  • highway=unclassified, surface=unpaved - unpaved road (at least a car’s width, consistent surface)
  • highway=track - unpaved, often doubletrack/singletrack
  • highway=service - access to private house or farm
  • (delete entirely - no trace of trail/road)
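As a rough illustration, the list above can be written as a small decision function in Python. This is only a sketch: the function name, its parameters and the order of the checks are assumptions made for the example, not part of any editor or import tool, and real fixup still needs a human judging the imagery.

```python
from typing import Optional


def suggest_tags(
    visible: bool,
    paved: bool,
    two_lane_centreline: bool = False,
    private_access: bool = False,
    car_width: bool = True,
) -> Optional[dict]:
    """Suggest tags for a way currently tagged highway=residential.

    Returns None when there is no trace of a road or trail, meaning
    the way is a candidate for deletion. All parameters are hypothetical
    observations a mapper would make from imagery.
    """
    if not visible:
        return None  # no trace of trail/road: delete entirely
    if private_access:
        # access to a private house or farm
        return {"highway": "service"}
    if paved:
        if two_lane_centreline:
            # paved 2-lane road with painted centreline
            return {"highway": "tertiary"}
        # other paved road
        return {"highway": "unclassified"}
    if car_width:
        # unpaved road, at least a car's width, consistent surface
        return {"highway": "unclassified", "surface": "unpaved"}
    # unpaved, often doubletrack/singletrack
    return {"highway": "track"}
```

For example, `suggest_tags(visible=True, paved=False, car_width=False)` suggests `highway=track`, while `suggest_tags(visible=False, paved=False)` returns `None`.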

Most of this is fixable from imagery. There are also some good datasets: Arnold, Forest Service data, various state data, etc.

In forests 90% of highway=residential should really be tracks. In the plains, the majority is either track or unpaved road, often in grids, but with the occasional paved through route. In Missouri, Tennessee, Kentucky and eastwards you start to see more paved roads.

Personally, my main priority has been to identify and retag paved through routes. Often these can be identified by squinting at the map: a river bridge is a tell-tale indicator, or a road with wide curves, or one linking settlements. Sometimes you just need to look at the aerial imagery and pan around. Of course, it’s not just the highway tagging that needs fixing - ref tags and geometries would benefit from attention, too - but you can’t do everything, so my chosen challenge has been to get the tagging sane.

I just use plain vanilla Potlatch 2 for fixup, accelerated by assigning common tags to function keys. One day it’d be nice to build something MapRoulette-like to tackle the issue, a bit like HotOrNot (TrackOrCack? HighwayOrLieway? RoadOrFOAD?). But for now a normal editor does fine.

So if you’re sitting in your armchair with an itchy OSM finger, resist the temptation to normalise some tags or trace some buildings or whatever else you might usually do. Come and fix up the rural US.

Comment from Linhares on 28 January 2015 at 13:54

I like your classification. Here in Brazil we also have many unpaved roads that are not tracks.

Comment from jumbanho on 28 January 2015 at 15:59

I think you’re mainly right on with the need for this task, especially finding the important through roads and upgrading their status, and making sure that driveways serving particular homes are marked as service. However, the classification of many of these roads has been debated several times on the US lists, so this isn’t necessarily a consensus, but I’ll put my two cents in here about a subset of these roads. In some midwestern states (MN, WI, IA at least), there are two-lane gravel roads in rural areas that are used by residents to travel to and from their residences. I do think these should be tagged as residential with surface=unpaved (or gravel).

Comment from Omnific on 28 January 2015 at 15:59

This is how I got started in OSM. I used an offline maps app to route me through West Virginia and it decided to send me down a dirt track. After that, I did a ton of work improving West Virginia, an area with almost no mappers and terrible import quality. I’ve also done the same in rural counties in North Carolina and South Carolina. The quality of data in a lot of counties is horrendous, so I applaud any people working to improve it.

Comment from giggls on 28 January 2015 at 16:19

Data tends to be “horrendous” in countries with huge imports. In Germany our data has been acquired without any noteworthy imports of any features, mostly from local knowledge.

Especially the latter is the key for good quality OSM data.

Sven

Comment from Richard on 28 January 2015 at 17:56

@jumbanho: Yes, I can see the logic in that. Essentially, as long as unpaved roads are tagged as such, that’s great. (What’s a little more worrying, and something I’ve encountered several times, is when people retag these gravel roads as highway=unclassified without adding a surface tag.)

Comment from Minh Nguyen on 28 January 2015 at 19:56

@Omnific: West Virginia is a state with poor data quality. It hasn’t gotten much better over the years because Yahoo! and Bing imagery was always very poor in this area. Now with USDA and Mapbox imagery there’s an opportunity for better armchair mapping. Every time I accidentally find that my editor is in Appalachia – Eastern Kentucky and Southeast Ohio are just as bad – I have to budget a few extra hours for the inevitable roaming realignment party.

Comment from jumbanho on 28 January 2015 at 20:22

If you want really good imagery and a state that still needs lots of fixing, try out North Carolina. There is leaf-off imagery that is no more than 4 years old. It is a preset in JOSM. Pick one of the 100 counties (probably 80 largely rural ones) and start fixing.

Another suggestion for those who haven’t tried, the Strava heat maps are super useful for correcting geometry when trees obscure the roads. See Strava wiki page

Comment from lxbarth on 28 January 2015 at 22:12

Richard - thanks for posting.

Over here in the Mapbox team we’ve been mostly focused on fixing alignment problems via the micro tasking manager to-fix (background).

We’re using updated TIGER data as a guide for editing. One weakness of the original TIGER data is the lack of classification granularity in the lower-level road network - exactly the reason why we have this mess in OSM now. Just talked to Eric Fischer over here, and we want to look into whether surfacing the existing road classification better in the TIGER comparison layers would at least let us focus retagging efforts, by showing where TIGER does / doesn’t provide useful classification information.

I like the framework you’re suggesting for retagging.

Some other useful tools for fixing TIGER:

Comment from Timothy Smith on 29 January 2015 at 06:48

To Fix is a great way to clean up bad data. Before that I would just roam Oregon looking for bogus residential roads. It’s a lot easier with the TIGER cleanup in To Fix. You can spend hours being shot all around the US to clean up horrible import roads.

Comment from Richard on 29 January 2015 at 10:37

@Alex - interesting, and good to see Mapbox tackling this.

Have you thought about running an automated comparison with Arnold (as you blogged at http://openstreetmap.us/2014/12/arnold-for-osm/)? I’d expect that any road showing on Arnold is of reasonable quality; if the corresponding geometry in OSM is tagged with highway=residential, tiger:reviewed=no, that suggests a priority for retagging.

One example I found just by zooming into Eric’s Arnold rendering: here in WV there’s a highway=residential which has moderate traffic flows and should really be a tertiary.
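A minimal Python sketch of that kind of comparison, assuming the hard part (matching OSM ways to Arnold geometry) has already been done elsewhere. The function names and the `arnold_matched_ids` input are hypothetical; only the two tags come from the suggestion above.

```python
def is_raw_tiger_residential(tags: dict) -> bool:
    """True for ways still tagged highway=residential, tiger:reviewed=no."""
    return (
        tags.get("highway") == "residential"
        and tags.get("tiger:reviewed") == "no"
    )


def retag_priorities(ways: dict, arnold_matched_ids: set) -> list:
    """Return the ids of ways that look like priorities for retagging.

    ways maps a way id to its tag dictionary. arnold_matched_ids is an
    assumed, precomputed set of way ids whose geometry corresponds to a
    road shown on Arnold.
    """
    return sorted(
        way_id
        for way_id, tags in ways.items()
        if way_id in arnold_matched_ids and is_raw_tiger_residential(tags)
    )
```

Ways that appear on Arnold but have already been reviewed or retagged drop out, so the result is just the untouched TIGER residentials that are probably better roads than their tagging suggests.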

Comment from Timothy Smith on 30 January 2015 at 07:45

Also, if a road goes to a house you should probably use service=driveway to be a bit more specific.

http://wiki.openstreetmap.org/wiki/Tag:service%3Ddriveway

Comment from lxbarth on 30 January 2015 at 20:34

@RichardF - yup, thought about it… but: what we have of ARNOLD right now is pretty much what HPMS (attributes) is using of ARNOLD (geometry) - which isn’t much. OSM or TIGER coverage is much much bigger than HPMS, and ARNOLD itself is not available for download, I think. Let me check on the latter, actually.

Comment from lxbarth on 30 January 2015 at 20:36

One example I found just by zooming into Eric’s Arnold rendering: here in WV there’s a highway=residential which has moderate traffic flows and should really be a tertiary.

And yes, we should be able to detect these types of issues w/ HPMS/ARNOLD.

Comment from Nakaner on 1 February 2015 at 20:02

Hi,

Perhaps we shouldn’t have imported TIGER in these rural areas

History (especially in German-speaking countries) tells us that an empty map makes a community grow, and only countries with a large, geographically widespread community have good and up-to-date data. I think it would be best if OSM carto no longer rendered objects which are tagged with tiger:reviewed=no. After this change people will miss these streets at osm.org and every place where osm.org tiles are used. People will start removing this tag from objects they want to see at osm.org[1]. By removing this tag, they will review the data and improve it. It would happen where people need the data[2].

The advantage of this solution is that data users who want to have all OSM data can just ignore tiger:reviewed=no, i.e. Mapbox, Mapquest etc. would not be affected. The alternative would be to remove all TIGER data from OSM which has not been reviewed yet[3]. As in solution 1, people would remap all areas which they need, but data users like Mapbox would suffer from missing streets.

Maybe I will create a map called “The Better Map” one day which does not render data whose last modifier is an import user (or which is tagged with tiger:reviewed=no). The U.S. would become white, but this would be better than rendering nonsense. :-)

Best regards from a country where OSM is used because it is better, not because it is free.

Michael

[1] A lot of mappers map a feature because it is rendered at osm.org. Let’s benefit from this effect.

[2] In Germany, people started to map where they were or where they needed a map.

[3] I agree with this too, but I know that this will never happen. :)

Comment from lxbarth on 3 February 2015 at 16:31

I think it would be best if OSM carto no longer rendered objects which are tagged with tiger:reviewed=no.

Interesting idea…

Comment from n76 on 15 March 2015 at 02:14

Don’t follow diaries very much but saw a link to this on the talk-us mailing list.

I agree with your road classifications with one exception: If there are “a reasonable number of what look like houses” along the road (judgement call) and there is a name on the way then I leave the tag as “residential”.

My way of finding them in places like the desert of Southern Arizona is to simply look at the map displayed at http://www.openstreetmap.org/ and see if it shows roads. If it does, chances are that it is in error. They are probably ranch or mining tracks or the disturbed earth over a pipeline (which also might be a track). Zoom in and with satellite imagery go to town. A fair chunk of southern Pinal County has been cleaned up that way.

The other way to find things more locally to where I currently live is to make a topo map (my own scripts) of a rural area to hike in: most of the “residential” roads stick out like a sore thumb and I can survey the area on the drive to or from or while hiking.

Do look at http://184.73.220.107/battlegrid/ for another way to find suspect areas, though it seems to be more focused on finding new roads and developments.

Comment from Richard on 16 March 2015 at 09:23

Yes, =residential is still a good value for that sort of road. I think the key there is to take the tiger:reviewed tag off (so it’s clear it’s not raw TIGER) and to make sure there’s a surface tag if the road is unpaved.
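In tag terms, that cleanup looks something like the Python sketch below. The function name and the `unpaved` flag are made up for illustration, and the default of surface=unpaved is an assumption; a more specific value such as gravel is better where you can tell.

```python
def mark_reviewed(tags: dict, unpaved: bool) -> dict:
    """Return a copy of tags for a way that has been checked by a mapper.

    Drops tiger:reviewed so the way no longer looks like raw TIGER, and
    adds surface=unpaved when the road is unpaved and has no surface tag.
    """
    cleaned = {key: value for key, value in tags.items() if key != "tiger:reviewed"}
    if unpaved:
        # keep any existing, more specific surface value (e.g. gravel)
        cleaned.setdefault("surface", "unpaved")
    return cleaned
```

The original dictionary is left untouched, so the before and after tags can still be compared.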
