amb_santacruz's Diary

The history and completeness of OSM

Posted by amb_santacruz on 4 December 2015 in English.

Chris Barrington-Leigh and I have been working for the past year to examine the history and completeness of the OSM road network. We’re interested to hear your thoughts and reactions, particularly about why completeness varies so much across countries.

Our rough estimate, still to be refined, is that OSM’s road network is now over 90% complete worldwide. It’s not only European and North American countries that seem complete, but also lower-income countries such as Haiti (presumably thanks to the Humanitarian OSM Team).

[Figure: OSM completeness]

You can see the country-by-country history, along with estimated saturation points, here. The y-axis indicates the length of ways; the scale varies depending on the country.
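The “length of ways” on the y-axis is a sum over the segments of each way. A minimal sketch of that sum, assuming each way is given as a list of (latitude, longitude) node coordinates and a spherical Earth of radius 6,371 km; the post does not say how lengths were computed, and `way_length_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(nodes):
    """Length of a way given as [(lat, lon), ...]: sum of consecutive segments."""
    return sum(
        haversine_km(a[0], a[1], b[0], b[1])
        for a, b in zip(nodes, nodes[1:])
    )
```

For example, `way_length_km([(0.0, 0.0), (0.0, 1.0)])` returns about 111.19 km, one degree of longitude at the equator. A way with a single node has length 0.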

A few notes on the methods:

We are only looking at completeness in terms of road length.
The data are for roads only (i.e., ways tagged “highway=” with one of the following values: “motorway,” “motorway_link,” “trunk,” “trunk_link,” “primary,” “primary_link,” “secondary,” “secondary_link,” “tertiary,” “residential,” “road,” “unclassified,” or “living_street”).
We used two methods: we fitted an S-shaped growth curve to each country’s road-length history, and we used satellite imagery to count missing road segments in a random sample of grid cells.
We’ll release the code as soon as we clean it up a little more.
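The road filter in the second note amounts to a set-membership test on the `highway` tag. A minimal sketch, assuming ways arrive as hypothetical (tags, length_km) pairs rather than in any actual OSM file format; the value set mirrors the list above exactly, so values such as `track`, `service` or `tertiary_link` are not counted.

```python
# highway=* values counted as roads, copied from the list in the post.
ROAD_HIGHWAY_VALUES = frozenset({
    "motorway", "motorway_link", "trunk", "trunk_link",
    "primary", "primary_link", "secondary", "secondary_link",
    "tertiary", "residential", "road", "unclassified", "living_street",
})


def is_counted_road(tags):
    """True if the way's tag dict has highway=<one of the listed values>."""
    return tags.get("highway") in ROAD_HIGHWAY_VALUES


def total_road_length_km(ways):
    """Sum the lengths of qualifying ways.

    `ways` is an iterable of (tags, length_km) pairs, a hypothetical
    pre-processed format used only for this illustration.
    """
    return sum(length for tags, length in ways if is_counted_road(tags))
```

Ways with no `highway` tag, or with a value outside the set, simply contribute nothing to the total.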

We’re excited to share these preliminary results, and hope to get your thoughts.

Adam & Chris

Location: University, Santa Cruz, Santa Cruz County, California, 95064, United States

Discussion

Comment from imagico on 4 December 2015 at 20:12

Without knowing the details of the methods used, not much can be said here, but it seems unlikely you can make such estimates without pretty hairy assumptions about the distribution of roads within countries (both spatially and by road type) or about the pattern in which roads are mapped in OSM.

That being said, the results seem to overestimate completeness in many cases, especially for larger countries with limited and localized mapping, probably because you interpret saturation effects in urban road mapping as a sign of overall completeness.

The good thing about this is that an overestimate of completeness is going to be much easier to falsify. So if you stick with the 90% completeness estimate, you are likely to be proven wrong relatively soon…

Comment from mikelmaron on 4 December 2015 at 20:50

My understanding is that the methodology is based on random sampling and visual inspection, not on assumptions about the distribution of roads. We should be able to investigate more once the code is opened. I agree, 90% sounds a bit high overall, but it is an estimate after all (not something true or false). Many of the numbers align, though not completely, with https://www.mapbox.com/blog/how-complete-is-openstreetmap/ (where completeness is measured by comparison against the CIA World Factbook), so that will be an interesting comparison. The trajectory graphs are helpful in a lot of cases for spotting dynamics in communities.

Comment from amb_santacruz on 4 December 2015 at 21:04

Thanks for the comments. The point about saturation in urban roads is a good one.

We do cross-check, as Mikel says, against a visual assessment of satellite imagery for a small sample of grid cells (20 per country). For now, we only use the completeness estimated from the S-shaped curve (i.e., current length relative to the curve’s asymptote) where it agrees with the visual assessment (i.e., it falls within the visual estimate’s 95% confidence interval). This should mitigate most cases where growth in OSM road length has saturated but the network is still far from complete.
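The acceptance rule described in this reply can be sketched as follows. This is a minimal sketch under assumptions the comment does not state: each sampled grid cell yields a mapped and a missing road length, per-cell completeness fractions are averaged with equal weight, and the 95% interval uses a normal approximation. The function names are hypothetical.

```python
import statistics


def visual_completeness_ci(cells, level=0.95):
    """Mean per-cell completeness and a normal-approximation confidence interval.

    `cells` is a list of (mapped_km, missing_km) pairs, one per sampled
    grid cell; cells with no roads at all are skipped. Needs at least two
    usable cells so the sample standard deviation is defined.
    """
    fractions = [m / (m + x) for m, x in cells if m + x > 0]
    mean = statistics.mean(fractions)
    se = statistics.stdev(fractions) / len(fractions) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)


def accept_curve_estimate(current_km, asymptote_km, cells, level=0.95):
    """Curve-based completeness if it lies inside the visual CI, else None."""
    curve_completeness = current_km / asymptote_km
    _, (lo, hi) = visual_completeness_ci(cells, level)
    return curve_completeness if lo <= curve_completeness <= hi else None
```

Weighting every cell equally is only one choice; a ratio of total mapped length to total road length across cells would weight dense urban cells more heavily, which matters for the urban-saturation concern raised above.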

Comment from imagico on 5 December 2015 at 00:03

I can’t say much about the visual inspection without knowing the exact procedure, but there are a lot of ways to get it wrong and introduce bias. Since many countries do not have full high-resolution satellite image coverage, I don’t even think random sampling is possible.

And you always have other systematic errors when doing an assessment based on satellite images. For example, in heavily forested areas (like Brazil, Canada, Russia) you underestimate the actual number of smaller roads in rural areas, so you overestimate completeness.

Examples of probably quite significant overestimates are Libya and Chile.

Comment from tgertin on 5 December 2015 at 17:17

This is an interesting study…

Are you using high-resolution satellite imagery?

How big is a grid cell?

Comment from joost schouppe on 5 December 2015 at 19:36

From what I understand, you look at the shape of the growth curve of the road network. The visual inspection is just a cross-check. Here’s an example of the curve I’m guessing you use, from Flanders, Belgium.

[Figure: road network growth curve, Flanders, Belgium]

As you see growth flatten out over the years, you can assume that the road network is “complete” in the sense of containing the geometry of almost all roads.
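Treating flattening growth as saturation is what fitting an S-shaped curve formalizes: the fitted asymptote estimates the total road length, and completeness is the current length divided by it. A minimal sketch, assuming a three-parameter logistic curve; the thread does not say which S-shaped form or fitting routine the authors used, and this profiling approach (scan candidate asymptotes, fit a straight line on the logit scale for each, keep the lowest squared error) is a swapped-in technique chosen only because it needs nothing beyond the standard library.

```python
import math


def logistic(t, k, r, t0):
    """S-shaped curve with asymptote k, growth rate r and midpoint t0."""
    return k / (1.0 + math.exp(-r * (t - t0)))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(ts, lengths, max_factor=3.0, steps=2000):
    """Fit (k, r, t0) by profiling over candidate asymptotes k > max(lengths).

    For fixed k, log(k/L - 1) = r*t0 - r*t is linear in t, so r and t0
    come from a straight-line fit. The k giving the smallest squared
    error on the original length scale is kept. Lengths must be > 0.
    """
    top = max(lengths)
    best = None
    for i in range(1, steps + 1):
        k = top * (1.0 + (max_factor - 1.0) * i / steps)
        zs = [math.log(k / y - 1.0) for y in lengths]
        a, b = fit_line(ts, zs)
        r = -b
        if r <= 0:  # not an increasing S-curve for this k
            continue
        t0 = a / r
        sse = sum((logistic(t, k, r, t0) - y) ** 2 for t, y in zip(ts, lengths))
        if best is None or sse < best[0]:
            best = (sse, k, r, t0)
    if best is None:
        raise ValueError("no increasing S-curve fits these data")
    return best[1:]
```

On synthetic data generated with `logistic(t, 100.0, 0.8, 5.0)` for t = 0 to 12, the recovered asymptote is close to 100, so `lengths[-1] / k` comes out just under 1. Real OSM histories are noisier, and a flat curve can also reflect a small community or exhausted imagery rather than a complete map.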

A couple of things to keep in mind:

This only holds true if the community is large enough. Growth will also peter out if there are not enough good sat pics left to map from, or no more people willing to map. I just wrote in my diary about how even in a large community like Flanders, 44% of all nodes were mapped by just one guy. Now imagine how things might go in a place like Bolivia. It might show up in the data as a certain shakiness of the graph, but you will still get a number for every country.
As you can see in my example, not all roads are alike. In Flanders, the growth curve for main roads (tertiary and up) is quite similar to that for minor roads (road, unclassified, residential, living street). But that might not always hold. For example, in Africa, a tagging correction might distort the numbers when a lot of tracks suddenly become main roads.
I used the Chile map in real life, and I don’t think I found a single road that was missing. But that brings me to something else: imports (Chile had one). They obviously distort the curve. I would imagine that the road network in the US has also looked quite “complete” for many years, as the huge TIGER cleanup operation won’t have much effect on overall network length. Completeness does not necessarily imply quality, which makes the measure most useful in countries without a large import.
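Because an import adds a large amount of length at once, it shows up as an abrupt step in the length history. A minimal sketch of flagging such steps, and of a growth curve that allows for them, assuming regularly spaced length snapshots; the median-based threshold (`factor`) and the additive-step form are illustrative assumptions, not the model the authors describe.

```python
import math
import statistics


def candidate_import_times(ts, lengths, factor=10.0):
    """Times at which length jumps far above the typical (median) increase.

    `factor` is a hypothetical threshold, not a tuned value.
    """
    steps = [b - a for a, b in zip(lengths, lengths[1:])]
    typical = statistics.median(steps)
    if typical <= 0:
        return []
    return [ts[i + 1] for i, step in enumerate(steps) if step > factor * typical]


def logistic_with_jumps(t, k, r, t0, jumps):
    """Logistic growth plus a permanent step for each (time, added_length) import."""
    base = k / (1.0 + math.exp(-r * (t - t0)))
    return base + sum(size for when, size in jumps if t >= when)
```

Under this additive form, the implied total network length is `k` plus the sum of the import sizes, so an import no longer masquerades as organic saturation.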

I’d be very interested to hear more on how you went about this analysis.

Comment from joost schouppe on 5 December 2015 at 19:45

Link kaput, and you can’t edit your comments. Hope this works: http://imgur.com/2hxAXNk

Comment from amb_santacruz on 7 December 2015 at 18:46

Thanks for all the additional comments. That’s interesting to hear about Flanders and Chile, and the point about the limited availability of good satellite imagery is well taken.

One question: which countries do you think are the best mapped (especially outside Europe)? What about countries like Haiti and Nepal, where HOT has done so much work? What others?

A few clarifications:

We use the Google hybrid layer from the OpenLayers plugin within QGIS for the imagery. The Google Maps layer provides some additional information, but we use the imagery as the primary source, as the Google data may not be complete either. So the resolution for the visual assessment is whatever Google offers.
Each grid cell is about 2.5 km², although its area varies with latitude because of the projection used.
We do exactly what joost suggests: we look at the shape of the growth curve of the road network, and cross-check it with the visual inspection.
We model imports by allowing for jumps in the growth curve. You can see the jumps in the predictions here; Saudi Arabia is a nice case in point.
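Why a grid cell’s area varies with latitude can be seen from the exact area of a latitude/longitude cell on a sphere, A = R²·Δλ·(sin φ₂ − sin φ₁). A minimal sketch, assuming cells of fixed angular size (the comment does not name the projection) and a hypothetical cell size of 0.0142°, picked only so that an equatorial cell comes out near 2.5 km².

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical Earth assumption


def latlon_cell_area_km2(lat_south, size_deg):
    """Area of a size_deg x size_deg cell whose southern edge is at lat_south.

    Exact on a sphere: A = R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    lat_north = lat_south + size_deg
    dlon = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )


# Hypothetical cell size, chosen so an equatorial cell is about 2.5 km^2.
SIZE_DEG = 0.0142

for lat in (0.0, 30.0, 60.0):
    print(lat, round(latlon_cell_area_km2(lat, SIZE_DEG), 2))
```

The area shrinks roughly with the cosine of latitude, so the same cell covers about half as much ground at 60° as at the equator.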

Comment from joost schouppe on 12 December 2015 at 17:00

I can compare Ecuador, Peru, Bolivia, Argentina and Chile from driving around there. I would say Argentina and Chile both have great maps, especially when it comes to completeness. But Argentina might be a more interesting case for you, as I think they didn’t have a large import. Ecuador is also pretty good. Peru and Bolivia have a lot of work left.

Comment from amb_santacruz on 10 January 2016 at 22:42

Thanks, Joost, for these ideas. I wanted to give a quick update. Based on all these comments, we are going to spend more time on visual inspections. In particular, we are going to sample more systematically across the entire rural-urban gradient. We hope to have more to share in the next month or two.

Comment from amb_santacruz on 13 August 2017 at 03:15

Thanks again to all of you for these ideas about how to improve and extend our analysis. We did quite a bit of additional work to sample lower-density areas, and refined our process for fitting S-shaped curves. We conclude that at the global level, OSM was ~83% complete as of January 2016. Rerunning the models with the April 2017 OSM data gives an estimated completeness of ~89%. (These figures are for streets only, and don’t say anything about other features included in OSM.)

The final paper is now published, and available here. The code and supporting information are posted to GitHub. We look forward to others improving on our effort.
