Three weeks ago, I first talked about connectivity issues in OSM data. More specifically, I introduced an analysis layer showing un- or half-connected islands of the road network, i.e. strongly connected components (SCCs). Read here for more information on SCCs.
A few weeks have passed since the announcement and it is about time to look at the numbers and how they developed over the past weeks. And let me tell you, the numbers are good!
Have a look at the plots. An explanation follows below:
First, let’s have a look at the top plot.
It gives some information about the development of the total number of SCCs.
The green line is this absolute number.
We see that it is decreasing quite linearly from 1,008,025 SCCs to 956,849 within 21 days.
In other words the raw number has improved by more than 5 percent.
The second line in red shows the share of these components that are of size one.
This usually translates into a one-way street that you cannot enter or leave.
This number has slightly decreased from a little more than 60 per cent of all SCCs to a little less than 60.
One could argue that this is already a significant improvement in itself. Certainly, a lot of unconnected one-way streets get fixed because they mostly are part of the arterial road network.
Next, let’s look at the second plot at the bottom of the figure.
It is even more impressive.
It visualizes the relative improvement in comparison to the data growth.
Everyone knows that the OSM data is ever-growing at a good rate.
But how good is this rate?
Over the past 21 days, the number of (car) routable way segments has grown from 397,246,075 to 402,225,269, which is more than one per cent of growth.
Or in absolute numbers, a little less than five million edge segments.
Let me put this number a little more in perspective for you.
This translates into adding enough way segments to model the entire city of Berlin, Germany, more than 30 times, comparing against yesterday’s extract of the city provided by Geofabrik!
And that is a virtually completely mapped city of 3.5 million inhabitants every 17 hours.
In other words, a new routable way segment was added somewhere on the planet every 365 milliseconds.
This is a bit of comparing apples and oranges from a statistician’s point of view, but again:
Who says Rome wasn’t built in a day?
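The growth figures above can be re-derived from the two segment counts alone. The following is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the factor of 30 Berlins is taken from the text, not computed from the Geofabrik extract.

```python
# Figures quoted in the post; all names here are illustrative only.
segments_start = 397_246_075   # routable way segments 21 days ago
segments_end = 402_225_269     # routable way segments now
days = 21
berlins_added = 30             # "more than 30 times" Berlin, as stated in the text

added = segments_end - segments_start
growth_percent = 100.0 * added / segments_start

# Average time between two newly added segments, in milliseconds.
ms_per_segment = days * 24 * 3600 * 1000 / added

# Average time needed to add one Berlin's worth of segments, in hours.
hours_per_berlin = days * 24 / berlins_added

print(f"added segments:  {added:,}")
print(f"growth:          {growth_percent:.2f} %")
print(f"new segment every {ms_per_segment:.1f} ms")
print(f"one Berlin every  {hours_per_berlin:.1f} h")
```

Assuming the period was exactly 21 days, the interval between new segments lands within a millisecond of the figure quoted above.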
So, not only has the raw data size increased, the absolute number of problematic way segments has also decreased.
This impression also holds true when we look at the relative number of SCCs, which gives us a more detailed idea of how the overall routing quality of the data has improved.
We look at the orange-ish data line of the bottom plot and see that the fraction of problematic way segments has decreased from 25.4 to about 23.8 basis points.
In essence this means that the quality of the OSM data has improved while its sheer size grew at a significant and impressive rate.
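For readers who want to check the quality figures, here is a small sketch of the arithmetic. It assumes that the orange-ish line is simply the number of SCCs divided by the number of routable way segments, expressed in basis points (1 basis point = 0.01 per cent); that assumption reproduces the numbers quoted above. The function and variable names are illustrative.

```python
# Counts quoted in the post for the start and end of the 21-day window.
scc_start, scc_end = 1_008_025, 956_849
seg_start, seg_end = 397_246_075, 402_225_269


def basis_points(part: int, whole: int) -> float:
    """Return part/whole in basis points (1 bp = 1/10,000)."""
    return 10_000 * part / whole


# Relative drop in the absolute number of SCCs.
scc_decrease_percent = 100.0 * (scc_start - scc_end) / scc_start

# SCCs relative to the size of the routable network (assumed definition).
bp_start = basis_points(scc_start, seg_start)
bp_end = basis_points(scc_end, seg_end)

print(f"SCC decrease: {scc_decrease_percent:.2f} %")
print(f"relative SCCs: {bp_start:.1f} bp -> {bp_end:.1f} bp")
```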
That concludes my brief analysis of the numbers.
Great tools to help further improve the data quality are Map Roulette, the notorious OSM Inspector and of course Project OSRM’s very own tiny components layer. [update] And of course Gis Lab’s Fixme.[/update]
Giving a good estimate on how this will develop in the future is not easy since I can only speculate on the development of the OSM data.
I will try to be conservative about the future, but let me say the following.
If the data growth keeps its current pace, we will look at about 550 million routable way segments by the end of next year.
And if the improvement of data quality continues just the way it does right now, too, we are heading towards an even greater and even more competitive data set in the months ahead.
This is truly amazing!
Team OSRM is happy to announce a new data analysis feature for OpenStreetMap data based on OSRM’s great routing capabilities. Over the past weeks, we at Team OSRM received a number of complaints that certain parts of the road network were not routable, and we were asked for help. We observed that some of these errors were not caused by obviously invalid tagging, but by connectivity issues such as unconnected islands of the road network, sources and sinks. Think of the latter two as one-ways where you can drive in but not out, and vice versa. There are in fact 1,074,201 such way segments today.
For the impatient: head to OSRM’s demo site and activate the ‘small components’ layer or use the routing view of Geofabrik’s great OSM Inspector to see the visualization.
Big thanks go to Geofabrik for hosting the tile layer that we use.
In reality, one can assume that the road network forms a single strongly connected component. It is a technical term, but in essence it means that you should be able to go from one intersection of the road network to every other intersection [*]. In particular, you should not get stuck in a street that you can’t get out of.
We implemented a standard algorithm for this problem, known as Tarjan’s SCC algorithm, to automatically detect all ways that have these problems each time we update our routing data.
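To give an idea of what such a detection looks like, here is a minimal sketch of an iterative Tarjan SCC computation in Python. It is not OSRM’s actual code: the function name, the dictionary-based graph and the toy road network are assumptions made for illustration only.

```python
def strongly_connected_components(graph):
    """Iterative Tarjan. `graph` maps a node to a list of its successors."""
    index = {}        # discovery order of every visited node
    lowlink = {}      # smallest index known to be reachable within the SCC
    stack = []        # nodes not yet assigned to a component
    on_stack = set()
    components = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        # Explicit work stack of (node, successor iterator) replaces recursion.
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            # All successors of `node` are processed.
            work.pop()
            if lowlink[node] == index[node]:
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
    return components


# Toy network: A, B, C are mutually reachable; D is a one-way dead end
# (a sink) and S is a one-way street nobody can drive into (a source).
roads = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": [], "S": ["A"]}
sccs = strongly_connected_components(roads)
largest = max(sccs, key=len)
small_components = [c for c in sccs if c is not largest]
print(sorted(sorted(c) for c in sccs))
print(sorted(sorted(c) for c in small_components))
```

In this toy example the sink and the source each end up as a component of size one, which is the kind of problem the layer highlights. Using an explicit work stack instead of recursion avoids hitting recursion limits on very large graphs.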
Right now, there are about a million way segments planet-wide that have connectivity issues. While this number seems large at first, about 60% of these problems should be fixable by editing a single segment only.
The following list will show you the problems that will be detected:
Here, we have a road that looks like a car could drive on it, but it is unconnected to the rest of the road network. Most probably, it is missing a tag that indicates access rules.
Here, a one-way street just ends. Any car that enters this street would be trapped according to the data. This can be caused by faulty tags or perhaps also by missing data. It may be an indicator that there is a street missing.
Here we have a source of traffic. How should a car ever get to the upper-left corner if traffic always goes in the other direction?
There is a limited number of false positives. They stem either from accidental problems in our parsing or from deliberate choices. Most notably, piers: most ferry connections go through piers, even though the ferry carries cars. And in the early days of OSRM we allowed them to connect continental mainland and islands in the sea.
And of course, someone might have just already fixed the problem.
* * *
[*] Technical Note: we implemented an iterative version of the seminal algorithm.