Three weeks ago, I first talked about connectivity issues in OSM data. More specifically, I introduced an analysis layer showing un- or half-connected islands of the road network, i.e. strongly connected components (SCCs). Read here for more information on SCCs.
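To make the notion of an SCC concrete, here is a minimal sketch of Tarjan's algorithm in Python. It assumes a simplified model in which intersections are nodes and each permitted direction of travel along a road is a directed edge. The function name `strongly_connected_components` is hypothetical, and OSRM's own tiny-components analysis may model the graph differently. The toy network has three intersections A, B and C joined by two-way streets, plus a one-way street from D to A. You can leave D but never get back to it, so D ends up in an SCC of size one.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm on a directed graph given as (from, to) pairs."""
    graph = defaultdict(list)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        nodes.update((u, v))

    index = {}        # discovery order of each node
    lowlink = {}      # smallest discovery index reachable from the node's DFS subtree
    on_stack = set()
    stack = []
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of an SCC: pop it and every node above it on the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


# Two-way streets between A, B and C; a one-way street leads out of D.
roads = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("D", "A")]
sccs = strongly_connected_components(roads)
size_one = [c for c in sccs if len(c) == 1]
print(sccs, size_one)
```

The recursive `visit` keeps the sketch short; a planet-sized road graph would need an iterative traversal to stay within Python's recursion limit.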
A few weeks have passed since the announcement, and it is about time to look at the numbers and how they have developed since then. And let me tell you, the numbers are good!
Have a look at the plots. An explanation follows below:
First, let's have a look at the top plot. It shows how the total number of SCCs has developed. The green line is this absolute number. We see that it decreases almost linearly from 1,008,025 SCCs to 956,849 within 21 days. In other words, the raw number has improved by more than 5 percent. The second line, in red, shows the fraction of these components that are of size one. Such a component usually corresponds to a one-way street that you cannot enter or leave. This fraction has slightly decreased from a little more than 60 percent of all SCCs to a little less than 60 percent. One could argue that this is already a significant improvement in itself. Certainly, a lot of unconnected one-way streets get fixed because most of them are part of the arterial road network.
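As a quick check of the "more than 5 percent" figure, the arithmetic below uses only the two SCC totals quoted above; the variable names are illustrative. It assumes the improvement is measured relative to the starting count, and the per-day figure assumes the decline is linear over the window.

```python
# SCC totals quoted in the text, 21 days apart.
scc_start = 1_008_025
scc_end = 956_849
window_days = 21

drop = scc_start - scc_end          # components removed over the window
relative_drop = drop / scc_start    # share of the starting total
per_day = drop / window_days        # average daily decrease, assuming a linear trend

print(f"{drop:,} fewer SCCs ({relative_drop:.2%}), about {per_day:,.0f} per day")
```

This prints a drop of 51,176 components, or 5.08 percent, which is roughly 2,437 fewer SCCs per day.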
Next, let's look at the second plot at the bottom of the figure. It is even more impressive. It visualizes the relative improvement in comparison to the data growth. Everyone knows that the OSM data keeps growing at a good rate. But how good is this rate? Over the past 21 days, the number of (car) routable way segments has grown from 397,246,075 to 402,225,269, which is more than one percent of growth, or, in absolute numbers, a little less than five million way segments. Let me put this number in perspective for you. That is enough way segments to model the entire city of Berlin, Germany, more than 30 times over, if I compare against yesterday's extract of the city provided by Geofabrik! In other words, a virtually completely mapped city of 3.5 million inhabitants was added every 17 hours, or a new routable way segment somewhere on the planet every 365 milliseconds. This is a bit of comparing apples and oranges from a statistician's point of view, but again: Who says Rome wasn't built in a day?
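The growth figures above can be reproduced from the two segment counts and the length of the window. The segment count of the Berlin extract is not given in the text, so this sketch only uses the factor of 30; because the text says "more than 30 times", the resulting 16.8 hours is an upper bound on the time per Berlin. Variable names are illustrative.

```python
# Routable (car) way segment counts quoted in the text, 21 days apart.
segments_start = 397_246_075
segments_end = 402_225_269
window_days = 21
berlin_copies = 30  # "more than 30 times"; the Berlin extract size itself is not given

added = segments_end - segments_start
growth = added / segments_start

window_seconds = window_days * 24 * 60 * 60
ms_per_segment = window_seconds * 1000 / added
hours_per_berlin = window_days * 24 / berlin_copies

print(f"{added:,} segments added ({growth:.2%} growth)")
print(f"one segment every {ms_per_segment:.0f} ms")
print(f"one Berlin every {hours_per_berlin:.1f} hours at most")
```

The result is 4,979,194 new segments (1.25 percent growth) and one segment roughly every 364 ms, in line with the rounded figures in the text.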
So, not only has the raw data size increased, but the absolute number of SCCs has also decreased. The same holds when we look at the relative number of SCCs, which gives a more detailed picture of how the overall routing quality of the data has improved. Looking at the orange-ish line of the bottom plot, we see that the fraction of problematic way segments has decreased from 25.4 to about 23.8 basis points. In essence, this means that the quality of the OSM data has improved while its sheer size grew significantly.
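Assuming the orange line is the number of SCCs divided by the number of routable way segments (an assumption, not stated in the text), the totals quoted in the two previous paragraphs reproduce both basis-point values. One basis point is one hundredth of a percent, so a ratio is multiplied by 10,000; the helper name `basis_points` is hypothetical.

```python
def basis_points(part, whole):
    """Express part/whole in basis points (1 bp = 0.01 % = 1/10,000)."""
    return part / whole * 10_000


# SCC totals over routable way segments, at the start and end of the 21-day window.
before = basis_points(1_008_025, 397_246_075)
after = basis_points(956_849, 402_225_269)

print(f"{before:.1f} bp -> {after:.1f} bp")
```

This prints 25.4 bp before and 23.8 bp after, matching the values read off the plot.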
That concludes my brief analysis of the numbers. Great tools to help further improve the data quality are Map Roulette, the well-known OSM Inspector and, of course, Project OSRM's very own tiny components layer. [update] And of course Gis Lab's Fixme.[/update]
Giving a good estimate of how this will develop in the future is not easy, since I can only speculate on the development of the OSM data. I will try to be conservative, but let me say the following: if the data growth keeps its current pace, we will be looking at about 550 million routable way segments by the end of next year. And if the data quality keeps improving the way it does right now, we are heading towards an even greater and even more competitive data set in the months ahead. This is truly amazing!
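The post does not say which growth model the 550 million estimate rests on. One simple reading is linear growth at the pace measured over the 21-day window; under that assumption, the hypothetical helper below computes how many days it takes to reach a given segment count.

```python
def days_to_reach(target, current, added, window_days):
    """Days until `target` is reached if `added` segments per `window_days` continue linearly."""
    daily_rate = added / window_days
    return (target - current) / daily_rate


# Segment counts from the text: 402,225,269 today, 4,979,194 added in 21 days.
days = days_to_reach(550_000_000, 402_225_269, 4_979_194, 21)
print(f"about {days:.0f} days at a constant daily rate")
```

At roughly 237,000 new segments per day this gives about 623 days. A growth rate that itself increases over time would reach the target sooner, so a linear model is the cautious choice.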