Using OSM to improve government data

Posted by joost schouppe on 30 September 2016 in English (English). Last updated on 4 October 2016.

Recently, I wrote about how you could use government road data to improve OpenStreetMap. Here’s a move in the other direction.

As an employee of the city of Antwerp, I was involved in the recent ‘validation’ of the Road Registry (Wegenregister) for our city. This registry is managed by the central Flemish government, but final responsibility for the content is with the municipality. Validation means the central government gives us a new dump for us to check for errors. This way of working is only a temporary situation: in the future, we will be live editing in the central database itself.


Some background

There’s an amazing amount of cleanup left to do, but we decided to focus on the completeness of the main road network. Before, we did this by comparing with our own city registry of roads. But that is not being updated anymore. So for the first time, we used OpenStreetMap for the validation. Using FME, we identified roads which exist in OSM, but not in the Road Registry. We excluded service roads and “slow roads” (paths, tracks, cycleways), as these are less of a priority right now.

Next time, we will also look at roads that are in the Road Registry, but not in OSM. In some cases, the lack of a road in OSM is really an indication of an error in the Registry, for example when a road has been closed and the government somehow missed that. This is more work, because the Road Registry contains a lot of little bits of “roads” that are really just driveways. Because nobody cares about them, they aren’t in OSM. But they are quite hard to filter out from the Registry data.

The results

The cleaned up dataset of roads that are in OSM and not in the Registry was really quite limited. Only 138 situations needed manual review. Of those cases, 32 were a simple matter of slightly different geometry. For example when OSM mapped the road as a polygon, which we didn’t really take into account. We identified 33 cases where the Road Registry was clearly wrong. Then there were 31 cases that looked like they shouldn’t have been in the selection anyway: they are private driveways, parking aisles, tramways. About half of those needed a fix in OSM. But the “tramways” were actually dedicated bus roads on top of tramways.

Most of the “mistakes” detected in OSM were caused by larger geometry issues. Sometimes the centerline of a road is debatable, but in most of these cases OSM could be improved, sometimes vastly. These were most often roads that hadn’t been touched in years. Only in a couple of cases was OSM really vastly wrong. This happened when the city reorganized streets, and somehow, nobody noticed. Most striking was the Troonplaats, which is a quite popular square. In several cases, OSM had already been corrected in the month or two between data download and final analysis (though to be honest, some of those were fixes of mine). A few mistakes were caused by errors in the road classification, or by classifications that were outdated.

There was one striking case (pictured above), where we were convinced OSM was wrong, but we had apparently missed a big change in the road geometry. Fortunately there was a Mapillary sequence, of course one of the 1.1 million pictures uploaded by filipc. Even though the aerial photography in Flanders is excellent and recent, the only place this road shows up is on the OSM map.

As Stereo pointed out in the comments, OSM data cannot be copied into a non-ODbL source. I always translated the license of OSM as “if you merge your private data with OSM data, you have to open up your data”. But that’s not correct, it should be: “if you merge your data with OSM data, you have to open up your data AND prohibit anyone from ever making it private again”. In this case, the Flemish government allows (and explicitly wants) TomTom and Google to take official data and use it to improve their private data.

Because of that, we government workers are not allowed to copy features from OSM. But there is a precedent: the New York City government uses OSM to track changes to their buildings as imported into OSM. I’ll trust their research that ODbL does not exclude using OSM to detect errors, if you then proceed to do your own surveying before making changes to your own dataset. This is also what the License Working Group believes, as Simon Poole (thanks!) pointed out in the comments. I understand this bit of text was supposed to have landed in the Legal FAQ page, so I went ahead and added it there. Please revert if this is inappropriate.

The ODbL always made sense to me, and it kind of still does. Say I was to download all of OSM to my own server, and redistribute it under a more open license. Then someone else could just take that data and close it off. But this case does help me understand those who aren’t very happy about this license a bit more. In the case of government, it means you can’t -really- integrate OSM into your processes. For example, you couldn’t take OSM, validate it with your own data and redistribute the result under the license of your choice.

Have a look

You can have a look at the cases here. There’s a bit of work left on the cases with a difference in geometry. The easiest way to get the Road Registry into your editor is with this (slightly outdated) WMTS:{z}/{x}/{y}?access_token=pk.eyJ1Ijoiam9vc3RzY2hvdXBwZSIsImEiOiJjaWh2djF1c2owMmJrdDNtMWV2c2Rld3QwIn0.9zXJJWZ4rOcspyFIdEC3Rw

You can contact me to get the FME models we used to identify these roads - they aren’t very complicated. You could easily do similar things in open source software.
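
As a rough illustration of that last point, here is a minimal Python sketch of the comparison described earlier: drop service roads and “slow roads”, then flag OSM ways that have no Road Registry geometry nearby. This is not the FME model itself. The data layout, the function names and the 10 metre tolerance are all assumptions made for the example, and the coordinates are assumed to be projected and in metres.

```python
from math import hypot

# Highway classes left out of the comparison: service roads and "slow roads".
EXCLUDED = {"service", "path", "track", "cycleway"}


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in coordinate units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b are the same point.
        return hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_network(p, lines):
    """Smallest distance from p to any segment of any polyline (inf if none)."""
    return min(
        (
            point_segment_distance(p, a, b)
            for line in lines
            for a, b in zip(line, line[1:])
        ),
        default=float("inf"),
    )


def missing_from_registry(osm_ways, registry_lines, tolerance=10.0):
    """Ids of OSM ways with at least one vertex farther than `tolerance`
    from every registry line. The tolerance is a hypothetical value."""
    missing = []
    for way in osm_ways:
        if way["highway"] in EXCLUDED:
            continue
        if any(
            distance_to_network(p, registry_lines) > tolerance
            for p in way["coords"]
        ):
            missing.append(way["id"])
    return missing


# Made-up sample data, only to show the shape of the input.
registry = [[(0, 0), (100, 0)], [(100, 0), (100, 100)]]
osm = [
    {"id": 1, "highway": "residential", "coords": [(0, 2), (100, 3)]},
    {"id": 2, "highway": "residential", "coords": [(50, 0), (50, 80)]},
    {"id": 3, "highway": "service", "coords": [(20, 0), (20, 60)]},
]
candidates = missing_from_registry(osm, registry)
print(candidates)
```

Only vertices are tested here, so a long straight way with few vertices could slip through. A real workflow would more likely buffer the registry lines in a GIS library and test the whole OSM geometry against that buffer. Either way, the output is only a list of candidates for manual review, not a list of confirmed errors.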

Location: De Kluis, Buizingen, Halle, Halle-Vilvoorde, Flemish Brabant, Flanders, 1501, Belgium

Comment from Stereo on 1 October 2016 at 13:41

That’s fantastic! Is the Wegenregister licensed under the ODbL? Does the central Flemish government know that it can only ever re-distribute it under the ODbL?

Comment from joost schouppe on 2 October 2016 at 06:55

Hi Stereo, This is a use case I’ve seen several times before, and no one in the OSM community I talked to about it ever thought it controversial. But uhm, now I’m a bit worried. The Road Registry “is open data”, and that was always enough. It is licensed under the same Flemish Open Data Licence (of which I couldn’t find an official English translation) that has been found compatible with re-use in OpenStreetMap. But I’m not sure if the movement in the other direction was ever investigated.

The analysis above is more of an “experimental” thing, and we only used OSM to spot mistakes - we didn’t just copy OSM. But we’ll have to put it on hold, I suppose, until we can clarify these legal issues.

Comment from joost schouppe on 3 October 2016 at 07:50

I’ve edited the text to explain a bit on the legal stuff.

Comment from Stereo on 3 October 2016 at 15:22

I’m certain that you can’t launder ODbL data to become public domain data. If there were a way, it would certainly go against the spirit of OSM. The point of the share-alike clause, to many people, was to stop the Googles and TomToms of the world from getting OSM data without sharing back.

I’m not sure how NYC understands the ODbL and gets errors reported. If the error reporting happens before the data layers are mixed, there is of course no problem. If there is a point where mappers can report mistakes in the data they have found without comparing the official data with OSM, it’s great. If there is a systematic automatic diff with OSM, and all output is investigated, it’s still a significant extract from the database.

There’s nothing that stops a government from maintaining two databases concurrently, or from only releasing ODbL data.

Comment from Stereo on 4 October 2016 at 22:59

Thank you for the clarification, Simon!

It sounds like we’ll get the best of both worlds in this case then - everyone can get the best available map, and OSM gets validation and resolution of differences between official and OSM data for free.
