
OSM & government, in Lithuania

Posted by joost schouppe on 6 March 2017 in English. Last updated on 7 March 2017.

When OpenStreetMap started, open geodata was basically unavailable. Some governments were quicker than others to release their data, and so some places had huge imports from the start. Whether that was a good idea or not is slowly becoming irrelevant: the map is too full for big new imports anyway. Imports are increasingly exercises in conflation: merging sources and using them to validate and improve existing OSM data. The good news is that the same tools used for the “initial” import can often be used to keep the data up to date. Continuous synchronization between datasets changes the relation between data provider and OSM.

For a government, a complete and reliable OSM becomes a more valid tool for their projects. The synchronization processes we set up can form the basis for an extra quality assurance (QA) channel for governments. It might even convince some agencies that there is little to be gained by managing some of their data on their own.

To try and capture this changing relation, I started a thread on the talk mailing list. Mikel suggested creating a Wiki page on the subject: here it is. Meanwhile, several people have improved upon it!

During the course of the research for that page, I met Tomas Straupis. I wanted to share what he told me about what they do exactly with government data, and what their relationship is with the government.

Interview with Tomas Straupis

Here’s a general idea of what we’re doing in Lithuania.

Government has datasets d1, d2… dn. OSM has one big dataset O which could be split into datasets o1, o2… om. We take datasets dx and oy which could be mapped (have similar data, like placenames, roads, lakes, rivers, etc.)

Automated importing in either direction is impossible (or not wanted by either side). Government datasets need strict accountability (sources, documents) and responsibility. OSM has different data, and simply overwriting it with government data would be bad in a lot of ways.

So integration between OSM and government datasets (and actually any other datasets) is done by synchronisation: checking for differences and taking action on them (mostly manually) in both datasets. By doing a comparison, both the government and the OSM datasets are improved. The point here is that government datasets usually rely on official (document) sources for updates, while OSM relies on local knowledge. Neither method is perfect, so synchronisation/comparison helps to get the best of both. (As a separate note: here a strength of OSM comes in, namely that everything is in one layer. It is much harder to have a road going through a lake or a building, or a street A with address B along it. Government datasets are usually separate and controlled by different institutions, so doing such topology checks is much more difficult there.)

For this to work, the government must open its datasets and appoint a working contact point to which information about problems in the government dataset can be sent, and where this information is ACTUALLY used and feedback is given.

Do you have more info on the projects, and the software/queries you use?

All info is in Lithuanian… Maybe google translate can help with the links to Lithuanian blog site I will provide below (if not - just tell me I will write the general idea in English).

All OSM data is imported into a PostgreSQL database using osm2pgsql, and that database is used for comparison/synchronisation.

We’re doing two types of comparison/synchronisation:

  1. POI (point data; for some types of polygons the centroid could be used)

  2. Road (multi-vector data)

For POI synchronisation we have an ugly but functional universal comparison mechanism. We convert external data to an XML file with lat, lon and some properties (or the external source provides us the information in XML, for example via a web service). Then we provide a mapping of this external data to OSM data. So having the external data, the mapping and the OSM data, we can create reports of differences.
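To make the POI mechanism a bit more concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the Lithuanian community's actual code (which is PHP and SQL). The XML layout, the attribute names, the tag mapping, the 50 m matching radius and the sample data are all assumptions made up for this example.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical external data: one element per POI with lat, lon and properties.
EXTERNAL_XML = """<pois>
  <poi lat="54.6872" lon="25.2797" type="pharmacy" title="Vaistine A"/>
  <poi lat="54.8985" lon="23.9036" type="pharmacy" title="Vaistine B"/>
  <poi lat="55.7033" lon="21.1443" type="pharmacy" title="Vaistine C"/>
</pois>"""

# Hypothetical mapping of external "type" values to OSM tags.
TYPE_TO_OSM_TAG = {"pharmacy": ("amenity", "pharmacy")}

# Hypothetical OSM extract (in practice this would come from the database).
OSM_POIS = [
    {"id": 1, "lat": 54.68725, "lon": 25.27975,
     "tags": {"amenity": "pharmacy", "name": "Vaistine A"}},
    {"id": 2, "lat": 54.89852, "lon": 23.90361,
     "tags": {"amenity": "pharmacy", "name": "Vaistine Beta"}},
    {"id": 3, "lat": 54.0, "lon": 24.0,
     "tags": {"amenity": "pharmacy", "name": "Vaistine D"}},
]


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def compare_pois(external_xml, osm_pois, mapping, max_distance_m=50.0):
    """Return a list of (kind, external_name, osm_name) differences."""
    report, matched_ids = [], set()
    for poi in ET.fromstring(external_xml).iter("poi"):
        lat, lon = float(poi.get("lat")), float(poi.get("lon"))
        key, value = mapping[poi.get("type")]
        name = poi.get("title")
        # Candidates: unmatched OSM POIs carrying the mapped tag.
        candidates = [o for o in osm_pois
                      if o["tags"].get(key) == value and o["id"] not in matched_ids]
        nearest = min(candidates,
                      key=lambda o: distance_m(lat, lon, o["lat"], o["lon"]),
                      default=None)
        if nearest is None or distance_m(lat, lon, nearest["lat"], nearest["lon"]) > max_distance_m:
            report.append(("missing_in_osm", name, None))
            continue
        matched_ids.add(nearest["id"])
        if nearest["tags"].get("name") != name:
            report.append(("name_differs", name, nearest["tags"].get("name")))
    # OSM POIs of a mapped type that no external record matched.
    mapped_tags = set(mapping.values())
    for o in osm_pois:
        if o["id"] not in matched_ids and mapped_tags & set(o["tags"].items()):
            report.append(("missing_in_external", None, o["tags"].get("name")))
    return report


for row in compare_pois(EXTERNAL_XML, OSM_POIS, TYPE_TO_OSM_TAG):
    print(row)
```

The matching here is a simple greedy nearest-neighbour search, which is good enough to show the principle. Every row of the resulting report would still be reviewed by a person before anything is changed or reported.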

Try automatically translating these two entries to get a general idea:

To compare road data, road shapefiles are loaded into PostgreSQL using shp2pgsql and then some queries are executed to find differences. Once again, the general idea is in this blog, which you can try to translate:

So basically we use PostgreSQL/PostGIS and PHP. If you have more specific questions, I’m ready to answer them or send the code, though it is dirty code as I’m a google copy/paste “programmer”… :-)
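For readers who want a feel for what a road comparison does, the following is a rough Python sketch. It is not the PostGIS queries mentioned above: it swaps them for a plain vertex-to-segment distance check. It assumes both datasets are already in the same projected, metre-based coordinate system, and the road names, coordinates and the 15 m tolerance are invented for the example.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_roads(p, lines):
    """Distance from p to the nearest segment of any line (inf if none)."""
    return min((point_segment_distance(p, a, b)
                for line in lines for a, b in zip(line, line[1:])),
               default=math.inf)


def unmatched_roads(source, reference, tolerance_m=15.0):
    """Names of roads in `source` with a vertex farther than the tolerance
    from every road in `reference`."""
    ref_lines = list(reference.values())
    return [name for name, line in source.items()
            if any(distance_to_roads(p, ref_lines) > tolerance_m for p in line)]


# Hypothetical sample data, coordinates in metres.
GOV_ROADS = {"A1": [(0, 0), (100, 0), (200, 0)],
             "Bypass": [(0, 500), (100, 500)]}
OSM_ROADS = {"way/1": [(0, 3), (200, 4)],
             "way/2": [(300, 0), (300, 100)]}

print("In government data, not in OSM:", unmatched_roads(GOV_ROADS, OSM_ROADS))
print("In OSM, not in government data:", unmatched_roads(OSM_ROADS, GOV_ROADS))
```

Checking only the vertices is a simplification: a long straight segment with no intermediate vertices could slip through. Running the check in both directions gives candidate differences for each side to look at.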

Does the government use your input, and how? Is there something structural? Or just mailing them and hoping they care?

Lithuania is a small country, everybody knows everybody :) Now we occasionally drink beer with “government” guys working with GIS data. So we know they do change the data. They also give us feedback on which datasets are “more important” for them, so we can prioritise comparing those. This way both sides are happy and thankful for the help.

Additionally, each month we take the new/updated government data and run a new comparison, so we can see that the data has actually been updated.
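The monthly re-run can be thought of as comparing two difference reports. A small Python sketch of that bookkeeping, assuming each reported difference can be reduced to a stable key (the keys and sample values below are hypothetical):

```python
def track_progress(previous, current):
    """Compare two sets of difference keys from consecutive comparison runs."""
    return {
        "resolved": sorted(previous - current),    # gone since the last run
        "still_open": sorted(previous & current),  # present in both runs
        "new": sorted(current - previous),         # appeared this run
    }


# Hypothetical keys: (kind of difference, feature identifier).
LAST_MONTH = {("name_differs", "Vaistine B"), ("missing_in_osm", "Vaistine C")}
THIS_MONTH = {("missing_in_osm", "Vaistine C"), ("missing_in_osm", "Vaistine E")}

for status, items in track_progress(LAST_MONTH, THIS_MONTH).items():
    print(status, items)
```

A difference that disappears from the report means one of the two datasets was changed, which is the signal described above that the data has actually been updated.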

From a more or less “legal” perspective: the central government agency for GIS data allows registered users to submit error reports online (registration is free and open to anybody; created according to the EU directive on spatial data). They must check the reports and give feedback within 20 days. We (OSM) are on a somewhat different level: we mail the responsible group directly. One of the reasons for that is that they physically cannot fix all the errors we report within 20 days, as sometimes there are too many problems. Additionally, they know the report comes from a “trusted” source.

As for “structure”: for point type geometry (for example place names) we currently create a google doc online, where both sides write comments and the status of errors. When everything is fixed, we take the new updated government data and recreate that google doc.

For roads it is per-case mailing of coordinates and notes… But there is no reason why that could not be done in a more “structural” way…

Maybe an important point here is that OSM could contain some “bad/incorrect” data entered by mappers without enough experience. And we do not want to make the government GIS people sort/filter out such errors. So we go through all errors ourselves and only send those which we think are really errors. This is the main reason why we cannot simply run queries “automatically” and send the results to the government people. There are no “technical/IT” obstacles to sending mismatches automatically.

About amount of work

The initial comparison of a specific dataset usually produces a large number of differences. Some of those are due to actual differences, some are because of different ways of entering data. So the initial amount of work is usually high: both for updating data and for fine-tuning the comparison rules. After that, only a small amount of work is anticipated, because the comparison simply notifies one side about a change in the other side’s data.

A note from Andrius Balčiūnas, Head of IT department at GIS-Centras

Georeferenced data is created from orthophotos, but the data changes much more often than the orthophotography is updated (currently every 4 years in Lithuania). The OSM community notices the changes much faster. Therefore collaboration with OSM, and the use of its data for error checking, allows us to achieve higher data quality and relevancy. As this data is later used in national registries, cadastres and information systems, the OSM community helps to improve not only the specific data set, but the content quality of the whole national spatial data infrastructure. An important thing to note here is that such a collaboration means that even a small road segment or other improvement of OSM data by a community member could later appear in official government data.

A note on the ODbL license, and dealing with it. Government can use our error reports to start their own mapping process, but they can’t just copy our features. Do you know what they do at your government services?

Two points here:

  1. Government is not using/copying any features from OSM. They get reports about problems and this simply draws their attention to specific features in their datasets. By using their own sources they fix the problem. It cannot be done in any other way, because all changes/all data in the official dataset must have an approved/reliable source. OSM triggers the process, OSM does not give any data.

  2. Any database consists of numerous facts (features/records). Only the whole database can be protected by law. Single facts cannot be protected. If any database is publicly accessible, anybody can look at some facts (place name, street name, hotel name etc.) in that database. Then those facts become facts they know/have in their brain. They can use them to update/insert such data in any other database, irrespective of the permissions of the original database. I’m not a lawyer. This is what I’ve heard from lawyers here in Lithuania. So in practice this means I can take this and that from ANY publicly accessible database (even google), as long as I do not take so much of the database that it is no longer just “some facts”, but “a considerable part of the database”. The big question here is only what counts as “a considerable part of the database”…

P.S. 2nd point makes map “easter eggs” almost pointless…


Comment from Alan Bragg on 7 March 2017 at 23:23

I’d like to learn how to use tools to compare my little corner of the world with shape files I can get from my town GIS guy.

Comment from joost schouppe on 8 March 2017 at 07:44

Hi Alan,

Depends on the volume and what exactly you want to achieve. For exploration, QGIS is quite practical. You can just drag and drop shapefiles, you can add OSM tiles as a background and there’s all sorts of ways to add OSM data. Then you can do some spatial analysis to find (mis)matches. But even umap or mapcontrib might work, depending on your goals.
