joost schouppe's Diary

An idea for making it easier to link external data to OSM

Posted by joost schouppe on 4 February 2015 in English.

I know a lot of people have a problem with OSM objects not having a dependable unique identifier. Of course, a node has an ID which will never change. But a campsite mapped as a node will get a very different ID when someone decides to re-map it as a polygon. This makes life complicated for external applications that would like to link up their data to OSM. For example, a fabulous application like iOverlander (which collects data, reviews and ratings on wild/formal campsites) might want to make all the campsites available in OSM rateable in their application. But it would be silly to also copy the geography to their database, as OSM geography is improved upon all the time. Of course, [there’s a fuzzy way to refer to a specific object](http://wiki.openstreetmap.org/wiki/Overpass_API/Permanent_ID), but that’s really of no use in this case. Imagine a campsite without a name. Then you could tell OSM to look for a campsite within a certain radius of where you found it. But what if a new campsite has been added? What if the campsite has gotten a better coordinate? What if it has become a caravan site? Etc… Or a more complex case: take a bar that has moved locations. Do you give preference to the location or to a bar with the same name somewhere else in town?

This would be an argument to just include much more data within OSM, as that way the link between the thing and its description cannot easily be broken. But considering that even adding some price information is controversial, adding opinions etc. would be unthinkable.

As I’ve been playing with the idea of using OpenStreetMap as a base for an open alternative to Tripadvisor, I’ve been thinking about this problem a lot. In a flash of inspiration, I thought of this concept. I would like to hear some opinions about it. Anyone who has a project that requires a thing to have a unique ID can look it up through a query to www.osmdata.org. All objects that have linked external content get an extra tag, for example “osmdata=uniqueid01”.

Here’s how it could work in practice. Imagine a site where all things vaguely related to tourism are searchable and clickable on the map. Take restaurants as an example. Or generate a list of all restaurants in a city. This list can be updated automatically all the time. But once users start adding untaggable information, like “overpriced” or “what a lovely atmosphere”, this data will be saved outside of OSM. Instead of forking the location, the restaurant gets an extra tag in OSM (osmdata=uniqueid22), and the bits of external data saved outside of OSM get this same ID. Now when someone moves the restaurant in OSM (copying tags or dragging the node and deleting the old node) nothing gets messed up. When someone re-maps the restaurant as tags on a building, they copy the osmdata tag too, and again nothing is broken. If a different project wants to use the same thing, they just use the same osmdata unique id. That way, database bloat is minimal.
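To make the lookup concrete, here is a minimal sketch of the idea, not an existing service. The `osmdata` tag and the value `uniqueid22` come from the example above; the `reviews` store, the `reviews_for` helper and the object ids are made up for illustration.

```python
# Hypothetical external store: opinions live outside OSM, keyed by the shared tag value.
reviews = {
    "uniqueid22": ["overpriced", "what a lovely atmosphere"],
}

def reviews_for(osm_object):
    """Return the external reviews for an OSM object, found via its osmdata tag."""
    ext_id = osm_object.get("tags", {}).get("osmdata")
    if ext_id is None:
        return []
    return reviews.get(ext_id, [])

# The restaurant as first mapped: a node (ids here are invented).
as_node = {"type": "node", "id": 1001,
           "tags": {"amenity": "restaurant", "osmdata": "uniqueid22"}}

# The same restaurant re-mapped as tags on a building, with the osmdata tag copied.
as_way = {"type": "way", "id": 2002,
          "tags": {"building": "yes", "amenity": "restaurant", "osmdata": "uniqueid22"}}

# The OSM id changed from node 1001 to way 2002, but the lookup is unaffected.
print(reviews_for(as_node) == reviews_for(as_way))
```

The external project never stores the node or way id, only the tag value, so re-mapping does not break the link as long as the tag is carried over.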

Another example would be to rate subjective features of roads, like how scenic they are. The same principle could be applied, and the result could be Michelin-style maps with a green outline for crowd-approved beautiful trips.

Of course, a side-effect will be that external projects like iOverlander would have a much easier time building their project around OSM data. Which would mean that their users would contribute to OSM, instead of just to the external project.

I’m very interested to hear your ideas on how this problem could be solved - or how it is not a problem - or how it has been solved before.

Discussion

Comment from Alan Trick on 4 February 2015 at 21:53

At the risk of derailing this important issue, I would like to mention that I think this problem is a lot more pertinent to OSM than current coffee prices.

To some degree, this sort of linking has already been done with a lot of imported data. For example, the tiger:tlid tag. Of course these tags are specific to their imports. I see two problems with this approach:

First, it requires users to copy the tags when recreating objects. A lazy or ignorant user probably won’t. I think expecting users to do this is going to create a lot of headaches when some invariably don’t.
Second, this requires the mapper to decide when the identity of an object changes. For example, if there is a McDonalds, and it moves into a building across the road, is it still the same McDonalds? What if the building is on the other side of town? What if two campsites are merged, whose id is used then? In OSM, a road often gets broken into segments because there is a bridge, or the speed limit changes. Do these segments get the same id? In real life, there are often unconnected roads with the same name that were historically the same road. Do these get the same id?

One fuzzy solution that will probably work reasonably well for your use case is to match campsites based on the location of the campsite (or an average of the nodes in its polygons). If that value changes significantly, you probably want to consider it a separate camp site anyway.
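A rough sketch of that fuzzy match, under a few assumptions: locations are (lat, lon) pairs in degrees, the polygon is represented by the plain average of its nodes, and the 200 m threshold is an arbitrary value chosen only for illustration. The function names are made up.

```python
import math

def centroid(points):
    """Plain average of (lat, lon) pairs; good enough for small polygons."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (sum(lats) / len(lats), sum(lons) / len(lons))

def distance_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points (haversine)."""
    earth_radius_m = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))

def same_campsite(stored_location, current_points, max_shift_m=200.0):
    """Treat the object as the same campsite if its location moved less than the threshold."""
    return distance_m(stored_location, centroid(current_points)) <= max_shift_m

# Location stored when the campsite was still a node.
stored = (50.0, 4.0)
# The same campsite re-mapped as a small polygon nearby.
polygon = [(50.0004, 4.0004), (50.0004, 4.0006), (50.0006, 4.0006), (50.0006, 4.0004)]
print(same_campsite(stored, polygon))
```

What counts as a significant change is a choice for the data user: a tight threshold splits one campsite into two after a correction, a loose one merges neighbours.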

Comment from Sanderd17 on 4 February 2015 at 22:24

Next to users forgetting to copy the tags, there are also users who will copy the tags when creating a new object. Some newbies might think this set of tags is necessary; they adapt or remove the ones they understand (name, address, phone, …), but it’s likely they don’t know about the id. Then you end up with two objects with the same id.

The question if an object is moved, or deleted and recreated is indeed a valid question. And IMO, it completely depends on the data user. Some data user might want a new entry when the operator changes (because the bills have to go to a different address, the service might be different, …), others depend on the name, or the location. It’s obvious that it will be impossible to include data like that in a sane way.

Maybe you know the permanent id feature of overpass? http://wiki.openstreetmap.org/wiki/Overpass_API/Permanent_ID It’s versatile, as it needs to fit all data users, but that also makes the interface less user friendly. If you want to use it for something custom, there are possibilities to give it a better interface.

When you really want to change osm, I would do it at the meta-data level, not at the tags users can see. It would need an update to the api version, and many editors will need to be altered, but it’s possible. Just like you currently have version information in the data, you could have info from which historical object the data comes. The editor then tries to see which object replaces which, or which object is derived from which, and adds it to the metadata. Typical operations include splitting and merging of ways, and replacing a node with a polygon (which is harder to discover). Being able to see that the history of one object depends on a different object would also make vandalism detection and reverting easier.

Comment from Zethradon on 5 February 2015 at 00:06

We most definitely need a way to link OSM data to external databases, and I really like the idea you propose. You have my vote!

OpenStreetMap is entirely user-driven data, and it makes sense to have user-driven ids to link with external databases as well.

Comment from aseerel4c26 on 5 February 2015 at 19:23

In case you did not know it: we have wikidata tags in our data, which use a unique ID like the one you propose, if I read correctly. See also https://wiki.openstreetmap.org/wiki/Wikidata

Comment from MartinDiazAlvarez on 6 February 2015 at 20:58

I am new here. And I’m also a newbie to mapping. Thanks for all the useful information guys.

Comment from joost schouppe on 6 February 2015 at 23:00

Very interesting thoughts, thank you all very much. I didn’t know about the wikidata tag. What makes this example so interesting, is that they could in fact refer to an object in OSM by the wikidata tag. Instead they refer to the relation ID. Here’s an overpass example for a random thing with a wikidata tag: http://www.overpassturbo.eu/s/7wY

I wonder why they didn’t do that. Yes, there are problems with my proposal, but much bigger problems with using an OSM id.

Comment from nan_a on 7 February 2015 at 15:33

I really like this idea. But yeah, as aseerel4c26 mentioned, there is the Wikidata identifier that could be used.

We should be aware that the same Wikidata ID is sometimes used for more than one object, though. For instance, I’ve seen different statues depicting the same person being linked to that person’s Wikipedia page. That’s something to look out for.

Of course the problem is that there are more OSM nodes than Wikidata entries, so relying exclusively on Wikidata would be insufficient for our needs.

Comment from joost schouppe on 7 February 2015 at 17:54

Nandachuva, there is no necessity for there to be a one-to-one relationship between OpenStreetMap objects and external objects. In your example, it sounds like a reasonable query to ask OSM to show all statues related to one person from this person’s wiki page. But what if there’s a wiki page for the person and for the statue? Then the statue might need two values for the wikidata tag (u-oh). Or we would have the statue refer to the statue page, and create a relationship for the person (u-oh), containing the statues for this person. This relationship would then have the wikidata id for the person.

The second problem you mention looks more complicated to me. I suppose wikidata will only allow wikidata id creation for things that are ‘notable’. I believe all things should -potentially- have an external id. I could theoretically just start tagging things for my theoretical project, say introduce an openrestaurantid=Q123, for my own selection of things that deserve an id. But what I would really like is a kind of API that allows any external project to ask to generate an external id to be written into OSM, preferably all using the same tag. Maybe though, it would be better to do something like this with something like externalid:opentrip=Q123 and externalid:wikidata=Q111, for a restaurant that has both reviews and a wikipedia page.
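As a small illustration of the namespaced variant, the sketch below reads every `externalid:<project>` tag from an object's tags. The `externalid:` prefix and the two example values are the ones proposed in the comment above; they are not established OSM tags, and the `external_ids` helper is invented for this example.

```python
PREFIX = "externalid:"

def external_ids(tags):
    """Map project name -> external id for every tag shaped like externalid:<project>=<id>."""
    return {
        key[len(PREFIX):]: value
        for key, value in tags.items()
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }

# A restaurant that has both reviews and a wikipedia page.
restaurant_tags = {
    "amenity": "restaurant",
    "externalid:opentrip": "Q123",
    "externalid:wikidata": "Q111",
}
print(external_ids(restaurant_tags))
```

Each project then only looks at its own key, so two projects can attach ids to the same object without stepping on each other.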

Comment from jremillard on 8 February 2015 at 02:50

External consumers should store the ids of the OSM objects and process the changeset files to keep the ids up to date, tracking when objects are moved, deleted, added, turned into an area, turned from an area into a relation, etc.
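A minimal sketch of the first half of that approach, assuming the changes arrive as an osmChange document (the XML diff format with `<create>`, `<modify>` and `<delete>` blocks). The `changes_for` function and the sample ids are made up; the sketch only reports which tracked objects were touched. Deciding that a deleted node was replaced by a newly created way is the hard part and is not attempted here.

```python
import xml.etree.ElementTree as ET

def changes_for(osmchange_xml, tracked):
    """Return (action, type, id) for every tracked element found in an osmChange document.

    tracked is a set of (type, id) pairs, e.g. {("node", 1001)}.
    """
    events = []
    root = ET.fromstring(osmchange_xml)
    for action in root:            # <create>, <modify> or <delete>
        for element in action:     # <node>, <way> or <relation>
            key = (element.tag, int(element.get("id")))
            if key in tracked:
                events.append((action.tag, key[0], key[1]))
    return events

# Invented sample diff: one tracked node is edited, another is deleted.
sample = (
    '<osmChange version="0.6">'
    '<modify><node id="1001" lat="50.0" lon="4.0" version="2"/></modify>'
    '<delete><node id="1002" version="3"/></delete>'
    '<create><way id="2002" version="1"/></create>'
    '</osmChange>'
)
print(changes_for(sample, {("node", 1001), ("node", 1002)}))
```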

Comment from Ben Abelshausen on 8 February 2015 at 08:32

Why is there always so much distrust in the average mapper? I think it is possible to implement a system like this and to help and educate newcomers.

It is also very easy to quality check once you know what an ID actually represents. A bank will stay a bank and a campsite will not change into a McDonald’s. Also the problem of mappers copying IDs to new objects is very easy to detect and respond to. An automated system could easily detect duplicated IDs that are new.
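The duplicate check could look roughly like this. It is a sketch that assumes the objects are available as dictionaries with `type`, `id` and `tags` keys (the shape of elements in Overpass API JSON output) and that the id lives in a tag called `osmdata`, as in the diary post. The `duplicated_ids` helper and the sample objects are invented.

```python
from collections import defaultdict

def duplicated_ids(elements, key="osmdata"):
    """Return {tag value: [(type, id), ...]} for every value carried by more than one object."""
    holders = defaultdict(list)
    for element in elements:
        value = element.get("tags", {}).get(key)
        if value is not None:
            holders[value].append((element["type"], element["id"]))
    return {value: objs for value, objs in holders.items() if len(objs) > 1}

# Invented sample: a newcomer copied all tags of a bank onto a new node.
elements = [
    {"type": "node", "id": 1, "tags": {"amenity": "bank", "osmdata": "uniqueid07"}},
    {"type": "node", "id": 2, "tags": {"amenity": "bank", "osmdata": "uniqueid07"}},
    {"type": "way", "id": 3, "tags": {"tourism": "camp_site", "osmdata": "uniqueid08"}},
]
print(duplicated_ids(elements))
```

A real check would also have to allow for objects that legitimately share an id, such as a road split into several segments.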

Comment from Polyglot on 15 February 2015 at 19:29

Euhm, osmdata.org doesn’t seem to exist!

Concerning wikidata, there is also:

subject:wikidata (that’s who/what the statue or picture depicts)
architect:wikidata
name:etymology:wikidata (for streets or objects named after the wikidata item)
operator:wikidata=*
brand:wikidata=*
artist:wikidata=*

So it’s rather versatile.

There is the issue of whether wikidata wants entries for, for example camp sites, but they want to include a lot more than just items noteworthy enough to deserve a wikipedia page, so normally that shouldn’t be a problem.

It was indeed a bad move to refer from wikidata to OSM relations for named areas. I’ve been telling them that from the start. The only effect it had is that they didn’t create more such properties, I believe.

What they also need on their side, is a way to use the OSM wikidata tag in a Wikipedia page, for example. Overpass is the ideal glue here. Unfortunately they deleted my attempts to create such links:

https://nl.wikipedia.org/w/index.php?title=Pater_Damiaan&diff=40641959&oldid=40640633

They kept this one: https://nl.wikipedia.org/wiki/Guido_Gezelle#Tastbare_gedenktekens

Anyway, I think wikidata can get us a long way towards the goal for permanent ids. For the items which really don’t fit in their DB, we might still need other types of foreign keys. I’m curious how they would react if I’d start importing 50000 bus stops into wikidata, let alone those of the whole world, but I didn’t ask, so maybe it’d be just fine. For the time being I’ll keep using ref:OPERATOR=xyzxyz.

Also don’t underestimate OSM contributors, we’re generally quite an intelligent lot, able to decide whether something deserves to keep the same identifier or not. OTOH, it’s also true that those external DBs will need to run Overpass queries to check whether all their identifiers are still present in OSM and, if they should be on exactly 1 object only, they’ll have to come over and repair occasional damage. At least they’ll have a practical way of doing that.

What would be nice is editor support for the wikidata tag, though: showing the item’s label in the user’s language, with a fallback if it isn’t available, instead of the Q-number.

Polyglot

Comment from Jan van Bekkum on 17 February 2015 at 08:06

A campsite that is referred to in for example iOverlander would need an OSM relation anyhow to store information about amenities available at the campsite. See here.

The relation ID could be used for external reference. I would expect that it is more stable than the object ID. For example if two campsites merge they would keep the relation ID of one of them (which reflects the real situation).

Comment from Geonick on 18 March 2015 at 00:00

See also “Permanente/stabile OSM IDs!” on Talk-de https://lists.openstreetmap.org/pipermail/talk-de/2012-July/097009.html

Comment from joost schouppe on 20 March 2015 at 22:43

Geonick, I’m guessing you’re Stefan by the picture. After the comments here and further discussion, I think your third point is exactly what I proposed to the iOverlander team. Three years after you came up with the idea :)
