The ongoing vote on the Education 2.0 proposal has demonstrated that some OSM members don’t understand the situation with multiple properties. I was somewhat surprised by this lack of understanding, since it is already explained in the Wiki, so I want to explain it one more time.
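- (1) Single value stands for the key=value case, for example: sport=basketball
- (2) Value list stands for a semicolon-separated list of values, for example: sport=basketball;volleyball
- (3) Separate keys stand for multiple binary keys, one for each property, for example: sport:basketball=yes + sport:volleyball=yes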
Case (1) is the simplest and most traditional one in OSM. However, if a feature uses this tagging style, its value becomes exclusive, which means there can be only one value. For certain features this is completely acceptable: for example, a man-made structure can’t be a mast and a tower at the same time.
Case (2) was probably the first attempt at a workaround for non-exclusive values. It’s a straightforward approach for cases where properties are non-exclusive, which means that a given object can have several properties of one feature. As in the example above, a pitch could be used for playing both basketball and volleyball.
Case (3) is, at first glance, similar to (2), and some people think that it’s only a different tagging style. Indeed, it represents the same situation, but with separate keys (tags) instead of a value list in one tag, so semantically it has no advantage. It even looks more complicated, because its structure actually consists of three elements, not two: namespace:subkey=boolean_value, where namespace equals the key in case (2), subkey equals each value from the list in case (2), and boolean_value is just yes or no (in case of no, the whole tag is omitted).
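To make the correspondence between (2) and (3) concrete, here is a minimal Python sketch, assuming tags are held in a plain dict of strings. The helper names `list_to_subkeys` and `subkeys_to_list` are hypothetical and not part of any OSM tool; the sketch only shows that the two forms carry the same information.

```python
def list_to_subkeys(key: str, value: str) -> dict[str, str]:
    """Method (2) -> method (3): key=a;b becomes key:a=yes, key:b=yes."""
    # Split on the semicolon separator, trimming spaces and dropping empty items.
    items = [item.strip() for item in value.split(";") if item.strip()]
    return {f"{key}:{item}": "yes" for item in items}


def subkeys_to_list(tags: dict[str, str], key: str) -> str:
    """Method (3) -> method (2): collect every key:subkey=yes into a semicolon list."""
    prefix = key + ":"
    # Only "yes" counts; "no" (or an absent tag) means the property is not set.
    subkeys = [k[len(prefix):] for k, v in tags.items() if k.startswith(prefix) and v == "yes"]
    return ";".join(subkeys)


tags = list_to_subkeys("sport", "basketball;volleyball")
print(tags)                            # {'sport:basketball': 'yes', 'sport:volleyball': 'yes'}
print(subkeys_to_list(tags, "sport"))  # basketball;volleyball
```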
But the thing is, OSM is a database. This means the tags we use are actually data structures, and data structures should be usable. There is a whole applied discipline on that topic: data architecture. So, if you’ve never heard of it or don’t know exactly what it stands for, please read the Wikipedia article about it.
From the point of view of the general public, there is no semantic advantage of method (3) over method (2). But there is a huge advantage from the point of view of data architecture. That is why method (2) is strongly discouraged in most cases, with only a few exceptions, as described more or less clearly in the Semi-colon value separator article. In simple words, the semicolon separator is usually acceptable where a value list contains strings (portions of free text) used for labels or descriptions, not properties. In more technical words, it is acceptable where the strings in the list are not used for querying objects for any purpose, including subsetting data, applying rendering styles, etc. This also covers complex text structures such as lane tagging or opening hours, which are meant for deeper parsing by design.
Why are value lists so bad for querying? Simply put, it’s all about performance and having known, predictable data structures. Given something like sport=basketball;volleyball from case (2), software first needs to break it down and store the resulting list somewhere in memory before it can work with it, and even before that, it has to determine how many elements the list contains. In cases (1) and (3), the number of operations is smaller and there is nothing unpredictable, since namespace:subkey always breaks into a known number of strings in a predictable order (even if the subkey is compound, like subkey:sub-subkey, we usually don’t have to care, since it can be processed as a single string).
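The difference in parsing can be sketched in Python. This is an illustration only, not how any particular OSM tool is implemented, and the function names are hypothetical: a namespace:subkey key always splits at the first colon into exactly two strings (a compound subkey stays one string), while a value list yields a result whose length is only known after scanning the whole value.

```python
def parse_key(key: str) -> tuple[str, str]:
    """Method (3) keys: split at the first colon into exactly two strings."""
    # partition() always returns a 3-tuple, so the shape of the result is fixed.
    namespace, _, subkey = key.partition(":")
    return namespace, subkey


def parse_value_list(value: str) -> list[str]:
    """Method (2) values: split on ';'; the number of items is only known afterwards."""
    return value.split(";")


print(parse_key("sport:volleyball"))              # ('sport', 'volleyball')
print(parse_key("namespace:subkey:sub-subkey"))   # ('namespace', 'subkey:sub-subkey')
print(parse_value_list("basketball;volleyball"))  # ['basketball', 'volleyball']
```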
There is another technological reason: OSM uses XML-style “attribute=value” constructs, so it’s quite logical to want to use tools intended for working with XML, such as query languages, frameworks and so on. Usually, these tools are not designed to deal with values made up of lists (for the technical reasons explained above), while methods (1) and (3) are perfectly compatible with the general XML ecosystem.
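To illustrate the XML point, here is a sketch using Python’s standard `xml.etree.ElementTree`, whose limited XPath subset supports attribute-equality predicates and the `..` parent step. The OSM XML snippet and the `ways_with` helper are made up for illustration. With separate keys, each property is one plain `[@k=...][@v=...]` predicate; with a value list, equality only matches one exact spelling of the list, so a different item order is missed unless the value is post-processed outside the query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: ways 1 and 2 use method (3), way 3 uses method (2).
OSM_XML = """<osm>
  <way id="1">
    <tag k="leisure" v="pitch"/>
    <tag k="sport:basketball" v="yes"/>
    <tag k="sport:volleyball" v="yes"/>
  </way>
  <way id="2">
    <tag k="leisure" v="pitch"/>
    <tag k="sport:basketball" v="yes"/>
  </way>
  <way id="3">
    <tag k="leisure" v="pitch"/>
    <tag k="sport" v="volleyball;basketball"/>
  </way>
</osm>"""

root = ET.fromstring(OSM_XML)


def ways_with(k: str, v: str) -> set[str]:
    """Ids of ways having a child <tag k=... v=...>, found with XPath predicates only."""
    # "/.." selects the parent <way> of each matching <tag>.
    return {way.get("id") for way in root.findall(f".//way/tag[@k='{k}'][@v='{v}']/..")}


# Method (3): both properties are plain equality tests, combined by set intersection.
print(sorted(ways_with("sport:basketball", "yes") & ways_with("sport:volleyball", "yes")))  # ['1']

# Method (2): equality matches only one exact list spelling, so way 3 is not found.
print(sorted(ways_with("sport", "basketball;volleyball")))  # []
```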
Anyone can easily gain some experience and compare methods (2) and (3) by writing an Overpass API query (Overpass is a significant part of the OSM ecosystem) for both cases given in the definitions section above: sport=basketball;volleyball versus sport:basketball=yes + sport:volleyball=yes. Imagine that you need to select all pitches with both of these properties. I can guarantee that method (2) will require CPU-hungry regular expressions, which will inevitably reduce query performance.
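As a rough stand-in for that Overpass comparison, here is a Python sketch that selects pitches having both properties. It does not reproduce how Overpass evaluates queries internally; the tag dictionaries and helper names are hypothetical. With method (2), each value has to be found inside the list, here with a regular expression anchored on the `;` separator; with method (3), it is a single key lookup.

```python
import re

# Hypothetical sample objects.
pitches = [
    {"leisure": "pitch", "sport": "basketball;volleyball"},
    {"leisure": "pitch", "sport": "volleyball"},
    {"leisure": "pitch", "sport:basketball": "yes", "sport:volleyball": "yes"},
    {"leisure": "pitch", "sport:volleyball": "yes"},
]


def has_in_list(tags: dict[str, str], key: str, value: str) -> bool:
    """Method (2): the value must be found somewhere inside a semicolon list."""
    # Anchor on string start/end or ';' so that e.g. "ball" does not match "basketball".
    pattern = rf"(^|;)\s*{re.escape(value)}\s*(;|$)"
    return re.search(pattern, tags.get(key, "")) is not None


def has_subkey(tags: dict[str, str], key: str, value: str) -> bool:
    """Method (3): one dictionary lookup, no string scanning."""
    return tags.get(f"{key}:{value}") == "yes"


both_2 = [i for i, t in enumerate(pitches)
          if has_in_list(t, "sport", "basketball") and has_in_list(t, "sport", "volleyball")]
both_3 = [i for i, t in enumerate(pitches)
          if has_subkey(t, "sport", "basketball") and has_subkey(t, "sport", "volleyball")]
print(both_2)  # [0]
print(both_3)  # [2]
```

Both functions answer the same question, but only the first one has to scan and pattern-match the value string for every object it checks.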
Voters on the Education 2.0 proposal have expressed certain objections to method (3) that I want to address directly.
… semicolon-separated lists are much more concise and therefore easier.
Easier for what? For reading, maybe (if the list is very short), but not for any technical purpose, including tagging preset development, editor interface development and so on.
… it’s just an aversion to semicolon-separated values. I understand that this is a proposal by Russians, and some Russians such as XXzme have expressed their aversion to semicolon-separated values. I respect their opinion, but I have the opposite opinion, sorry.
No, it’s logic, not an aversion. Read everything above. And it’s not “another opinion”. An opinion is a view that is not necessarily based on facts or logic, whereas the logic behind the preferred use of method (3) is explained both in the Semi-colon value separator article and in this diary entry. If someone finds it false, feel free to point that out. Otherwise, what you have is an aversion based on personal preference, which doesn’t have to be respected, since it contradicts reasonable requirements for tagging schemes.
… you cannot model the whole world in a key:*=yes ontology, …
That’s actually classic demagogy: claiming that your opponent said something false (which he didn’t) and then proving it false to discredit him. Nobody claimed that it’s possible and/or necessary to use method (3) to model the whole world (that’s another piece of demagogy: incorrect universal quantification). If someone thinks that a collision of exclusive properties will never occur for a certain feature, it’s okay to point that out and explain why method (1) is acceptable. But resorting to demagogy usually reveals a lack of real arguments.
… that ridicules the key=value scheme in OSM
How exactly? Again, method (3) is actually recommended by the OSM documentation in the Semi-colon value separator article. And it is technologically much closer to method (1) than method (2) is.
The tagging method with separate subkeys and Boolean values is an effective part of the tagging system. It doesn’t replace other methods completely, and nobody claims that it does. For cases where any chance of collision between exclusive values exists, it’s the only effective method. The value list method is equivalent to it only semantically; it’s inferior in the technical aspects of data architecture, performance and usability. Usability for mappers strongly depends on tools: complex schemes are rarely used in a fully manual manner, since plugins, presets and other tools help people use them, so it doesn’t really matter exactly how these tags look.
And one last thing. Having been part of the OSM project for several years, I’ve heard many references to national specifics, like “in this country, we do it this way”. But I’ve never seen anyone make a general reference to a nation with a negative connotation, as happened in one of the comments on an opposing vote on the Education 2.0 proposal. Indeed, there are several members of the Russian community mentioned in that comment, including myself and Xxzme (notorious for his Wiki activity, but with no connection to this proposal), who are trying to promote the separate keys method for cases where a collision of properties can occur, and who from time to time criticize legacy tagging methods.
But shouldn’t judgment be based on what is said and its logic, rather than on who said it or other personal traits? I really hope this will never happen again.