Because it is difficult?
In my previous diary entry I showed you some of the problems that I see with the number of keys in use for tagging objects in OSM (54382 at the time of my research: 25 July 2015).
In a reaction, user Hedaja pointed me to an interesting blog that I wasn’t aware of, by Jochen Topf, the maintainer of the Taginfo database.
Jochen - in this blog - also mentions the “one-time-only” use of keys and calls for action in an attempt to lower the number of keys back to a healthy 40.000.
I did some research and downloaded the taginfo database on 25 July 2015.
It has a table “keys” with 54382 keys, which I used for the following statistics:
- 19037 keys appear just once - which is 35% of all the keys;
- 27731 keys appear at most 3 times - 51%; (note: this includes the keys above!)
- 35453 keys appear at most 10 times - 65%; (including keys above!)
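For anyone who wants to reproduce counts like these, here is a minimal sketch of the bucketing. It runs against a tiny made-up table in memory, not the real download, and the column names `key` and `count_all` are my assumption about how the “keys” table is laid out, so check them against your own copy first.

```python
import sqlite3

# Toy stand-in for the taginfo "keys" table. The column names
# (key, count_all) and the rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (key TEXT, count_all INTEGER)")
conn.executemany(
    "INSERT INTO keys VALUES (?, ?)",
    [("source", 5000), ("name", 3000), ("nitrox", 1),
     ("gauge:1879-1934", 1), ("name ", 2), ("colour", 8)],
)

def share_at_most(conn, limit):
    """Return (number of keys used 1..limit times, their share of all keys in %)."""
    total = conn.execute("SELECT COUNT(*) FROM keys").fetchone()[0]
    n = conn.execute(
        "SELECT COUNT(*) FROM keys WHERE count_all BETWEEN 1 AND ?", (limit,)
    ).fetchone()[0]
    return n, 100.0 * n / total

for limit in (1, 3, 10):
    n, pct = share_at_most(conn, limit)
    print(f"{n} keys appear at most {limit} times - {pct:.0f}%")
```

To run it on the real data, you would replace the in-memory connection with one to the downloaded database file.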
I consider keys that are used 10 times or fewer suspect: they may well be the result of a mistake in the use of the key (e.g. a misspelling of a regular key).
Let’s consider any key that is used 10.000 or more times a “trusted” key. How many are there?
- 1292 keys appear 10.000 or more times - 2.4%
In between we have a group of 17.516 keys that are used between 11 and 9999 times.
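A quick check shows how the groups add up. The in-between figure only works out if the 121 keys that do not appear at all (see the end of this entry) are left out of the “at most 10 times” group, so that is the assumption made here.

```python
total_keys = 54382   # rows in the "keys" table
unused = 121         # keys that do not appear at all
at_most_10 = 35453   # keys used 1 to 10 times (assumed to exclude the unused ones)
trusted = 1292       # keys used 10.000 times or more

in_between = total_keys - unused - at_most_10 - trusted
print(in_between)  # 17516
print(f"{100 * in_between / total_keys:.1f}% of all keys")
```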
By themselves these numbers do not mean very much, because what counts more is which values a key has. A key that is used once can only have one value. E.g. the key “nitrox” is a one-time-only key and it can be found here.
A key that is used twice can have at most 2 different values and a key that is used 100 times can have at most 100 different values. The key that is used most on OSM is the key: source, it appears 162.428.193 times with 143.491 different values (one of them is Bing and another is bing).
Now, then, how can we use all this information to get rid of all those keys that shouldn’t be there because the mapper added them by accident or out of ignorance?
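Value pairs like Bing and bing are easy to find automatically. The sketch below groups values that differ only in letter case or surrounding spaces. The function name `case_variants` and the sample list are made up for illustration; only Bing and bing come from the real data.

```python
from collections import defaultdict

def case_variants(values):
    """Group values that differ only in letter case or surrounding spaces."""
    groups = defaultdict(set)
    for value in values:
        groups[value.strip().casefold()].add(value)
    # Keep only the groups that have more than one spelling.
    return {norm: sorted(spellings)
            for norm, spellings in groups.items()
            if len(spellings) > 1}

# Hypothetical sample of values for the "source" key.
sample = ["Bing", "bing", "survey", "Survey", "GPS", "knowledge"]
print(case_variants(sample))
# {'bing': ['Bing', 'bing'], 'survey': ['Survey', 'survey']}
```

Whether two spellings really mean the same thing still needs a human eye, so I would treat the output as a list of candidates, not as a list of errors.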
Sometimes a mapper adds a trailing space at the end of a key, simply by hitting the spacebar instead of the return key. You don’t see any of this on your screen, but it gets recorded in the database: We see that this happened only twice with the name key, but the same error happens much more often with other keys. I heard that a bot runs at regular intervals to fix all those invalid spaces, but I’m not sure.
And if you are one of the mappers who created those keys above and happen to read this, please fix it!
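Such invisible spaces are simple to detect once you have the list of keys. This is only a sketch with a hypothetical helper and made-up sample keys, not the bot mentioned above:

```python
def whitespace_suspects(keys):
    """Return (raw_key, cleaned_key) pairs for keys with leading or trailing whitespace."""
    return [(k, k.strip()) for k in keys if k != k.strip()]

# Made-up sample; "name " carries the trailing space described above.
sample_keys = ["name", "name ", " highway", "gauge:1879-1934"]
for raw, cleaned in whitespace_suspects(sample_keys):
    # repr() puts quotes around the key, which makes the stray space visible.
    print(repr(raw), "->", repr(cleaned))
```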
Do you want to know the values of the correctly spelled name key?
Here is the first page (of more than a million) of taginfo about that key:
Now, let’s look at a “rare” key. What about: gauge:1879-1934?
Here it is (screenshot with openpoimap): It’s about the track width of this railway track between 1879 and 1934.
According to the wiki the gauge=* tag is supposed to hold the track width, like gauge=1435. But because there are no instructions on how to handle the situation where the track width changed after some time, the mapper chose to add that time span to the key. Is it wrong? I’m not sure, but it is definitely a key that is not easy to re-use. How many other tracks changed their gauge in the same period (1879-1934)?
And what happened between 1906 and 1934? Did they use both track widths?
On the other hand, why include historical data in OSM? We have other OSM datasets that are meant to collect historical data. OSM is supposed to “map what is on the ground”, but a railway from more than 100 years ago, is it still there?
There are many more examples to be found that are questionable, but removing all those tags and replacing them with more “valid” ones is not an easy task and needs to be done with care.
If you want to see more examples yourself, the best way to do that is to go to taginfo and select the page with all the keys. Currently it contains 3218 pages. Click on the second column (Objects) so that it is sorted low to high and then scroll a few pages to see the keys that have a count of 1. Take your pick and see the results in taginfo. Please leave your comments or recommendations here.
I have one more question: what about the keys in the database (121 of them) that do not appear at all?