OpenStreetMap

marczoutendijk's Diary

Improving OSM - why don’t we? [15]

Posted by marczoutendijk on 15 August 2019 in English. Last updated on 31 October 2019.

Why are mappers using landuse=village_green in the wrong way?

More than once in the past few years I have started or participated in topics on the tagging list (as well as on the local Dutch forum) where the precise use of landuse=village_green was discussed.
A Village Green is a situation that is described in the wiki, and I quote:

“… is a distinctive part of a village centre. It’s an area of common land, usually grass but often including flowers, shrubs, small trees and a pond, located in the centre of a village (quintessentially English - defined separately from ‘common land’ under the Commons Registration Act 1965 and the Commons Act 2006).”

The proposal (in 2006) and the final voting (two in favor and none against!) never reached a wide audience (which back in 2006 of course was much smaller than nowadays) so nobody outside the UK really knew or understood its meaning and specific use.

The following text is partly copied from my posting on the tagging list:

Because I found out that the tag is greatly misused, I did extensive research to get more details about its current use.
My research is based on the OSM dataset of 14 July 2019.

The total number of tags for landuse=village_green is: 91645
I then took a selection of 22 countries (see table at end) and compared the uses per country to its use in the UK, because that country seems to be the main reason for the existence of this tag.
In those 22 countries the tag is used 55721 times and there are 5569 unique mappers responsible for using it.
I was surprised to see that in the country where I live, the Netherlands, the tag is used about 2.6 times as often as in the UK (5131 versus 1960 uses)!
Given the original definition you could expect that in the Netherlands (based on the number of cities/towns/villages and assuming that each of those indeed had a Village Green - which isn’t true) there could be at most 2440 Village Greens, not the 5131 we have now. Where, then, are the 2691 others located??

And what about the other countries?
I started by looking at random places on the map (worldwide, with the help of Overpass) to see what people had marked with the tag, but later created a database application which allowed me to load the map data faster and inspect it.

My strategy was this:
For each of the 22 countries in my list, I sorted on changeset number to have the data in oldest-newest format. Interesting to see that its first use (12 years ago) wasn’t in the UK but in Germany, where the tag is anyway used more than in any other country. The most recent use was some days before I finished my research.
I took the two oldest uses, the two most recent uses and one in the middle, to create a set of 110 changesets for visual inspection of the tag on the map.
The result (based on my earlier look at its use) didn’t surprise me at all: 65% of the landuse=village_green tags are not used according to the definition in the wiki!
Because at first I couldn’t believe the result, I started again, but now taking only one country and visiting 20 randomly chosen changesets. That made things worse: sometimes (even when being very liberal in my judgement of what a village green could be, and accepting a small area of grass somewhere around the village center) the misuse rose to 80%!
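The per-country selection described above (two oldest, one in the middle, two most recent) can be sketched in a few lines of Python. This is only an illustration of the idea, not the database application I used; the function name and the changeset ids are made up.

```python
def pick_sample(changeset_ids):
    """Return the two oldest, the middle and the two newest changeset ids.

    Changeset ids grow over time, so sorting by id gives oldest-to-newest.
    """
    ordered = sorted(set(changeset_ids))
    if len(ordered) <= 5:
        # Too few uses to sample from: inspect all of them.
        return ordered
    middle = ordered[len(ordered) // 2]
    return ordered[:2] + [middle] + ordered[-2:]


# Hypothetical changeset ids for one country (made-up numbers).
example = [900, 120, 455, 310, 780, 150, 620, 505, 999]
print(pick_sample(example))
```

With 22 countries and 5 changesets each, this gives the 110 changesets mentioned above.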

What can we conclude from this?

On the wiki talk page I already raised this problem and suggested adapting the wiki to allow for different uses, based on consensus reached per country. We already do that for Spain and Germany (as you can read in the wiki), although the use there is closer to the intended one.

What I see now is a completely different use for landuse=village_green.
The most frequent (ab)uses now are areas covered with grass (anywhere in a village or even in rural areas), the centers of roundabouts, strips along highways, and the kind of “green” that you see on the photos on the wiki talk page.
This wrong use is understandable: the word “village” and the word “green” both lead - for those not being native English speakers nor reading the wiki nor knowing anything about the historical context - to using it for the situations I mentioned above.

There are of course more occurrences of faulty tags (other than landuse=village_green) for a given situation, but not to the extent that we see with the landuse=village_green tag.

The number of Village Greens is bound to some upper limit. Someday we will have all of them in OSM, but people will still use the tag after that (as they do now) because it fits their definition, neglecting the wiki.
The situation that we have now: mappers are using a key-value pair (landuse=village_green) for tagging landuse that is not supposed to be tagged that way in at least 65% of the cases I investigated.
In the future that number will rise to the point where almost all use of landuse=village_green is wrong (or not according to what the wiki describes).

Interested in its use around you?
I have created an Overpass query that you can run in your own area.
It finds every landuse=village_green that is within 25 meters of a junction=roundabout.
The example included in this query is typical of the misuse of this tag.
Please be aware that it is time-consuming for large areas.
The query is here (move to your area of interest and hit the execute button):
http://overpass-turbo.eu/s/KSx
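
For readers who prefer to build such a query themselves, here is a rough sketch in Python. It is not a copy of the query behind the short link above; it is my guess at an equivalent Overpass QL query, and the function name, the bounding box and the timeout are assumptions. It only looks at ways, so greens mapped as multipolygon relations are not found.

```python
def roundabout_green_query(south, west, north, east, radius_m=25):
    """Build an Overpass QL query for village greens near roundabouts.

    The bounding box is given in the order Overpass expects:
    south, west, north, east (decimal degrees).
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:60];\n"
        # Step 1: collect all roundabout ways in the bounding box.
        f'way["junction"="roundabout"]({bbox})->.roundabouts;\n'
        # Step 2: keep village greens within radius_m meters of them.
        f'way["landuse"="village_green"](around.roundabouts:{radius_m});\n'
        "out center;\n"
    )


# Arbitrary example bounding box; replace it with your own area.
print(roundabout_green_query(52.0, 5.0, 52.2, 5.3))
```

The printed text can be pasted into overpass turbo. Keep the bounding box small, for the reason given above.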


Table of use.

Country Use
Argentina 114
Japan 241
Greece 265
Chile 544
Russia 667
Italy 1008
USA 1438
Great Britain 1960
Belgium 2614
Spain 2856
Brazil 2940
Netherlands 5131
France 6867
Poland 9790
Germany 15855
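
To compare each country with the UK, the counts in the table can simply be divided by the British count. A small sketch (the variable names are mine, the numbers are the ones from the table):

```python
# Uses of landuse=village_green per country, copied from the table above.
use = {
    "Argentina": 114, "Japan": 241, "Greece": 265, "Chile": 544,
    "Russia": 667, "Italy": 1008, "USA": 1438, "Great Britain": 1960,
    "Belgium": 2614, "Spain": 2856, "Brazil": 2940, "Netherlands": 5131,
    "France": 6867, "Poland": 9790, "Germany": 15855,
}

reference = use["Great Britain"]
# Ratio of each country's count to the British count.
ratios = {country: count / reference for country, count in use.items()}

for country, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{country:<14} {use[country]:>6} {ratio:5.2f}x")
```

For the Netherlands this gives 5131 / 1960, or roughly 2.6 times the British count.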

Does this situation need our attention?
And if so, how do we deal with it? Given the very few reactions to my latest posting on the tagging list, my conclusion is that nobody really cares. Do you?

What shall we have for dinner tonight?

Posted by marczoutendijk on 19 February 2017 in English. Last updated on 22 February 2017.

### Improving the OSM map - why don’t we (14)

Some thoughts on restaurant and food-tagging on OSM.

A restaurant is considered an amenity and tagged with amenity=restaurant.
One would expect that in order to show what type of restaurant this is, or what food you can eat there, the next step would be:
restaurant=italian
restaurant=fish
restaurant=burger
After all, this is accepted:
natural=water
water=lake
But, alas, OSM is different, and so a new tagging was introduced to indicate what we can eat in a restaurant. No, they didn’t choose food=*, but came up with:
cuisine=*
So, the correct tagging for a restaurant and what is served inside is:
amenity=restaurant
cuisine=italian
This is not so bad at all, because this scheme allows you to tag many more places where you can eat, but which are not considered a restaurant, like a cafe, bar or pub (or a railway station or book shop).
There are some curious constructions however, because to tag a Burger King (or any other fast food restaurant) you can do so in two ways:
amenity=restaurant
cuisine=burger
or:
amenity=fast_food
cuisine=burger
By itself, using fast_food as a value for an amenity is rather strange, because to me, fast food is a type of food, belonging to cuisine, not an amenity! (Would you use highway=asphalt? No, of course not, because highway=* expects a function of the highway it describes, not its surface.)
The addition of the cuisine key in the last case is maybe not even necessary, as hamburgers are core business in any fast food restaurant.
Over the years the list of values to assign to the cuisine key has grown (and will keep doing so) and now (February 2017) we have two basic groups in the wiki:

  • 40 values for the type of food (like fish, meat, pizza, burger, kebab, soup, etc.)
  • 53 values for the ethnicity of the food (like italian, greek, chinese, mexican, etc.)

As values from both lists can be combined, this introduces a rich array of possibilities, but also adds confusion. For some people “eating Italian” just means ordering a pizza, to others it is soup and pasta or a 5 course dinner in a restaurant.
I did some research on the different ways people have used the above tagging system to map restaurants and what you can eat. After all, it is likely that you can eat a variety of food in a restaurant, and that, in turn, requires multiple values to be assigned to a single key.
> (note: there have been many discussions on the tagging list as well as numerous postings on the forums on the best way to add and handle multiple values for a single key. Most often, different values are separated by semicolons, as can be read in the wiki, but some people think you shouldn’t use multiple values.)

Suppose that we allow 4 different values (out of 40) to be used for the type of food (like burger, chicken, donut and kebab). That would give us a maximum of 40 x 39 x 38 x 37 = 2 193 360 different combinations, if we count every order of the same four values separately. Of course not all combinations make sense; I don’t expect fish-pancake-noodle-casserole to be a frequent combination.
Choosing from the 53 ethnicity values would give even more possibilities, but, again, not all are to be expected.
I found (among many others) the following combinations (from both lists) in use:

  • chicken;kebab;fish_and_chips
  • pancake;friture;chicken;grill;breakfast;coffee_shop;beef_bowl;russian;fish_and_chips
  • burger;sandwich;pasta;pizza;ice_cream;chicken;coffee_shop
  • burger;sandwich;breakfast;sausage;local;noodles;pasta;pizza;chicken;diner;
  • burger;sandwich;local;chicken;fish_and_chips
  • bagel;breakfast;cake;coffee_shop
  • chicken;burger;grill;oriental;breakfast;pizza;hotdog;kebab;local
  • italian;indian;regional;mexican
  • fataya,sandwich_poulet,hamburgers
  • italian;creative
  • kebab;pizza;schnitzel;sausage;salad
  • indian;vegetarian;chinese
  • italian;pizza

I also found:

  • 짜장면,짬뽕,탕수육,간짜장,우동,짬뽕밥,잡채밥,육개장,잡탕,양장피_등등_중국요리
  • 早餐:蛋餅,蔥抓餅,饅頭,肉包,炒麵,豆漿,紅茶
  • 돈까스,_피자,_스파게티
  • Горячие_блюда,_гарниры,_закуски,_салаты,_кампот

In the above list I have marked in bold type those choices that are not in any of the 40 food (or 53 ethnicity) wiki values (excluding the entries in non-Western script). In the current taginfo database there are 21878 different values for the cuisine=* tag. The one used most is cuisine=regional, which is used 62291 times. But there are also 17849 values of that key which appear only once, each time with a different combination of values like the ones I showed you above.
The last multiple value in the list above is italian;pizza, which has been used 948 times. What exactly does it mean? Pizza is Italian, so why bother adding that as well? A simple cuisine=pizza would suffice. Or does it mean that you can eat any and every Italian food in a restaurant tagged in this way, but maybe with pizza as something special? I don’t know.
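
To make the problem concrete for anyone who wants to process these values: a data consumer has to split each cuisine value on semicolons and then decide what to do with the parts it does not know. Below is a minimal sketch. The set of known values is only a tiny illustrative subset (the examples named earlier in this entry), not the full wiki list, and the function names are made up.

```python
# Illustrative subset only; the wiki lists 40 food and 53 ethnicity values.
KNOWN_VALUES = {
    "fish", "meat", "pizza", "burger", "kebab", "soup",
    "italian", "greek", "chinese", "mexican",
}


def split_cuisine(value):
    """Split a cuisine value on semicolons, dropping empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def unknown_values(value, known=KNOWN_VALUES):
    """Return the parts of a cuisine value that are not in the known set."""
    return [part for part in split_cuisine(value) if part not in known]


print(split_cuisine("italian;pizza"))
print(unknown_values("italian;creative"))
# A comma is not a separator, so this whole string counts as one value.
print(unknown_values("fataya,sandwich_poulet,hamburgers"))
```

Note that a trailing semicolon (as in one of the examples above) produces an empty part, which the sketch silently drops.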

Usually, when a key=value pair occurs only once, it is considered likely to be a typing error (like cuisine=piZza or highway=terziarie) or a new value made up by the mapper (like cuisine=romanesc). But the entries in the small sample above (taken from the full list of 17849 unique cuisine=* occurrences) are not typing errors; they are taken from all the possible and valid combinations. How many such combinations are possible? Assume that we allow 2 choices from the ethnicity values and 4 from the food values, then we have a maximum of 53 x 52 x 40 x 39 x 38 x 37 = 6 044 900 160 possible values for the cuisine=* tag! (Yes, that is six billion, forty-four million, nine hundred thousand, one hundred and sixty.)
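
The arithmetic can be checked with a few lines of Python. The products used in this entry count ordered selections (burger;pizza and pizza;burger are counted as two different values, which is how they end up in the database as text). As an aside that is not part of my original count, the sketch also shows how many combinations remain if the order is ignored.

```python
import math

# Ordered selections: 4 out of 40 food values, 2 out of 53 ethnicity values.
food_ordered = math.perm(40, 4)                      # 40 x 39 x 38 x 37
both_ordered = math.perm(53, 2) * math.perm(40, 4)   # 53 x 52 x 40 x 39 x 38 x 37

# Same selections when the order of the values does not matter.
food_unordered = math.comb(40, 4)
both_unordered = math.comb(53, 2) * math.comb(40, 4)

print(f"{food_ordered:,} ordered food selections")
print(f"{both_ordered:,} ordered food and ethnicity selections")
print(f"{food_unordered:,} unordered food selections")
print(f"{both_unordered:,} unordered food and ethnicity selections")
```

Either way, the number of possible values is far larger than anything a data consumer can reasonably anticipate.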

Which way to go?

I have seen proposals for adding the complete merchandise of certain shops to the OSM database. By doing so we would be able to query OSM for “the nearest shop where I can buy an ironing board”.
To me that makes no sense at all, as there is no way of getting all that data reliably into OSM. And maintaining it would be an even bigger challenge.

Should we try to do the same with restaurants and food?
Given the rather careless manner in which the multiple-valued tags for cuisine have been used (a result of the database design we are using, which allows for any combination of keys and values - in any language - without any error checking at all), I don’t expect applications based on what is in the OSM database to become usable any time soon in a way that can compete with what is already on the market for customers. Have you ever spoken to anyone who tried to find out where he/she would go for dinner tonight - including selecting what to eat - by using OSM?
One - fairly big - problem is that roughly one-half of the restaurants have no cuisine tag at all, which makes the data useless for what you were trying to find out (“what can I eat?”).

I know that we can put anything in the OSM database, but we cannot put everything in it.
Let us focus on getting data (as much as we can) into OSM that turns it into a great map (that includes showing where I can find a restaurant), and shall we avoid creating a mediocre restaurant and food guide?

Clean up the "fixme's" around you!

Posted by marczoutendijk on 30 November 2016 in English. Last updated on 15 February 2019.

The fixme=* tag is often used to give other mappers an indication that something needs more research (or it is a “note to self”).
All too often it stays at that point and no one ever cares any more about such a request for improvement. I found out that more than half of the fixmes are at least 2 years old.

I wrote a simple Overpass query with some stylesheets attached which shows the text of the fixme immediately on the map.
Click on the link above, move the map to your neighbourhood and hit the run button.
Is there something you can fix? Please do so and remove the “fixme”.

For the centre of London, this is the result:

The Overpass query searches for nodes with a fixme=* tag, but you can easily change it to find ways instead.
And if you want to locate all note= tagging, simply replace “fixme” with “note” in the script.
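
The two changes described above (another element type, another key) come down to swapping two words in the query. A rough sketch in Python that builds such a query; it leaves out the stylesheet part, and the function name, the default bounding box (an arbitrary box in central London) and the timeout are assumptions of mine, not the contents of the linked query.

```python
def tag_query(key, element="node", bbox=(51.50, -0.15, 51.52, -0.10)):
    """Build an Overpass QL query for all elements that carry a given key.

    bbox is (south, west, north, east) in decimal degrees.
    """
    if element not in {"node", "way", "relation"}:
        raise ValueError(f"unsupported element type: {element}")
    south, west, north, east = bbox
    return (
        "[out:json][timeout:25];\n"
        f'{element}["{key}"]({south},{west},{north},{east});\n'
        "out center;\n"
    )


print(tag_query("fixme"))          # nodes with a fixme tag
print(tag_query("note", "way"))    # ways with a note tag
```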

A year ago we started a program where new mappers - after they did their first edit in the Netherlands - received a welcome message with links to various sources of information on the mapping process, the do’s and don’ts, the editors and other useful stuff.
I was the initiator of that program and also the one responsible for finding the new mappers (Pascal Neis provided the necessary RSS feed) and sending the messages. As such it was a one-man job.

After one year and sending more than 1500 individual messages to those new mappers, I will no longer continue with this program.

Why?

  1. In my earlier diary entry on this program some (statistical) conclusions were drawn about the number of mappers and the amount of mapping activity over time. There seems to be no change to this statistical data since that report. But I did not expect that to happen either. Research from others points in the same direction, as can be learned from the reactions to that first article.

  2. About 75 persons (0.4%) replied to my welcome message, mostly with a simple “thank you”, sometimes asking for more information. A very small number of people joined (and stayed in) the active mapper group, but most of the new mappers are “one-time-only” mappers. [1]

  3. After maps.me [2] became available as a simple data editor for OSM, a great number of people entered the OSM mappers’ world [3], but most of them are not aware of the underlying principles and goals of OSM, nor are they aware of the communication channels we have (mailing list and forum). Hence, sending a message to those mappers is rather useless, because they are not aware of the fact that there is such a thing as a private mailbox in their account. This became all the more problematic as a lot of tourists are now acting as “mappers”, but given their limited knowledge of mapping and the limited possibilities of maps.me, those edits very often need the hand of an experienced mapper to fix. [4]

During the past year that I have run this program, I learned a lot about the way new mappers behave and what they expect from OSM, and it is my opinion that my welcome message (including links to a number of wiki pages in Dutch) helped in only a minority of cases. And in those cases where the help really was needed (maps.me), I couldn’t reach the mappers…

There are maybe other/better ways needed to instruct new mappers on how to do the job, and this might be a topic for future research by others.


[1] A “one-time-only” mapper might of course return to mapping, years later, as history shows. But it does not happen very often.
[2] Maps.me is a great app that I use as my on-the-road tool to consult the OSM map and as a routing aid. I would never use it as a serious mapping tool.
[3] An increase of about 20%
[4] Other OSM communities experienced more or less the same problems with maps.me, as can be read here, here and here.

Improving the OSM map - why don't we? (13)

Posted by marczoutendijk on 18 May 2016 in English. Last updated on 13 February 2019.

### Why so many people are not using OSM.

Do you recognize the renderer that was used for the above screenshot? I’m pretty sure you can’t, because it wasn’t rendered but printed in the Times Comprehensive Atlas of the World.
Looking at this map, it is clear (at least to people who are familiar with “paper” maps [1]) what we see:
* A number of Islands that have a name as a group (Canary Islands) and that are part of Spain
* Each Island has its own name (printed in italics or bold italics)
* Each Island has a capital (printed in bold, but this is not true for all the Canary Islands)
* A number of towns are printed in normal type

Now let’s see how OSM based maps and renderers show this to the world. The same Islands, with three different ways of rendering (Humanitarian, Mapquest and Mapnik).
The most striking omission (to me) is, that none of the islands is shown with their name and Mapnik shows every Island as “España”.
Even when zooming in, the names of the Islands never (and I mean NEVER) show up (I’m supposed to see Tenerife now somewhere on the map):

Now, what kind of a map is that?? Not showing what is most important!?!
Is there a road (top picture) running from Santa Cruz de Tenerife to España?
When I use a (printed) map or atlas, I can see on the maps at a scale of 1:500.000 what I need to find my way. At that scale I’m not interested in whether there is a paved or unpaved way ahead of me. And I care even less about the traffic signs that I might see once I’m there.

I know that everything I’m looking for (and much, much more) is in the OSM database, but why is it shown to me at the wrong moments (if at all) and at the wrong zoom levels?
BTW, Google maps is not doing much better than OSM, showing (some) Island names at high zoom levels.
Do I use OSM myself? Yes!! All the time, and because I have learned to ignore all the crap it is giving me (like showing me the map in Chinese when viewing China, even if English is the language I have installed as my basic language), and because I know the strength it has with the right tools, to me it is the perfect map.
But to a lot of people who are used to a regular printed map or to Google, OSM is just a funny experiment that you can’t even use decently on a mobile phone.
Of course, there are tools and apps that use the OSM data in a much more user-friendly way (especially on mobile devices), but why can’t openstreetmap.org be a bit more user friendly?

One more example of the incompleteness of OSM.
In the part of the map I’m showing you here, we are supposed to see the capitals of the UK (London), France (Paris), Belgium (Brussels), Luxemburg (Luxemburg) and Holland (Amsterdam). Can you spot them?


Even worse, at this zoom level the only capitals shown on the map are London, Dublin and Budapest!!
Madrid, Paris, Brussels, Amsterdam, Luxemburg, Rome, Vienna, Berlin, Oslo, Copenhagen, Stockholm? What?? Where?? Are they gone??

Friends I’m trying to move over to OSM (by pointing them to my own openpoimap) often tell me: “They call it a slippy map, but you had better consider it a shitty map.”

I hope that the developers who are working on the way the basic map is being presented to the users, read this and try to create a map that users recognize instead of being puzzled.
I have said it before, at this moment OSM is a map for mappers, not for users.

[1] Of course there are people who have never seen or used a printed map before, and for those OSM is maybe a great tool, but I doubt it.

Statistical data of the Dutch OSM mappers.

Posted by marczoutendijk on 5 February 2016 in English. Last updated on 23 April 2017.

To improve the commitment of new mappers and to help them overcome the obvious beginners’ problems when trying to map, the Dutch community (after discussion in the user forum) started to welcome new mappers as soon as they had made their first edit (in the Netherlands) on the map. To find out who the new mappers were, I used this RSS feed, provided by Pascal Neis.
This welcome program started on the 1st of August 2015 and continues to this day. It is run by me and as such is a one-man task.
During this process I became curious about the mapping behaviour of the mappers and started to collect some data about their activity:

  • when did they start their user account?
  • when did they start to map?
  • how many edits did they do?
  • and much more

Soon I realized that I needed more data (over a longer time span) to get a better insight and so I contacted Pascal Neis and asked him to provide me with the relevant data, dating from some years back. After some startup problems with the data - not all the mappers seemed to be present in the data - I started my research with a dataset that contained the following data:

  • userID
  • username
  • date of registration
  • date of first edit in the Netherlands
  • date of their latest edit
  • number of changesets

First results

The dataset I have used for my research contained 3205 mappers that have done a first edit in the Netherlands between 1-1-2014 and 29-1-2016.

On first inspection of the data, it surprised me to see that some mappers did their first edit 7 years after they had created an account! This, then, was the first thing to investigate: how many days (after registration) pass before the first changeset is created?
Next I investigated how many days passed before the mapper did his latest (and very often his last) edit.

We see that most mappers (77%) create an account and start to map immediately, but 4% of the mappers waited more than 3 years before they did a first edit. It is striking to see that for most of those mappers (68%) this first edit is also their last! So-called “hit-and-run” mappers.
“Last edit” is of course hard to tell, because they might return some day in the future and do another edit, but experience so far doesn’t suggest that.
Of course it is difficult to draw conclusions based on a rather small dataset, but it nevertheless seems not too far from the truth to conclude that OSM mapping is basically in the hands of a small group of dedicated persons.
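
For those who want to repeat this on their own data: the two measures used here (days between registration and first edit, and whether the first edit was also the latest one) are easy to compute once you have the three dates per mapper. A minimal sketch with four made-up records; the field names are my own, and treating “first edit date equals latest edit date” as hit-and-run is a simplification.

```python
from datetime import date

# Made-up example records; the real dataset has 3205 mappers.
mappers = [
    {"registered": date(2015, 8, 1), "first_edit": date(2015, 8, 1), "last_edit": date(2015, 8, 1)},
    {"registered": date(2008, 3, 10), "first_edit": date(2015, 3, 10), "last_edit": date(2015, 3, 10)},
    {"registered": date(2014, 1, 5), "first_edit": date(2014, 1, 5), "last_edit": date(2016, 1, 20)},
    {"registered": date(2014, 6, 1), "first_edit": date(2014, 6, 11), "last_edit": date(2014, 9, 1)},
]


def days_to_first_edit(mapper):
    """Days between creating the account and the first edit."""
    return (mapper["first_edit"] - mapper["registered"]).days


def is_hit_and_run(mapper):
    """True if the first edit is (so far) also the latest edit."""
    return mapper["first_edit"] == mapper["last_edit"]


delays = [days_to_first_edit(m) for m in mappers]
immediate_share = sum(d == 0 for d in delays) / len(mappers)
late_share = sum(d > 3 * 365 for d in delays) / len(mappers)
hit_and_run_share = sum(is_hit_and_run(m) for m in mappers) / len(mappers)

print(delays, immediate_share, late_share, hit_and_run_share)
```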

When are you a regular mapper?

If I look at my own status, I have got the label: “a crazy mapper”, whatever that means, but once every three days (on average) I’m mapping: adding new things, fixing errors, searching for errors etc. But even if you add/change/correct things once every three months, you’re a regular mapper.
The number of days since your last edit is a good measure of your status. See the next table:

This table shows that 138 mappers (4%) did edit something but did not return for a period of more than 730 days (2 years) after this edit. This is the maximum my dataset can reveal (because it spans 2 years and one month), and it is possible that some of those mappers will return in the future, but it is not very likely.
1411 mappers (44%) did their latest edit more than 1 year ago and still another 25% of the mappers did not return to mapping within (at least) 6 months.
One might say that for the majority of the mappers it is a one-time-only affair: probably fixing something in their own area (missing names, shops, houses etc.) and then never returning.
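
The grouping by “days since latest edit” can be sketched as follows. The thresholds are the ones mentioned above (2 years, 1 year, 6 months); the labels, the function name and the choice of 182 days for six months are assumptions of mine.

```python
from datetime import date


def inactivity_label(last_edit, today):
    """Classify a mapper by the number of days since the latest edit."""
    days = (today - last_edit).days
    if days > 730:
        return "more than 2 years"
    if days > 365:
        return "more than 1 year"
    if days > 182:
        return "more than 6 months"
    return "6 months or less"


# The dataset ends on 29-1-2016, so that is used as reference date here.
reference = date(2016, 1, 29)
for last_edit in (date(2014, 1, 10), date(2015, 1, 1), date(2015, 7, 1), date(2016, 1, 1)):
    print(last_edit, inactivity_label(last_edit, reference))
```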

A good measure of your mapping activity is the number of changesets you have done, and that is what is in the next table.
(Showing # changesets, # mappers in numbers and as %, sum of group left to it.)

This table shows:
1225 mappers created 1 changeset
10 mappers created (each!) more than 1000 changesets

And 82% of the mappers created between 1-9 changesets. From the graph it is obvious that this is almost a perfect example of an exponential curve.
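
A table like this can be produced by putting the number of changesets per mapper into groups and counting them. The sketch below uses made-up changeset counts, and the group boundaries are my assumption (apart from the 1, 1-9 and 1000+ groups mentioned above).

```python
from collections import Counter


def changeset_group(count):
    """Return the group label for a mapper's number of changesets."""
    if count == 1:
        return "1"
    if count < 10:
        return "2-9"
    if count < 100:
        return "10-99"
    if count <= 1000:
        return "100-1000"
    return "1000+"


# Made-up changeset counts for ten mappers.
changesets_per_mapper = [1, 1, 1, 1, 2, 3, 7, 15, 240, 1800]
groups = Counter(changeset_group(n) for n in changesets_per_mapper)

total = len(changesets_per_mapper)
for label in ("1", "2-9", "10-99", "100-1000", "1000+"):
    share = 100 * groups[label] / total
    print(f"{label:>8}  {groups[label]:>3}  {share:5.1f}%")
```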

Conclusions?

In the Netherlands we have an active (albeit small) community of mappers, and there is no indication that this community is different (statistically) from the complete set (2 000 000+) of OSM mappers (see links below). But it is also clear that the results we get from the different datasets are not always easy to understand. Only after at least one more year might we get results that show whether the welcome program that we run in the Netherlands has improved the participation of the Dutch mappers!


some useful links:
osm-report-2015
osm-activity-2014
osm-activity-report-2013
openstreetmap-active-users

Improving the OSM map - why don't we? [12]

Posted by marczoutendijk on 24 December 2015 in English. Last updated on 23 April 2017.

What or who is our source?

When we map, we use something (or someone) to base our mapping on. Preferably in such a way that other mappers can verify what we added to the map.
To help with that task, we are requested to add a source=* tag to whatever we put on the map.
What are the most frequently used sources on OSM?
The top 3 (all used more than 10 000 000 times):
1. BAG - 18 840 435
2. cadastre-dgi-fr - 12 150 135
3. Bing - 10 695 411
You can see the full list on taginfo yourself.

The BAG is a large import of all buildings in the Netherlands. wiki is here.
In France the same was done. wiki is here.
Do I have to explain Bing?


The tag source=* can be used in numerous combined tags (like source:date) as can be seen here.
For this article I only used the plain source=* tag.
However, I was not interested in the source=* tags that were used most, but in the ones used less, and even more so in source=* tags that were incomprehensible. For instance, what should we think of this:
Am I supposed to call that telephone number if I want to check the details?
I found quite a few instances of such source-tagging where a telephone number was used. And quite often this was used on bars and clubs. Some of them “members only”.


A telephone number is at least something that could be used as a source of information, although it is not very informative. But what about this one?
What source of information is hidden in 0,4?
This one is also not very helpful:
And sometimes mappers have some sense of humor, like:
“Q” (think James Bond) is of course a wealthy source of information:
And this one surely used a source, but left us just guessing:
And this one? It is clearly a mistake, because the mapper used a colour code in the source field.
Finally, the weirdest I saw:
I contacted the mapper to ask what it meant, and it must have been some mistake. He may already have corrected it by now. To see the situation it refers to, check this.

The list is endless, sometimes hilarious, but should we do something about it? I don’t think so. Relatively speaking, those errors are a minority, because they are used very seldom, mostly just once, and hence they don’t take up very much space.
But using a reliable, understandable source tag would surely help to improve OSM! Think about your source the next time you let other mappers know where your data came from!
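
If you want to find such telephone-number sources yourself, a simple pattern check on the source values goes a long way. This is a rough heuristic and not what I used for this article; the pattern, the function name and the phone number in the example are made up.

```python
import re

# Optional leading "+", then only digits and common phone punctuation.
PHONE_LIKE = re.compile(r"^\+?[\d\s()./-]{7,}$")


def looks_like_phone(value):
    """Heuristic: True if a source value looks like a telephone number."""
    text = value.strip()
    digits = sum(ch.isdigit() for ch in text)
    return bool(PHONE_LIKE.match(text)) and digits >= 7


for source in ("BAG", "Bing", "0,4", "+31 20 1234567"):
    print(repr(source), looks_like_phone(source))
```

Values such as 0,4 are not caught by this check; for those, listing the source values that occur only once is probably the better starting point.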

All screenshots with openpoimap.

Improving the OSM Map - why don't we? [11]

Posted by marczoutendijk on 24 September 2015 in English. Last updated on 23 April 2017.

Do we all speak the same language? Should we?

What is a hamlet?
The wiki is clear:
an isolated settlement, typically with less than 100-200 inhabitants, although this may vary by country
What, then, could have been the intention of the mapper who did this:
It’s a city full of hamlets!
I think that this is a clear sign of the limitations of the use of just one language (English) as the “Lingua Franca” for OSM. Too many people don’t understand or speak it, and come up with solutions in their language that they think appropriate. Leaving others guessing about their meaning - how serious is that?

From what I could research (with the help of our great universal translator), it looks like people who are relatives (or are native to that spot) are living together in the houses marked as hamlet, but I’m not really sure of that.
If it is true, is that something that needs to be on the map? E.g. should we use the OSM database as a simple (or complicated) address book? Of course we can, but should we?

One thing is clear: from the definition in the wiki, the above use of hamlet is not correct. Do you have a better idea?

To be continued…

Improving the OSM map - why don't we? [10]

Posted by marczoutendijk on 13 August 2015 in English. Last updated on 23 April 2017.

Fix the fixme/FIXME/FixMe/Fixme/Fixme: !!

During my research of the taginfo database, more than once I was staring flabbergasted at the proliferation of keys.
In normal use a single key with its value would give enough information to describe the object you just created. E.g. the shop=* key. The value for that key comes from the suggested values in the wiki.
But what if one shop sells things that belong in various categories from the wiki? Like this one: You cannot add 5 shop keys to one node (which is ok and according to the rules of good database design; any key should exist only once for a given object). Suppose you could, which one should the renderer show? All 5? On the same spot on the map? You must be kidding!
So what now? Should you add a new node for every shop type, including all the address information? Or should you just tag it with shop=food and possibly list all the different values in a note? For this particular case the choice of the mapper wasn’t all that wrong, although I would have preferred a different solution.


### On now to the fixme-chaos!

Why are so many people coming up with a new variation on the very well defined fixme key?
Below you see an overview of most of those variations.
The number of times used is also in the list. The first (and most used) 2 are:
fixme
FIXME
I do have a question about the second version:
WHY DO WE NEED AN UPPERCASE VERSION?????
Is the FIXME more urgent to fix than the fixme?
Do we have name and NAME, shop and SHOP, landuse and LANDUSE? (For LANDUSE the answer is yes, but it is used only 2498 times, and Landuse is used 31 times. For NAME the answer is also yes, 393 times.)
But do we NEED those uppercase versions?
I don’t care that they are probably being used because of typing errors; I do care about a system which enforces consistent and reliable data entry.
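
Finding such case variants in a list of keys is not hard. The sketch below groups keys that differ only in upper or lower case; the function name is mine and the example keys are the ones mentioned in this entry.

```python
from collections import defaultdict


def case_variants(keys):
    """Group keys that are identical except for upper/lower case.

    Returns only the groups that contain more than one spelling.
    """
    groups = defaultdict(set)
    for key in keys:
        groups[key.lower()].add(key)
    return {
        lowered: sorted(spellings)
        for lowered, spellings in groups.items()
        if len(spellings) > 1
    }


keys = ["fixme", "FIXME", "FixMe", "Fixme", "name", "NAME",
        "shop", "landuse", "LANDUSE", "Landuse"]
for lowered, spellings in case_variants(keys).items():
    print(lowered, "->", spellings)
```

An editor or a validator could use the same kind of check to warn a mapper before a new spelling of an existing key is saved.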

Almost always it is possible to just have one single key and explain in the value of that key what needs to be fixed.
Why:
fixme:admin_level=7,8
and not:
fixme=admin_level should be 7,8
If you argue that it is more clear in this way (at first sight) what needs to be fixed, then I start a proposal that we should start using:
shop:bakery=yes instead of shop=bakery
amenity:bench=yes instead of amenity=bench
landuse:grass=yes instead of landuse=grass
Maybe even better is shop:bakery:yes=true and then use shop:bakery:yes=false for the Butcher?
Got the picture?


One reason I could think of why there exist so many variations is this: Why did the mapper use 4 times FIXME:X (he should have used fixme:x in the first place) and not just put all the city names - that apparently are missing from the map - in one single fixme=longlistofcitynames?
The reason is that no key or value can be longer than 255 characters, a stupid limitation that is absolutely not necessary in modern database design.
Read the comments to my diary entry and read this.

Why are so many variations a problem?

Given the fact that OSM is a worldwide database, it is logical that we need a greater number of tags to describe shops worldwide than we need to describe shops just in the UK. Cultural and social differences are reflected in the database and we should allow for them and respect them.
But so many keys are being used so little that we really need to think about ways to get a more consistent database.
Searching for something is probably the most useful task we keep a database for. Tools like OpenPoiMap rely on consistent data. If you want to find every amenity=atm in a given area, do you expect to have to search for amenity:atm=yes as well?
With OpenPoiMap you can easily search for fixme or FIXME because this tag is readily available in its list of choices. Do you have some spare time? Then try to fix what needs to be fixed in London! But for the other 300+ variations of the fixme key, you’re on your own! See my list above.

Conclusion.
Roughly 1300 keys are being used more than 10.000 times and they cover probably 99% of the data in the OSM database. That leaves us with about 53.000 keys with values that no one knows about and probably no one cares either. Do you?

Improving the OSM map - why don't we? [9]

Posted by marczoutendijk on 5 August 2015 in English. Last updated on 23 April 2017.

Because it is difficult?

In my previous diary entry I showed you some of the problems that I see with the amount of keys in use for the tagging of objects in OSM (54382 at the time of my research: 25 july 2015).
In a reaction I got from user Hedaja he pointed to an interesting blog I wasn’t aware of, by the maintainer of the Taginfo database, Jochen Topf.
Jochen - in this blog - also mentions the “one-time-only” use of keys and calls for action in an attempt to lower the number of keys back to a healthy 40.000.

I did some research and I downloaded the taginfo database on 25 july 2015.
It has a table “keys” with 54382 keys that I used for the next statistics.

  • 19037 keys appear just once - which is 35% of all the keys;
  • 27731 keys appear at most 3 times - 51%; (note: this includes the keys above!)
  • 35453 keys appear at most 10 times - 65%; (including keys above!)

I suspect that keys used 10 times or fewer are the result of some mistake in the use of the key (e.g. wrong spelling of a regular key).

Let’s consider any key that is used 10.000 or more times a “trusted” key. How many are there?

  • 1292 keys appear 10.000 or more times - 2.4%

In between we have a group of 17.516 keys that are used between 11 and 9999 times.
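The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch of my own: the function names are made up, and it assumes that the 121 keys that do not appear at all (see the end of this entry) are not counted in any of the buckets. Under that assumption the middle group comes out at the 17.516 mentioned above.

```python
TOTAL_KEYS = 54382   # keys in the taginfo "keys" table on 25 july 2015
UNUSED_KEYS = 121    # keys that do not appear at all

# Cumulative buckets quoted in the list above: "used at most N times".
at_most = {1: 19037, 3: 27731, 10: 35453}
trusted = 1292       # keys used 10.000 or more times


def share(count, total=TOTAL_KEYS):
    """Percentage of all keys that fall in a bucket."""
    return 100.0 * count / total


def middle_group(total=TOTAL_KEYS, unused=UNUSED_KEYS,
                 low=at_most[10], high=trusted):
    """Keys used between 11 and 9999 times.

    Assumes the unused keys are not part of any bucket.
    """
    return total - unused - low - high


for limit, count in at_most.items():
    print(f"at most {limit} time(s): {count} keys = {share(count):.1f}%")
print(f"10.000 or more times: {trusted} keys = {share(trusted):.1f}%")
print(f"between 11 and 9999 times: {middle_group()} keys")
```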

By themselves those numbers do not mean very much, because what counts more is what value the key has. A key that is used once can only have one value. E.g. the key “nitrox” is a one-time-only key and it can be found here.
A key that is used twice can have at most 2 different values and a key that is used 100 times can have at most 100 different values. The key that is used most on OSM is the key: source, it appears 162.428.193 times with 143.491 different values (one of them is Bing and another is bing).

Now, then, how can we use all this information to get rid of all those keys that shouldn’t be there because the mapper added them by accident or by ignorance?
Sometimes a mapper adds a trailing space at the end of a key, simply by hitting the spacebar instead of the return key. You don’t see it on your screen, but it gets recorded in the database: We see that this happened only twice with the name key, but the same error happens much more often. I heard that a bot runs at regular intervals to fix all those invalid spaces, but I’m not sure.
And if you are one of the mappers that created those keys above and happen to read this also, please fix it!
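Spotting such invisible spaces is easy for a program. A minimal Python sketch, assuming you already have a plain list of key strings (for example exported from taginfo); the function name `whitespace_problems` is my own:

```python
def whitespace_problems(keys):
    """Return (original key, cleaned key) pairs for keys that have
    leading or trailing whitespace."""
    return [(key, key.strip()) for key in keys if key != key.strip()]


# repr() makes the otherwise invisible space visible in the output.
for original, cleaned in whitespace_problems(["name", "name ", "shop"]):
    print(repr(original), "->", repr(cleaned))
```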
Do you want to know the values of the correctly spelled name key?
Here is the first page (of more than a million) of taginfo about that key:

Now, let’s look at a “rare” key. What about: gauge:1879-1934?
Here it is (screenshot with openpoimap): It’s about the track width of this railway track between 1879 and 1934.
According to the wiki the gauge=* tag is supposed to hold the track width, as in gauge=1435. But because there are no instructions on how to handle the situation where the track width changed after some time, the mapper chose to add that time span to the key. Is it wrong? I’m not sure, but it is definitely a key that is not easy to re-use. How many other tracks changed their gauge in the same period? (1879-1934).
And what happened between 1906 and 1934?? Did they use both track widths?
On the other hand, why include historical data in OSM? We have other OSM datasets that are meant to collect historical data. OSM is supposed to “map what is on the ground”, but a railway from more than 100 years ago, is it still there?

There are many more examples to be found that are questionable, but removing all those tags and replacing them with more “valid” ones is not an easy task and needs to be done with care.
If you want to see more examples yourself, the best way to do that is to go to taginfo and select the page with all the keys. Currently it contains 3218 pages. Click on the second column (Objects) so that it is sorted low to high and then scroll a few pages to see the keys that have a count of 1. Take your pick and see the results in taginfo. Please leave your comments or recommendations here.
I have one more question: what about the keys in the database (121 in number) that do not appear at all?

Improving the OSM map - why don't we? [8]

Posted by marczoutendijk on 1 August 2015 in English. Last updated on 23 April 2017.

Where do we leave our Garbage?

Taginfo is a great tool to see where and how a given key is used on the map. It also gives you some nicely formatted tables with statistical data of all the tags (a tag is a key=value pair).
Did you know that the most used key is source=*?
On 25 july 2015 it appeared 162.428.193 times with a total of 143.491 different values. You can find the most common tags here.

But I was more interested in keys that appear just once in the database, because I expected many of those “solo” keys to be erroneous. To find that out, I downloaded the taginfo database (there is a link on the taginfo page to do precisely that). Be warned: after downloading and expanding that database, you have a 5 GB file in SQLite format to handle! But by doing so, I could do my research in more detail and faster than with the tools on the taginfo site. I opened it in my SQLite client and after 10 minutes: I had 74.569.089 records on my computer to research!
For every record in the database you have numerous tables with information available. One of them gives information on all the keys that are in the database,
and this helped me to find what I was looking for: the count_all field.
Below is this “keys” table with the first 20 entries: The count_all field is the one I needed and after the necessary code I produced a table with all the keys that appeared once (had a value of count_all=1).
Here is the beginning of that table, after I sorted it (well, my computer was so kind to do it for me!). Rather weird names for a key, don’t you think? What would “+++” denote? Or “129/”?
The first entry (source:name) starts with a space character! If you enter that string (or any of the others) into the search field of taginfo, you can get all the details about that key: its value, how many times used (1) and where it is used if you click on the tab “map”. You can also click on the overpass link to see its precise location.
So, for that first entry I did all that and it turned out to be this: somewhere down-under in Australia. Try it for yourself!
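The "necessary code" is not reproduced here, but a query of this kind could look like the Python sketch below. The table name keys and the field count_all are the ones mentioned above; the column name "key" and the function name `keys_used_once` are assumptions of mine. The demo runs against a tiny in-memory stand-in, not against the real 5 GB download.

```python
import sqlite3


def keys_used_once(conn):
    """Return all keys whose count_all is exactly 1, sorted by key."""
    cur = conn.execute(
        'SELECT "key" FROM keys WHERE count_all = 1 ORDER BY "key"'
    )
    return [row[0] for row in cur]


# Tiny stand-in for the taginfo database, with a few of the keys
# mentioned in this entry. The count for "name" is a placeholder.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE keys ("key" TEXT, count_all INTEGER)')
conn.executemany(
    "INSERT INTO keys VALUES (?, ?)",
    [("name", 1000), (" source:name", 1), ("+++", 1), ("129/", 1)],
)

for key in keys_used_once(conn):
    print(repr(key))
```

With SQLite's default byte-wise ordering a key that starts with a space sorts before everything else, which is why such a key ends up at the very top of the sorted table.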

Keys in a database are not supposed to start with a space character, but the OSM database accepts anything and does not check what you enter, save for the length (max 255). Keys are also supposed to contain alphabetic characters and may contain digits as well. Some special characters are allowed too, e.g. “_” and “-”.
But a key with just numbers? What is that? Let’s see for 09200: It seems to be the postal code for a village in France. But then it should have been:
addr:postcode=09200
addr:city=Montégut-en-Couserans
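The informal rules above can be turned into a rough check. The Python sketch below is only my reading of them, not an official OSM rule: a key should consist of letters, digits, "_", "-" and ":" (the colon is my addition, because keys such as addr:postcode use it) and should contain at least one letter. The name `key_problem` is made up for this example.

```python
import re

# Assumed convention: only these characters are expected in a key.
ALLOWED = re.compile(r"[A-Za-z0-9_:\-]+")


def key_problem(key):
    """Return a short description of what looks wrong with a key,
    or None when the key passes this rough check."""
    if not ALLOWED.fullmatch(key):
        return "unexpected characters"
    if not any(ch.isalpha() for ch in key):
        return "no letters"
    return None


for key in ["addr:postcode", "09200", "+++", "129/", " source:name"]:
    print(repr(key), "->", key_problem(key))
```

A key that passes this check can still be wrong (a misspelling, for instance), so it is a first filter at most.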

A lot of the (faulty) keys I found are of the uppercase/lowercase type:
Name when name was meant, for instance. Almost any regular key (amenity, shop, tourism, highway, landuse etc) appears in a misspelled version in the database (tourims, land-use etc). Added punctuation (name; or name, or name-) also accounts for quite a number of those one-time-only keys.
All in all 19.037 keys appear only one time in the database and out of a total of 54.382 keys, that is more than a third!
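Many of these near-misses can be matched to the key that was probably meant. Below is a hedged Python sketch using difflib from the standard library. The short list of known keys, the similarity cutoff of 0.8 and the name `suggest_key` are all assumptions for illustration; a real tool would need the full list of documented keys and a human to confirm each suggestion.

```python
import difflib

# A handful of regular keys; a real list would be much longer.
KNOWN_KEYS = ["amenity", "shop", "tourism", "highway", "landuse", "name"]


def suggest_key(key, known=KNOWN_KEYS, cutoff=0.8):
    """Suggest the regular key that a faulty key was probably meant to be.

    Strips surrounding spaces and punctuation, ignores letter case and
    then looks for the closest known key. Returns None if nothing is
    close enough.
    """
    cleaned = key.strip(" ;,-").lower()
    if cleaned in known:
        return cleaned
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for key in ["Name", "name;", "tourims", "land-use", "nitrox"]:
    print(repr(key), "->", suggest_key(key))
```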

Not all of those keys are “wrong”, but too many of them are, and will never be used again.
Is that a problem? Not really I think. It does not consume very much of disk/memory space, certainly not if we compare that to the huge amount of data that is also in the database. But sometimes it leads to unexpected results with software that consults the database.
So, the answer to “Where do we leave our Garbage” is: just where it is.
But if you ever come across such a situation, please correct it and remove what is not necessary or redundant.

Improving the OSM map - why don't we? [7]

Posted by marczoutendijk on 28 July 2015 in English. Last updated on 23 April 2017.

How do we deal with multiple values for a key?

We all know this situation: you need to add a telephone number to a node and add the line:
phone=00311198765432
Then you find out that there is a second phone number for that node, but you can’t add a second phone= tag because OSM doesn’t allow that.
The general question is: how do I tag multiple values for one key?
Let’s investigate how mappers have solved that problem so far. The screenshots were all made with OpenPoiMap.


[1]
This example is the Eiffel tower in Paris for which four architects worked together, but only one gave his name to the final product!
In the source the names are separated with semicolons:
Stephen Sauvestre;Gustave Eiffel;Maurice Koechlin;Émile Nouguier


[2] One piece of art created by 5 artists (somewhere in Seattle).
Create a new key with a sequential number attached to it for every member of this group. In this specific case I would have started numbering with artist_number_2 because the first one is already in artist_name. Even better, I would have started with artist_name_1 and used it for Andrew Keating and would have omitted artist_name altogether.


[3] This example shows how to map multiple sets of related tags to one node. In this case we have man_made=mast which has 4 (mobile phone) antennas attached to it at different heights, each working with a different technology.
An underscore could have been used as in the previous example, but there seems to be a tendency to tag situations like this with a key:N notation, where N is running over the natural numbers.


[4] Here we see a combination of both methods. Adding a number (with underscore) at the end of the key to count them or adding the number after the colon. Again, in the case of the fuel I would have started with fuel:diesel:1 for the Biosolar.
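For anyone who has to read such data, the three notations above (semicolons, key_N and key:N) can be gathered into one list of values. A Python sketch under my own assumptions: the function name `collect_values` is made up, the key name architect for the Eiffel tower example is a guess (the text above only shows the value), and an unnumbered key is simply put first.

```python
import re


def collect_values(tags, base):
    """Collect every value stored under base, base_N or base:N.

    Values are ordered by their number (the unnumbered key first)
    and semicolon-separated values are split into separate items.
    """
    pattern = re.compile(rf"{re.escape(base)}(?:[_:](\d+))?")
    found = []
    for key, value in tags.items():
        match = pattern.fullmatch(key)
        if match:
            index = int(match.group(1)) if match.group(1) else 0
            found.append((index, value))
    values = []
    for _, value in sorted(found):
        values.extend(part.strip() for part in value.split(";") if part.strip())
    return values


eiffel = {
    "architect": "Stephen Sauvestre;Gustave Eiffel;Maurice Koechlin;Émile Nouguier",
}
print(collect_values(eiffel, "architect"))

# Hypothetical phone numbers, mixing the underscore and colon notations.
print(collect_values({"phone": "111", "phone:2": "222", "phone_3": "333"}, "phone"))
```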

Any different opinions on this subject?

Improving the OSM map - why don't we? [6]

Posted by marczoutendijk on 24 July 2015 in English. Last updated on 11 August 2017.

Redundant or weird tagging?

Sometimes we have to tag a shop without knowing what kind of shop it is. Then we use: shop=yes.
If you know the shop is a clothes shop, then shop=clothes would suffice. Using amenity=shop as in the example below is not encouraged: My advice is to clean-up such tags whenever you encounter them.


The next examples left me puzzled:
Is there really a jewelry in that bar?
What does amenity=printer mean? Can you buy a taxi in that shop? Does it come with the driver?

I used openpoimap for all the examples with this code:
amenity][shop
which translates to: “find all nodes/ways/relations that have both an amenity key and a shop key, irrespective of the value of those keys”.
Try it out in your own area!
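If you have the tags of some objects at hand outside OpenPoiMap, the same test is a one-liner. A small Python sketch with hypothetical objects, loosely modelled on the examples above; the function name `has_all_keys` is my own.

```python
def has_all_keys(tags, *keys):
    """True when the tag dictionary carries every one of the given keys,
    irrespective of their values."""
    return all(key in tags for key in keys)


# Hypothetical objects, each represented by its tag dictionary.
objects = [
    {"amenity": "bar", "shop": "jewelry"},
    {"shop": "clothes"},
    {"amenity": "shop", "name": "Example"},
]

suspects = [tags for tags in objects if has_all_keys(tags, "amenity", "shop")]
print(suspects)
```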

Improving the OSM map - Why don't we? [5]

Posted by marczoutendijk on 22 May 2015 in English. Last updated on 23 April 2017.

How to use Notes?

Whenever we tag something there are a lot of key=value pairs we can choose from. One of them is the note key. With openpoimap I was investigating the use of this tag. Below you see a screenshot of Berlin with the notes that were made on a great number of nodes.

IMHO the use of a note is twofold:

  1. The note is used to clarify something that cannot be displayed with any other key=value. E.g. This note tells me what I need to know about Fritz Schloss.
  2. The note is used as a reminder to the mapper himself (or to others) that one is not sure about what is displayed with the other keys. E.g. some more research is needed. This use of a note is more or less the same as the fixme/FIXME key (which for mysterious reasons is available both in lower- and uppercase). Here is an example:

But now have a look at this: The note is exactly the same as the name tag for that node. Why is that? Probably the mapper first wrote down that note as a reminder and later decided to use the note text as the value for the name, but why not remove the note? It is completely without any use!

So, mappers, whenever you make a note to yourself, remove it once the case is clear!
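Leftover notes of this kind are easy to find automatically. A minimal Python sketch, assuming an object is given as a dictionary of its tags; the name `redundant_note` and the choice to ignore letter case and surrounding spaces are mine.

```python
def redundant_note(tags):
    """True when the note merely repeats the name of the object."""
    note = tags.get("note")
    name = tags.get("name")
    if note is None or name is None:
        return False
    return note.strip().casefold() == name.strip().casefold()


# Hypothetical node: the note adds nothing to the name.
print(redundant_note({"name": "Example Cafe", "note": "Example Cafe"}))
```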

Improving the OSM map - Why don't we? [4]

Posted by marczoutendijk on 22 May 2015 in English. Last updated on 23 April 2017.

Bits and Pieces

  1. In this screenshot we see redundant information. Either use building=entrance (although that is now deprecated), or use entrance=yes (or main). Don’t use both.

  2. What is this mapper trying to tell us? What should be fixed? Here we see the situation: I don’t know what to fix!

  3. The number of levels of the building is an estimate. But what about the colour? You cannot reliably describe the colour of a building with RGB colours! Every colour that is painted or printed is CMYK.

Improving the OSM map - Why don't we? [3]

Posted by marczoutendijk on 12 March 2015 in English. Last updated on 23 April 2017.

How useful is a tag that exists only once?

The OSM database is simple: a key and a value for every node you want to store in that database. And because it is a liberal database with no checking at all, you can put anything you like in it.
Of course we have some rules and guidelines:
Map Features
From that we can learn that amenity=hospital is the preferred way to mark (you guessed it) a hospital on the map. It is used 122164 times on the map (taginfo).
I think most map users have a clear idea on what exactly is meant by that specific tag combination.
But what about this one:
According to the wiki, man_made is:
A tag for identifying man-made (artificial) structures added to the landscape.
Below you see what this man_made=1417-32 looks like: Can you see what it is?? I don’t!
Looking it up in the wiki for that specific value gives (you guessed it) an empty page. Why does one create a specific value for a key, without explaining to others what it means??


On taginfo you can find 1132 entries (75 screen pages) for man_made with a unique (used only once) value. This is the last page of taginfo on the man_made key: And the last entry is “junk”.
I looked that one up again: Of course (you guessed it) there is no wiki page on man_made=junk.
In terms of data storage, those 1132 entries don’t take much room, but why use them in the first place? Could the mapper - at least - not have taken the steps (setting up a wiki on that value) to help other mappers with this obscure value?
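If you have a list of tags from your own area, you can produce such a list of once-only values yourself. A short Python sketch, assuming the tags are available as (key, value) pairs; the function name `values_used_once` and the sample data are made up, apart from the two man_made values discussed above.

```python
from collections import Counter


def values_used_once(tag_pairs, key):
    """Return, sorted, the values that occur exactly once for the given key."""
    counts = Counter(value for k, value in tag_pairs if k == key)
    return sorted(value for value, n in counts.items() if n == 1)


# Hypothetical sample of (key, value) pairs.
pairs = [
    ("man_made", "tower"),
    ("man_made", "tower"),
    ("man_made", "junk"),
    ("man_made", "1417-32"),
    ("amenity", "hospital"),
]
print(values_used_once(pairs, "man_made"))
```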

So please, mappers. If you create a value for a key that is not yet used anywhere else in the database (and think thrice before doing so), be so kind to the other mappers and explain what you are doing and why you are doing it! And even better: discuss this first on the appropriate forum or talk-list.

Improving the OSM map - Why don't we? [2]

Posted by marczoutendijk on 11 March 2015 in English. Last updated on 23 April 2017.

Do we tag what something is not, has not, or what?

Part 2 in a series of comments on the current mapping problems and curiosities that I encounter with OSM.
Look at this Bicycle Repair Station: Can I safely assume that:

  • service:bicycle:truing_stand=yes?
  • service:bicycle:freewheel_removers=yes?
  • service:bicycle:dishing_tool=yes?
  • service:bicycle:headset_cup_remover=yes?
  • service:bicycle:cartridge_bottom_bracket_tool=yes?
  • service:bicycle:simple_headset_press=yes?
  • service:bicycle:metric_taps=yes?

at the University of San Francisco Bicycle Repair Station?
Any decent Bicycle Repair Station without a chain tool should not be allowed to carry that name!
Imagine this:
leisure=swimming_pool
swimming_pool:water=no
Or this:
tourism=hotel
hotel:number_of_beds=0

So please, when you tag, think first!


Another one: You need five minutes to find out what stuff cannot be recycled at this particular recycling station.
If we keep tagging in this way, we need 30 times the amount of data storage, compared to what is needed if you tag consistently:

Tag what something is, or what is available or visible

Like this one: