What shall we have for dinner tonight?

Posted by marczoutendijk on 19 February 2017 in English (English)

Improving the OSM map - why don’t we (14)

Some thoughts on restaurant and food-tagging on OSM.

A restaurant is considered an amenity and tagged with amenity=restaurant.
One would expect that in order to show what type of restaurant this is, or what food you can eat there, the next step would be:
After all, this is accepted:
But, alas, OSM is different, and so a new tag was introduced to indicate what we can eat in a restaurant. No, they didn’t choose food=*, but came up with cuisine=*.
So, the correct tagging for a restaurant and what is served inside is:
This is not so bad at all, because this scheme allows you to tag many more places where you can eat, but which are not considered a restaurant, like a cafe, bar or pub (or a railway station or book shop).
There are some curious constructions however, because to tag a Burger King (or any other fast food restaurant) you can do so in two ways:
By itself, using fast_food as a value for an amenity is rather strange, because to me, fast food is a type of food, belonging to cuisine, not an amenity! (Would you use highway=asphalt? No, of course not, because highway=* expects a function of the highway it describes, not its surface).
The addition of the cuisine=* in the last case is maybe not even necessary, as hamburgers are core business in any fast food restaurant.
Over the years the list of values to assign to the cuisine key has grown (and will keep doing so) and now (February 2017) we have two basic groups in the wiki:

  • 40 values for the type of food (like fish, meat, pizza, burger, kebab, soup, etc.)
  • 53 values for the ethnicity of the food (like italian, greek, chinese, mexican, etc.)

As values from both lists can be combined, this introduces a rich array of possibilities, but also adds confusion. For some people “eating Italian” just means ordering a pizza; to others it is soup and pasta or a five-course dinner in a restaurant.
I did some research on the different ways people have used the above tagging system to map restaurants and what you can eat. After all, it is likely that you can eat a variety of food in a restaurant, and that, in turn, requires multiple values to be assigned to a single key.

(note: there have been many discussions on the tagging list as well as numerous postings on the forums on the best way to add and handle multiple values for a single key. The most common practice is to separate the values with semicolons, as can be read in the wiki, but some people think you shouldn’t use multiple values at all.)

Suppose that we allow 4 different values (out of 40) to be used for the type of food (like burger, chicken, donut and kebab). That would give us a maximum of 40 x 39 x 38 x 37 = 2 193 360 different combinations, counting each ordering of the same four values separately. Of course not all combinations make sense; I don’t expect fish-pancake-noodle-casserole to be a frequent combination.
Choosing from the 53 ethnicity values would give even more possibilities, but, again, not all are to be expected.
I found (among many others) the following combinations (from both lists) in use:

  • chicken;kebab;fish_and_chips
  • pancake;friture;chicken;grill;breakfast;coffee_shop;beef_bowl;russian;fish_and_chips
  • burger;sandwich;pasta;pizza;ice_cream;chicken;coffee_shop
  • burger;sandwich;breakfast;sausage;local;noodles;pasta;pizza;chicken;diner;
  • burger;sandwich;local;chicken;fish_and_chips
  • bagel;breakfast;cake;coffee_shop
  • chicken;burger;grill;oriental;breakfast;pizza;hotdog;kebab;local
  • italian;indian;regional;mexican
  • fataya,sandwich_poulet,hamburgers
  • italian;creative
  • kebab;pizza;schnitzel;sausage;salad
  • indian;vegetarian;chinese
  • italian;pizza

I also found:

  • 짜장면,짬뽕,탕수육,간짜장,우동,짬뽕밥,잡채밥,육개장,잡탕,양장피등등중국요리
  • 早餐:蛋餅,蔥抓餅,饅頭,肉包,炒麵,豆漿,紅茶
  • 돈까스,피자,스파게티
  • Горячиеблюда,гарниры,закуски,салаты,_кампот

In the above list I have marked in bold type those choices that are not in any of the 40 food (or 53 ethnicity) wiki values (excluding the entries in non-Western script). In the current taginfo database there are 21878 different values for the cuisine=* tag. The one used most is cuisine=regional, which is used 62291 times. But there are also 17849 values which appear only once, each a different combination of values like the ones I showed you above.
The last multiple value in the list above is italian;pizza, which has been used 948 times. What exactly does it mean? Pizza is Italian, so why bother adding that as well? A simple cuisine=pizza would suffice. Or does it mean that you can eat any and every Italian food in a restaurant tagged in this way, but maybe with pizza as something special? I don’t know.

Usually, when a key=value pair occurs only once, it is considered likely to be a typing error (like cuisine=piZza or highway=terziarie) or a new value made up by the mapper (like cuisine=romanesc). But the values in the small sample above (taken from the full list of 17849 unique cuisine=* occurrences) are not typing errors; they are drawn from all the possible and valid combinations. How many such combinations are possible? Assume that we allow 2 choices from the ethnicity values and 4 from the food values, then we have a maximum of 53 x 52 x 40 x 39 x 38 x 37 = 6 044 900 160 possible values for the cuisine=* tag! (Yes, that is six billion, forty-four million, nine hundred thousand, one hundred and sixty.)
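The two figures quoted in this post (2 193 360 and 6 044 900 160) can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the post does, 40 food values and 53 ethnicity values, and it adds the unordered counts for comparison, which are not part of the original post.

```python
import math

FOOD_VALUES = 40       # food-type values in the wiki, as counted in the post
ETHNICITY_VALUES = 53  # ethnicity values in the wiki, as counted in the post

# Ordered selections: "burger;kebab" and "kebab;burger" are different strings,
# so each ordering counts as a separate tag value.
ordered_food = math.perm(FOOD_VALUES, 4)
ordered_both = math.perm(ETHNICITY_VALUES, 2) * ordered_food

# Unordered selections: what is left if the order of the values is ignored.
unordered_food = math.comb(FOOD_VALUES, 4)
unordered_both = math.comb(ETHNICITY_VALUES, 2) * unordered_food

print(ordered_food, ordered_both)
print(unordered_food, unordered_both)
```

The ordered counts reproduce the numbers in the post. If order were ignored, the same choices would give 91 390 and 125 935 420 combinations respectively, so a large part of the maximum comes from the same values written in a different order.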

Which way to go?

I have seen proposals to add the complete merchandise of certain shops to the OSM database. By doing so we would be able to query OSM for “the nearest shop where I can buy an ironing board”.
To me that makes no sense at all, as there is no way of getting all that data reliably into OSM. And maintaining it would be an even bigger challenge.

Should we try to do the same with restaurants and food?
Given the rather careless manner in which the multiple-valued tags for cuisine have been used (a result of the database design we are using, which allows for any combination of keys and values - in any language - without any error checking at all), I don’t see applications based on what is in the OSM database becoming usable soon enough to compete with what is already on the market for customers. Have you ever spoken to anyone who tried to find out where he/she would go for dinner tonight - including selecting what to eat - by using OSM?
One - fairly big - problem is that roughly one-half of the restaurants have no cuisine tag at all, making the data useless for what you were trying to find out (“what can I eat?”).

I know that we can put anything in the OSM database, but we cannot put everything in it.
Let us focus on getting data (as much as we can) into OSM that turns it into a great map (that includes showing where I can find a restaurant), but shall we avoid creating a mediocre restaurant and food guide?

Comment from maxerickson on 19 February 2017 at 14:49

In the US we have many fast food restaurants that do not serve burgers; tacos, chicken, sandwiches and various Asian fusion places are some big categories.

As to the broader point, if the cuisine tags in an area are incomplete or too messy to use, ignoring them is just as effective as not storing them.

Comment from EdLoach on 19 February 2017 at 15:25

You could think of it along the lines of "Where can I eat?" and if the cuisine tag is provided that is additional information (along with say phone number and delivery hours for some of the local takeaways).

Comment from BushmanK on 19 February 2017 at 15:34

This is the same mess originating from natural language terms as the situation with shops. This issue of exclusive values also applies. Arguments against any improvement of it are old as mammoth feces: "it is too complex", "nobody will use it" (pure demagoguery, actually).

Regarding overly verbose descriptions of food amenities or shops: this shouldn't be an argument against changing anything; it should be a question of what level of verbosity is acceptable. Obviously, quoting a full menu is nonsense. But it doesn't mean that nothing could and should be described - that would be a false dichotomy like "all or nothing".

The question of data maintainability is always valid - if there are not enough active mappers in an area, verbose tagging of something that could easily change over time has a much higher chance of becoming outdated. In the opposite situation, it makes perfect sense.

Comment from dan980 on 19 February 2017 at 17:30

In Italy we have many restaurants that do not serve pizza at all; on the other hand, many restaurants serve pizza only, in dozens of different flavours. The tagging scheme cuisine=italian;pizza makes sense: you can filter out the restaurants that serve both.

Comment from butrus_butrus on 19 February 2017 at 20:50

"fast_food" stands for a "fast-food-restaurant" and is therefore obviously a type of amenity. It's just abbreviated.

Comment from dikkeknodel on 20 February 2017 at 03:25

For me the term fast food tells something about the time that it takes to get the food. Then you will of course get into the discussion of where exactly the distinction between fast and slow (or not so fast?) lies. And of course that is an arbitrary number of minutes, up to the person adding the tag in the first place.

Comment from BushmanK on 20 February 2017 at 04:59

The problem here is that there are as many opinions on what exactly a "fast food amenity" is as there are OSM contributors. Having that kind of discussion in the Russian community, for example, I've got variants such as: "these are certain brands", "these are American restaurants", "these are places with junk food" (whatever it means) and other kinds of nonsense.

There can not be any definition that will suit a majority of cases, period.

Comment from Warin61 on 20 February 2017 at 08:29

Humm yes ...what shall I have for dinner tonight? (getting hungry)

Firstly the values entered can be anything as you have found .. that is the nature of OSM. The values on the OSMwiki are simply suggestions that try to keep some order and provide some guidance on the thinking of how things 'should' be done. I am afraid we are stuck with that .. and once things get big they have too much momentum that changing them for something that actually makes sense is very difficult. (thinking here of landuse=grass to landcover=grass)

The semicolon delimited values ... I'd like to see them in alphanumeric order as a preference. The alternatives are vast, but alphanumeric can be automated... if not within OSM then certainly by data consumers.
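The automated ordering suggested here could look roughly like the sketch below. The function name normalize_multivalue is made up for this illustration, and dropping duplicates and empty parts is an extra assumption, not something the comment asks for.

```python
def normalize_multivalue(value: str) -> str:
    """Return a semicolon-delimited value with its parts trimmed,
    de-duplicated and sorted alphanumerically."""
    parts = {part.strip() for part in value.split(";")}
    parts.discard("")  # drops the empty part left by a trailing semicolon
    return ";".join(sorted(parts))


print(normalize_multivalue("pizza;italian"))
print(normalize_multivalue("burger;sandwich;local;chicken;fish_and_chips"))
```

With this in place a data consumer would see "pizza;italian" and "italian;pizza" as the same value, "italian;pizza".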

If you want to get in first before things get large you need to look at the start of things ... I have had a go at produce ... please see what you think and have a go at it yourself. By no means perfect here!

Comment from BushmanK on 20 February 2017 at 17:38

@Warin61, there is at least one massive case of getting rid of a nonsense scheme - wood=evergreen and so on. The situation with grass doesn't deserve to be changed since there are multiple unclear views on the same tags.

Comment from BushmanK on 20 February 2017 at 17:39

Sorry, I meant to say wood=coniferous and leaf_cycle=evergreen.

Comment from Warin61 on 20 February 2017 at 20:54

@BushmanK "The situation with grass doesn't deserve to be changed since there are multiple unclear views on the same tags."

??? I'd think that would be a very good reason to clarify exactly what it is. In this case it is just "grass". And "grass" is a landcover. What it gets used for is a different tag and if needed that should be tagged ... but landuse=grass makes no sense which is why it needs to be changed and why it gets used for lots of things. To me the case of "wood=coniferous" and "leaf_cycle=evergreen" is less clear.

Comment from BushmanK on 20 February 2017 at 21:41

@Warin61, I mean that there are multiple existing views on what landuse=grass and natural=grassland mean. So, people want to have their favorite tags to represent what they want (like, "managed lawn"). It is logical, that there should be an "atomic" tag to indicate wooded vegetation without any implications, but since people prefer to have what they like to what is logical, I don't see any sense in trying to introduce this kind of tag due to public resistance.

wood=coniferous wasn't anyone's favorite. But it has an obvious disadvantage - you never know, if it is used as an indication of conifers or as an indication of evergreen plants. leaf_cycle=evergreen together with others solves this problem perfectly. Resistance was very subtle because nobody wanted to keep old tags that much.

So, that's all about what people like and want. If they want some bullshit - this situation doesn't deserve wasting your time on it.

Comment from Carnildo on 21 February 2017 at 21:51

"The addition of the cuisine=* in the last case is maybe not even necessary, as hamburgers are core business in any fast food restaurant."

You should probably tell Taco Bell, Subway, Panda Express, and any number of pizza joints that they're doing it wrong. You might consider "fast food" to be a type of food; I (and probably most other people) consider it a type of food service.

Comment from GRUBERND on 24 February 2017 at 21:54

for me the difference between restaurant and fast_food is what happens after you order your food:

restaurant: eat, then pay. fast_food: pay, then eat.

Comment from mcld on 25 February 2017 at 15:30

"there are also 17849 occurrences of that key which appear only once, but every time with a different combination of values"

This is not a problem. Data users should know how to handle semicolons for multiple entries, IF they are processing OSM tags for which the practice is common - as it is for cuisine. There's no need to worry about the "illusory" explosion of single-occurrence tags.
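A small sketch of what "handling semicolons" does to the counts. The raw values below are a made-up sample for illustration (only italian;pizza is taken from the post), and split_values is a hypothetical helper name.

```python
from collections import Counter

# Illustrative sample only, not real taginfo data.
raw_values = [
    "italian;pizza",
    "pizza;italian",
    "italian;pizza;kebab",
    "kebab;pizza",
    "kebab;italian",
]


def split_values(value: str) -> list[str]:
    """Split a semicolon-delimited tag value into its trimmed, non-empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Counting whole strings: every value in the sample occurs exactly once.
whole = Counter(raw_values)

# Counting the individual parts: only three distinct values remain.
parts = Counter(part for value in raw_values for part in split_values(value))

print(len(whole), len(parts))
print(parts.most_common())
```

In this sample the five single-occurrence strings reduce to three individual values, each used several times.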

Others have already pointed out that other things you mention, like "cuisine=italian;pizza", are not problems. I'm getting the strong impression from this discussion that it's all fine!

By the way, yes I do indeed sometimes use OSM (and no other service) to find somewhere to eat, and I do look at the cuisine=* tag as well as the diet:vegetarian=* tag. I don't claim this is normal ;)

Comment from BushmanK on 25 February 2017 at 15:59

Unfortunately, "fine" is not necessarily enough. If there is a better data structure that doesn't have well-known disadvantages of a semicolon-delimited list, it should be used. A semicolon-delimited list is a very blunt thing, that's why it is not recommended to use it.

Comment from mcld on 25 February 2017 at 16:46

The word "blunt" doesn't say much to me here I'm afraid. There is no consensus against semicolon-delimited lists (after much debate...), and Marc has illustrated how widespread they are in cuisine=*. I think it unlikely that anyone's going to get the semicolons out of the map, and - although I won't rehearse the arguments (many have blogged on the matter, on various sides!) - in my opinion that's basically fine for the cuisine tag here.

Comment from GRUBERND on 25 February 2017 at 16:54

funny. me thinks a semicolon separated list is a beautiful thing.

easy to parse and elegant to store. and since the end user has nothing to do with handling data from the database, well, it's all flowers and sunshine everywhere.

and those that do data-handling, well, if they can't cope with a semicolon list format, well, they should not touch that data at all. seriously.

Comment from BushmanK on 25 February 2017 at 17:13

By "blunt", I mean that it is the first (and unfortunately, the last) thing anyone could think of when trying to invent a structure for storing multiple values of the same property. It looks simple at first glance, so people just stop thinking about any other options once they've got this idea.

Consensus on any question is impossible without a clear and specific goal definition. But when a goal is defined, a consensus is not required since logic invalidates everything that doesn't fit that goal.

For example, it seems like your conclusion is based on the view that it is okay for data consumers to use as much preprocessing as the data structures require (as if these structures had some value worth preserving). This view is very common among those OSM members who are programmers themselves. My view on this aspect is different - I'm not assuming that any data user should have any real programming skills. There are tools providing a higher abstraction level for working with OSM data. These tools can, in most cases, be used without any true programming skill - these are just querying instruments. But semicolon-delimited structures and any structures that must be parsed before use are more or less incompatible with these tools (or require preprocessing to use them). And since the OSM project has only one product - open spatial data - my idea is that we have to think about its usability and avoid adding extra barriers for regular data consumers (as opposed to large companies or geeks who enjoy writing code).

Comment from BushmanK on 25 February 2017 at 17:17

@GRUBERND, you sound like a programmer supremacist. But it's an open project, nobody can prevent you from being one.

Comment from mcld on 25 February 2017 at 17:27

Let's not get rude.

In my original comment, when I said "Data users should know how to handle semicolons" I should rather have said "Tools for data users should know how to handle semicolons". Just as those tools should handle the contortions of the opening_hours format etc!

Comment from BushmanK on 25 February 2017 at 17:44

That brings up another question:

What is easier: to find someone who will sacrifice personal time just to "teach" all these numerous tools to understand semicolon-delimited lists or to finally agree together on using single-value tags like namespace:value=yes since it doesn't require rewriting any tools?

Since OSM is a volunteering project, the latter should be, in theory, significantly easier. Only in theory, because there are so many people who put their views in front of practical reasons.

Comment from GRUBERND on 25 February 2017 at 19:38

@BushmanK, maybe you are right about me. but then maybe not. who knows. what i do know is that people get all worked up about personal preferences like tabs vs spaces, emacs vs vim, linebased vs xml, or even where to put a semicolon.

it's all fun and games, yes, but in the end the important question is:

  • does it work?

cuisine:italian=yes; cuisine:pizza=yes

and – surprise! – both variants work, so the tools better be ready to handle them. because OSM is one of those projects that was started to create a solution instead of adhering to some theoretical standard.

Comment from Carnildo on 25 February 2017 at 20:17

There are four types of queries that an end user is likely to perform on a value:

  • "One": "I want a restaurant that serves Italian, and I don't care what else it serves"
  • "Exact": "I want a restaurant that serves Italian and nothing else"
  • "Any": "I want a restaurant that serves Italian or Chinese, and I don't care what else it serves"
  • "All": "I want a restaurant that serves both Italian and Chinese"

The first two are easy to perform on a semicolon-delimited list: they are a substring search and a string match, respectively. The third and fourth are where semicolons fail: you need to actively parse the list, perform multiple substring matches, or otherwise jump through hoops. Fortunately, most people looking for somewhere to eat are looking for a "One" query, so semicolon lists are fine.

This doesn't apply to other things like auto repair, where an "All" query such as "I want four new tires and an oil change" is reasonably likely.
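For illustration, the four query types can be written as set operations once the list has been split. This is a sketch of the "actively parse the list" route rather than the substring approach described above, and all function names here are made up for the example.

```python
def parse(value: str) -> set[str]:
    """Split a semicolon-delimited tag value into a set of trimmed parts."""
    return {part.strip() for part in value.split(";") if part.strip()}


def matches_one(value: str, wanted: str) -> bool:
    """"One": the wanted value is among the listed values."""
    return wanted in parse(value)


def matches_exact(value: str, wanted: str) -> bool:
    """"Exact": the wanted value is the only listed value."""
    return parse(value) == {wanted}


def matches_any(value: str, wanted: set[str]) -> bool:
    """"Any": at least one of the wanted values is listed."""
    return bool(parse(value) & wanted)


def matches_all(value: str, wanted: set[str]) -> bool:
    """"All": every wanted value is listed."""
    return wanted <= parse(value)


print(matches_one("italian;pizza", "italian"))
print(matches_all("indian;vegetarian;chinese", {"italian", "chinese"}))
```

Matching on whole parts also avoids partial hits that a plain substring search would give: matches_one("fish_and_chips", "fish") is False, while "fish" in "fish_and_chips" is True.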

Comment from GRUBERND on 25 February 2017 at 22:24

@Carnildo may i suggest:
"Any" can be presented as the result of two "One" queries.
and "All" is finding the results that are on both of those "One" queries.

anyway, the end user should never have to worry about those details, since that is the job of the respective frontend. unless they want to, of course.

the difficulty with open projects is often that users get confronted with the real data although they don't want or need to.

Comment from BushmanK on 25 February 2017 at 23:00

@GRUBERND, there is a very clear difference between a personal preference and a practical advantage.

If it's about a personal preference, people look for arguments in favor of it and refuse anything that is against it without any logical explanation. If it's about a practical advantage, there should be a definition of goal and logical comparison of features tested against this goal. However, people can obviously have different views on goals.

Here, I've described how I understand the goal of this project, so I'm just testing any existing approach against this - there is no element of personal preference. You, on the other hand, have presented your personal views on who should be handling OSM data and who shouldn't. You've also used "elegance" as a positive feature - that is totally personal and subjective, leaving aside that you haven't defined a goal to test your statements against.

Both variants you've presented do work, but your argument is fallacious since it was demonstrated many times that these two variants work differently and require very different effort for certain tasks.

Comment from GRUBERND on 25 February 2017 at 23:25

@BushmanK .. i totally agree with you that having one dataformat (whatever it may be) across a dataset is better than anything else. no discussion required.

and me describing semicolon lists with the words "beautiful" and "elegant" was pretty much in direct response to your usage of the word "blunt". both carry personal preference, but hey, that's what makes us care. =)

anyway, as someone who works with large and sometimes huge sets of heterogenous data i gave up on "standards" and prefer best practice, because it gets things working. my reality – currently designing workflow & tools for 100k+ RAW images at a time with an inflow of 40TB+ per year – runs against my ideals of organizing and harmonizing that data.

call me a practical dreamer, i love clean data but that's not what is happening. there will always be a bunch of edge cases, better make the tools to cope with them than to fiddle with old datasets.

2cts. ymmv.

Comment from BushmanK on 25 February 2017 at 23:28

@Carnildo, that is exactly what makes semicolon-delimited lists harder to work with.

To make "any" or "all" query possible, you either have to:

  • perform as many queries as arguments are in the original query and do union/intersection on results;
  • preprocess all lists to break them down into separate properties (to look exactly like a single-value syntax).
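The second option, breaking a list down into separate single-value properties, might be sketched as follows. The function name expand_multivalue is hypothetical; the key:value=yes form follows the cuisine:italian=yes pattern mentioned earlier in this thread.

```python
def expand_multivalue(tags: dict[str, str], key: str) -> dict[str, str]:
    """Replace a semicolon-delimited tag with one key:part=yes tag per part.

    All other tags are copied unchanged. If the key is absent, the tags
    are returned as a plain copy.
    """
    expanded = {k: v for k, v in tags.items() if k != key}
    for part in tags.get(key, "").split(";"):
        part = part.strip()
        if part:
            expanded[f"{key}:{part}"] = "yes"
    return expanded


restaurant = {"amenity": "restaurant", "cuisine": "italian;pizza"}
print(expand_multivalue(restaurant, "cuisine"))
```

After this step an "any" or "all" query is a matter of checking for the presence of individual keys, with no further parsing of values.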

Delimited lists create a problem where it doesn't originally exist. And it is not about an "end user", it is about data consumers who should not (even if they can) go through the hassle of additional preprocessing just because someone thinks that delimited lists look better. The vast majority of OSM tags are, for example, easily queryable by Overpass in a single query or usable directly for style definitions in MapCSS.

There are only a few exceptions where a complex syntax is unavoidable, like working hours or road lanes. Even a road lanes scheme, having a relatively simple syntax, requires massive regex processing in its stylesheet (check it out here), so it illustrates perfectly how hard it is to deal with delimited lists in values.
