My native language is a dialect of the lingua franca of the late 20th and early 21st centuries. And I live in a culture that is notorious for being adamantly monolingual. But I thought I had some understanding of the issues of mapping names in a way friendly for internationalization.
It seems pretty clear-cut when you read the wiki. Put the local name, in the local language, as the value of the “name” tag. You may also put it in the “name:<lg>” tag.
To be clear, I am not worrying about the legal name, short name, international name, alternate name, or other various names for a place. Just “the common default name” to put in the “name” tag.
I make paper maps for myself and, when traveling, like to have both the local name, as I will find it on signs, and the English name, if available, rendered. For example:
I may not be able to read the local language, but I can compare the glyphs on my map with the glyphs on a sign to see that I am entering a specific village. And if an English name exists, even if it is only an (automatic) transliteration, I will have something to verbalize.
But my attempt to produce a map of a trekking destination in Nepal showed that it is not that simple.
First, the local mappers in Kathmandu, and apparently throughout Nepal, decided to put “Romanized” versions of their names in the “name” tag. I am not sure what “Romanized” means in this context, as they did not specify what phonetics might be used when “Romanizing”. The current tagging of Kathmandu breaks internationalization:
Please, please, don’t do this. It is specifically discouraged in the wiki. If a transliteration is needed, it can be done automatically by the data consumer. The tagging should be:
Second, if the “name:<lg>” value is set for the local language but the “name” value does not match it, how can a renderer determine which language is the local one for the area? So far the solutions are ad hoc. I have seen suggestions that this has been solved by the OSM DE people. But when I look at their toolkit, I see that the problem it solves is transliterating a local-language name when the tag for the desired language does not exist. That is a good and useful thing to do, but it is not the problem I am worried about.
Another suggestion is to follow the lead of SomeoneElse who has produced maps with Welsh and Scots Gaelic names in areas where those languages are dominant. This is closer to what I am looking for.
But both the German automatic transliteration and SomeoneElse’s implementations reveal that they use internally coded polylines to define areas where a language is used. In effect they are implementing an additional geographic database to help interpret the OSM geographic database. This seems terribly wrong.
Adding language boundaries to OSM has been discussed and there are issues. So it is unlikely that it will be agreed upon.
One of the issues is that some features border many localities with many languages. For example, the coastline of the Mediterranean Sea is shared by many countries where many languages are used. What name should be used for that feature? The English one I used? Probably not.
Looking at it from the point of view of a simple data consumer, it seems the following rules would go a long way toward improving how OSM-derived maps can be presented:
- If the feature has a name in only one language, then put that value in the “name” tag.
- Strongly encourage mappers to also put that value in the “name:<lg>” tag where “<lg>” is the code for the local language. This will allow data consumers to identify the language used in the “name” tag.
- If more than one “name:<lg>” tag is present, one of them must match the value of the “name” tag. This allows identification of the language of the “name” tag.
- If the feature is on a boundary with more than one language, then there may be no local “common default name”. In that case, remove the name tag. If there is a QA tool that complains, then put a “noname=yes” tag on the feature. Yes, the feature has a name, it has multiple names. But we don’t know which is the “common default name” used by the multiple sets of locals speaking different languages.
- Don’t try to be helpful and put multiple versions of the name in the name tag. That is almost a guarantee that the name tag should be omitted instead. I’d fix that for Mount Everest except that the values in the “name” tag don’t match any of the values in the many “name:<lg>” tags and I haven’t a clue where to move them since I read neither Nepali nor Chinese.
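The rules above lend themselves to a mechanical check. What follows is a minimal sketch in Python, assuming a feature's tags are available as a plain dictionary. The function names, the messages, and the rough pattern used to decide which “name:<suffix>” keys are language codes are all my own assumptions, not part of any OSM tool, and the sample tag sets further down are made up for illustration rather than copied from the database.

```python
import re

# Rough pattern for a language suffix such as "de", "cy" or "zh-Hant".
# Assumption for this sketch only; it is not a full language-tag parser
# and it exists mainly to skip keys like name:left or name:etymology.
LANG_CODE = re.compile(r"[a-z]{2,3}(-[A-Za-z0-9]+)*")


def localized_names(tags):
    """Return {language code: value} for every name:<lg> tag."""
    result = {}
    for key, value in tags.items():
        if key.startswith("name:"):
            code = key[len("name:"):]
            if LANG_CODE.fullmatch(code):
                result[code] = value
    return result


def name_language(tags):
    """Return the code whose name:<lg> value equals name, or None."""
    name = tags.get("name")
    if name is None:
        return None
    # If several languages share the same spelling, pick the first
    # alphabetically; that tie-break is arbitrary.
    matches = sorted(c for c, v in localized_names(tags).items() if v == name)
    return matches[0] if matches else None


def check_name_tags(tags):
    """Return a list of problems; an empty list means the rules hold."""
    problems = []
    name = tags.get("name")
    if name is None:
        return problems  # no default name, nothing to check
    if tags.get("noname") == "yes":
        problems.append("both name and noname=yes are present")
    names = localized_names(tags)
    if not names:
        problems.append("no name:<lg> tag identifies the language of name")
    elif name not in names.values():
        problems.append("name matches none of the name:<lg> values")
    return problems
```

With an illustrative tag set such as name=München, name:de=München, name:en=Munich, `name_language` reports "de" and `check_name_tags` finds nothing to complain about. If “name” were set to Munich while only name:de=München existed, the check would report that “name” matches none of the “name:<lg>” values.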
The above suggestions will not fix all the issues with internationalization of names. But given the OSM schema as it stands today, they will make some types of processing possible. For example, if you want to render a map in language “xx” then you can:
- Use the “name:xx” value if it exists.
- Otherwise determine the language of the “name” tag, possibly by looking at all the “name:<lg>” values for a match. Once the language has been determined, automatically transliterate the “name” tag to language “xx”.
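The two steps above could be sketched as follows. This is only an illustration under assumptions: tags are a plain dictionary, the function names are hypothetical, and `transliterate` is a placeholder callable that the data consumer would have to supply. No real transliteration library is called here.

```python
def detect_name_language(tags):
    """Guess the language of name from a name:<lg> tag with the same value."""
    name = tags.get("name")
    if name is None:
        return None
    # Sorted so the result is deterministic when several tags match.
    for key in sorted(tags):
        if key.startswith("name:") and tags[key] == name:
            return key[len("name:"):]
    return None


def label_for(tags, lang, transliterate=None):
    """Pick the text to render for a feature on a map in language `lang`."""
    # Step 1: an explicit name:<lang> value wins.
    explicit = tags.get("name:" + lang)
    if explicit:
        return explicit
    # Step 2: fall back to name, transliterated when its language is known.
    name = tags.get("name")
    if name is None:
        return None
    source = detect_name_language(tags)
    if source is None or transliterate is None:
        # Language unknown, or no transliterator supplied: use name as is.
        return name
    # transliterate(text, source_language, target_language) is a
    # hypothetical callable provided by the data consumer.
    return transliterate(name, source, lang)
```

For a map like mine that shows two names, one would render the raw “name” value next to `label_for(tags, "en", ...)` and drop the second label when both strings are identical.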
Rendering a map where local names are used, like the reference one at www.openstreetmap.org or those created by SomeoneElse, is a harder issue. If a feature has names in multiple languages (e.g. Mediterranean Sea) then which name should be used? That does not seem to have a general answer. If the communities that border the feature have mutually agreed on a common default name, then that could be put in the “name” tag. But I suspect the parties closest to the feature are unlikely to agree on a default name if they use different languages.