OpenStreetMap

Fake foreign names

Posted by BushmanK on 10 March 2017 in English.

This discussion could go on forever, at least as long as OSM keeps its current level of rule enforcement. I’m not claiming that I can say anything new on this topic; I just want to keep some arguments in one place.

There is a set of keys intended for language-specific names - name:<language_code>, such as name:en=*, name:fr=* and so on. The OSM Wiki documentation explains their purpose quite clearly: these tags should contain existing, commonly used names in the corresponding languages (see the Names article). This is not a rule that comes out of nowhere. It originates from a core principle: the OSM database should contain real factual information and nothing else.

If we take a look at Berlin, Germany, with an Overpass query, we’ll see that only about 750 nodes, lines, and areas have English names assigned. Usually these are amenities whose names contain common nouns, for example:

  • Botschaft der Republik Indonesien in Berlin (German),
  • Embassy of the Republic of Indonesia in Berlin (English),
  • Kedutaan Besar Republik Indonesia di Berlin (Indonesian).
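The kind of count mentioned above can be reproduced roughly as follows. This is only a sketch: the helper name build_name_count_query is made up for illustration, and selecting the area by name plus admin_level=4 is an assumption that happens to suit Berlin, so other cities will likely need a different area filter. The result also changes over time, so it will not necessarily match the figure quoted here.

```python
def build_name_count_query(city: str, lang: str, admin_level: str = "4") -> str:
    """Return Overpass QL that counts objects carrying name:<lang> inside a named area.

    The area filter (name + admin_level) is an assumption for illustration;
    adjust it for the place you are interested in.
    """
    return (
        "[out:json][timeout:60];\n"
        f'area["name"="{city}"]["admin_level"="{admin_level}"]->.a;\n'
        # nwr = nodes, ways and relations that have the name:<lang> key at all
        f'nwr["name:{lang}"](area.a);\n'
        # "out count" returns only totals instead of the objects themselves
        "out count;"
    )


print(build_name_count_query("Berlin", "en"))
```

The printed query can be pasted into Overpass Turbo; replacing "out count;" with "out tags;" lists the actual names instead of the totals.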

However, as often happens in OSM, this key has become the object of massive abuse. The most common form of abuse is putting made-up foreign names there. By “made-up” I mean anything that is not an existing, commonly used name, such as a transliteration or transcription of the original one. Again, the OSM Wiki documentation is clear: it says to avoid transliteration and explains quite well why. Obviously, there are people who don’t care. In addition, made-up names often cannot be verified, and adding them can easily qualify as tagging for the renderer/navigator (which is another violation of a core principle). Another negative aspect of made-up names is that in many cases local communities are unable to maintain them properly. For example, German, British or Dutch mappers cannot easily tell whether a Russian name is correct (knowledge of Russian is relatively rare, which is completely understandable). Therefore, it is impossible to tell clearly whether a certain name should be deleted, corrected or kept intact.

I think it is crucial to understand the motivation of those who break this rule, both to get an idea of how the problem could be fixed and to avoid ineffective solutions. First of all, they do it intentionally, not by accident, so pointing at the documentation or improving it cannot help. They have simply chosen to sacrifice data consistency for some “more important” thing. From reading numerous discussions of similar situations, I have been able to identify several main types of motivation:

  • To help foreigners who cannot read the local language when they travel to the mapper’s country (for example, Russians adding “English” names to everything in Russia to help English speakers; see a similar name:en Overpass query for Krasnodar, a city four times smaller than Berlin),
  • To help people of the mapper’s own nation who cannot read foreign languages when they travel abroad (for example, Russians adding “Russian” names to everything outside Russia to help other Russians),
  • OCD-like irrational behavior that takes the form of tagging everything uniformly with a certain key. I’m not claiming that these people have obsessive-compulsive disorder (obviously, I’m not a mental health professional), but they show visible traits that make their behavior resemble it: overvalued ideas, an obsession with uniformity, compulsive actions without practical motivation, and elaborate systems of “ritual” behavior.

Somehow, there are quite a lot of people with the first two types of motivation among Russian mappers. This is somewhat ironic: statistically, only about 6% of Russians know a foreign language (by their own assessment). That makes the ability of Russian mappers to transcribe, say, Dutch names quite questionable. The awful transcription from Russian to English (the most commonly known foreign language) often seen in name:en in Moscow, where the level of foreign language skills is supposedly up to three times higher, only supports this doubt.

These people often see their actions as a “mission”, which makes it almost impossible to convince them to stop. Basically, only proper enforcement of the rules (the requirement of factual information, verifiability, the prohibition on mapping for the renderer/navigator) can help. My personal view of how to distinguish made-up names from commonly used ones is that it is enough to require a statement of the source in every edit of this type. An indirect indication of potentially improper edits of this type is the mapper’s ability to communicate with the local community: if someone adds Russian names in Germany without mentioning a verifiable source while being unable to reply to changeset comments in German, it is very suspicious.
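To make the “statement of a source” idea concrete, here is a rough sketch of how such edits could be flagged for review. Everything in it is an assumption for illustration: the function names are made up, tags are represented as plain dictionaries, and the source is looked up in a changeset-level source tag. It is not an existing OSM tool or an actual enforcement mechanism.

```python
def changed_language_names(old_tags: dict, new_tags: dict) -> list:
    """Return name:<code> keys that this edit added or changed.

    Note: this simple prefix test also matches keys such as name:etymology;
    a real check would need to validate the language code.
    """
    return sorted(
        key for key, value in new_tags.items()
        if key.startswith("name:") and old_tags.get(key) != value
    )


def needs_source_review(old_tags: dict, new_tags: dict, changeset_tags: dict) -> bool:
    """Flag edits that touch name:<code> tags without stating any source."""
    touched = changed_language_names(old_tags, new_tags)
    has_source = bool(changeset_tags.get("source", "").strip())
    return bool(touched) and not has_source


before = {"name": "Brandenburger Tor"}
after = {"name": "Brandenburger Tor", "name:ru": "Бранденбургские ворота"}
print(needs_source_review(before, after, {"comment": "added names"}))
```

A flagged edit is not necessarily wrong; the check only shows where a mapper should be asked where the name comes from.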

I have to add that I used Russian mappers of this type as an example because it is the one I’m personally most familiar with. It doesn’t mean that only Russians do this. So I kindly ask anyone who would like to “restore justice” by giving another example, here in the comments, of someone else doing it to abstain from it and avoid being a fool.

It is about (not) following the rules in general, not about blaming someone in particular.

Discussion

Comment from ff5722 on 11 March 2017 at 12:07

A user has added Vietnamese names to all major administrative divisions in China. http://overpass-turbo.eu/s/nqq

Same (by another user) for Japanese names. http://overpass-turbo.eu/s/nqr

In various imports, an automatically generated name:en was added to the imported places in China, from tiny village to city. http://overpass-turbo.eu/s/nqs

I think more than half of all villages (72,000) on OSM in China have a name:en, of which maybe a few hundred at most have an accepted transliteration. Personally I’m also guilty of this, although I can mostly read Chinese myself… So far I have found only a few places where the ‘automatic’ transliteration to English doesn’t work well: Tibet, Xinjiang and Inner Mongolia (where the English name is usually based on the respective region’s minority language), and places where the Chinese name is pronounced differently from the standard pronunciation. E.g. 六安 is not Liu’an but Lu’an. “Although the character “六” (literally: “six”) is normally pronounced “Liù”, in this case it changes to “Lù” on account of the local dialect.” (https://en.wikipedia.org/wiki/Lu%27an)

The only way to stop this is for commonly used renderers (OSM Carto, MAPS.ME, Mapbox, Mapquest, etc.) to do automatic transliteration. Until then I think it’s actually justifiable to add them as tags; however, pollution of the database with incorrect transliterations is indeed a huge problem. As long as the transliterations themselves are indisputable and follow an official transliteration scheme, they are not incorrect and thus not really tagging for the renderer.

I think it’s somewhat similar to tracing roads from satellite in an area you have not visited. Strictly speaking you should draw everything as highway=road. But if you see that one road only gives access to a small village and the other road clearly is a wide through road, is it wrong to add classifications until someone with local knowledge checks it? If you don’t give classifications, your road network is pretty useless for routing. If you do give classifications, you risk doing it wrong.

Comment from BushmanK on 11 March 2017 at 17:38

@ff5722, if this is about those people who see adding made-up names as a “mission” to help mythical foreigners, only full support for transcription/transliteration (with respect to the user’s native language) can actually remove any basis for their actions. But that sounds unrealistic; I think it is more unrealistic than stricter rule enforcement, which doesn’t require writing tons of code.

And it is tagging for the renderer/navigator regardless of correct or incorrect transcription/transliteration, exactly because people who do that want to achieve a certain goal linked to rendering and/or name search.

Your analogy with the road network seems to be correct, at least at a certain level. Exactly because of that, humanitarian mapping, which is done almost exclusively remotely, always has an inherently lower quality level. But that’s a completely different story.

Comment from Severak on 22 March 2017 at 13:31

It would be a very interesting project to make a map with automatic transliteration (with fallback to name:xx).

I think it would be doable and quite useful.
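A minimal sketch of what such label selection might look like is below. It assumes that a mapped name:xx wins when it exists and that transliteration of the local name is used otherwise; the opposite order is just as easy to write. The Cyrillic table is a toy mapping made up for illustration and does not follow any official romanization standard, and the function names are hypothetical.

```python
# Toy Cyrillic-to-Latin table; an illustrative assumption, not an official standard.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace known Cyrillic letters one by one; leave everything else untouched."""
    result = []
    for char in text:
        latin = CYRILLIC_TO_LATIN.get(char.lower())
        if latin is None:
            result.append(char)
        else:
            result.append(latin.capitalize() if char.isupper() else latin)
    return "".join(result)


def display_name(tags: dict, lang: str) -> str:
    """Prefer a mapped name:<lang>; otherwise transliterate the local name."""
    mapped = tags.get(f"name:{lang}")
    if mapped:
        return mapped
    return transliterate(tags.get("name", ""))


print(display_name({"name": "Москва", "name:en": "Moscow"}, "en"))
print(display_name({"name": "Краснодар"}, "en"))
```

With this split, name:xx only needs to hold names that are really in use, and everything else is generated at rendering time instead of being stored in the database.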

Comment from Severak on 23 March 2017 at 10:56

“It would be a very interesting project to make a map with automatic transliteration (with fallback to name:xx).”

OK, Germans already did it. See it in action.
