OpenStreetMap

BushmanK's diary

Recent diary entries

Fake foreign names

Posted by BushmanK on 10 March 2017 in English (English)

This discussion might be truly endless, at least while OSM keeps its current level of rule enforcement. I'm not claiming to say anything new on this topic; I just want to keep some arguments in one place.

There is a set of keys intended for language-specific names, name:<language_code>, such as name:en=*, name:fr=* and so on. The OSM Wiki documentation explains their purpose quite clearly: these tags should contain the existing, commonly used names in the corresponding languages (see the Names article). It is not just a rule that comes out of nowhere. It originates from a core principle: the OSM database should contain real, factual information and nothing else.

If we look at Berlin, Germany with an Overpass query, we'll see that only about 750 nodes, lines and areas have English names assigned. Usually these are amenities whose names contain common nouns, for example:

  • Botschaft der Republik Indonesien in Berlin (German),
  • Embassy of the Republic of Indonesia in Berlin (English),
  • Kedutaan Besar Republik Indonesia di Berlin (Indonesian).
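A count like the one above comes from an Overpass query. Here is a minimal Python sketch of the same idea. It is not the author's actual query. It assumes the public `overpass-api.de` endpoint and selects Berlin as `boundary=administrative` with `admin_level=4`. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumption: one of the public Overpass API instances.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def name_key_query(area_name: str, admin_level: int, key: str = "name:en") -> str:
    """Overpass QL selecting nodes, ways and relations tagged with `key` inside an admin area."""
    return (
        "[out:json][timeout:120];\n"
        f'area["boundary"="administrative"]["admin_level"="{admin_level}"]["name"="{area_name}"]->.a;\n'
        f'nwr["{key}"](area.a);\n'
        "out tags;"
    )


def run_query(query: str) -> dict:
    """POST the query to Overpass and decode the JSON response (needs network access)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)


def count_by_type(response: dict) -> Counter:
    """Count the returned elements per OSM type (node, way, relation)."""
    return Counter(el["type"] for el in response.get("elements", []))
```

For example, `count_by_type(run_query(name_key_query("Berlin", 4)))` gives per-type counts. The same call with another city name and that city's boundary `admin_level` gives the comparison used later in this post.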

However, as often happens in OSM, this key has become an object of massive abuse. The most common form of abuse is putting made-up foreign names there. By "made-up" I mean everything that is not an existing, commonly used name, for example a transliteration or transcription of the original one. Again, the OSM Wiki documentation is clear. It says to avoid transliteration and explains pretty well why. Obviously, there are people who don't care. In addition, made-up names often cannot be verified, and adding them could easily be qualified as tagging for the renderer/navigator (which is another violation of a core principle). Another negative aspect of these made-up names is that in many cases local communities are unable to maintain them properly. For example, German, British or Dutch mappers cannot easily tell whether a Russian name is correct or not (knowledge of Russian is relatively rare there, which is completely understandable). Therefore, it is impossible to tell clearly whether a certain name should be deleted, corrected or kept intact.

I think it is crucial to understand why people break this rule, both to get an idea of how the problem could be fixed and to avoid ineffective solutions. First of all, they do it intentionally, not by accident. Therefore, pointing at the documentation and improving it cannot help. They have simply chosen to sacrifice data consistency for something "more important". Reading numerous discussions of similar situations, I've found several main types of motivation:

  • To help foreigners who cannot read a certain language when they travel to the mapper's country (for example, Russians adding "English" names to everything in Russia to help English speakers; see a similar name:en Overpass query for Krasnodar, a city four times smaller than Berlin),
  • To help people of the mapper's own nation who can't read foreign languages when they travel abroad (for example, Russians adding "Russian" names to everything outside Russia to help other Russians),
  • OCD-like irrational behavior, expressed as making everything uniformly tagged with a certain key. I'm not claiming that these people have obsessive-compulsive disorder (obviously, I'm not a mental health professional), but they do show certain visible traits that make their behavior very similar to that typical of OCD: overvalued ideas, obsession with uniformity, lack of practical motivation in favor of compulsive actions, and elaborate systems of "ritual" behavior.

Somehow, there are quite a lot of people with the first two types of motivation among Russian mappers. It is somewhat ironic: statistically, only about 6% of Russians (according to their self-assessment) know a foreign language. That makes the ability of Russian mappers to transcribe, say, Dutch names quite questionable. The awful transcription from Russian to English (the most commonly known foreign language) often seen in name:en in Moscow, where foreign-language skills are supposedly up to three times better, only supports this doubt.

These people often see their actions as a "mission", which makes it almost impossible to convince them to stop. Basically, only proper enforcement of the rules (the requirement of factual information, verifiability, the prohibition on mapping for the renderer/navigator) can help. My personal view of how to distinguish made-up names from commonly used ones is that it is enough to require a stated source in every edit of this type. An indirect indication of a potentially improper edit of this type is the mapper's (in)ability to communicate with the local community. If someone adds Russian names in Germany without mentioning a verifiable source and is unable to reply to changeset comments in German, that is very suspicious.
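The "require a source" idea can be checked mechanically. Below is a minimal Python sketch that flags a changeset if it adds or changes `name:<lang>` keys without a `source` changeset tag. Several things are assumptions here. Only the changeset-level `source` tag is checked, and per-object `source:name` is ignored. The key pattern is simplified. Building the (old tags, new tags) pairs from a changeset download is left out, and the function names are hypothetical.

```python
import re
from typing import Mapping, Sequence, Tuple

# Language-specific name keys such as name:en, name:ru, name:zh-Hans.
# Assumption: a deliberately simple pattern, not an official definition.
LANG_NAME_KEY = re.compile(r"^name:[a-z]{2,3}(?:-[A-Za-z]+)?$")

TagChange = Tuple[Mapping[str, str], Mapping[str, str]]  # (old tags, new tags)


def touched_lang_names(changes: Sequence[TagChange]) -> set:
    """Return the name:<lang> keys that were added or modified."""
    touched = set()
    for old, new in changes:
        for key, value in new.items():
            if LANG_NAME_KEY.match(key) and old.get(key) != value:
                touched.add(key)
    return touched


def needs_review(changeset_tags: Mapping[str, str], changes: Sequence[TagChange]) -> bool:
    """Flag a changeset that touches name:<lang> keys but states no source."""
    has_source = bool(changeset_tags.get("source", "").strip())
    return bool(touched_lang_names(changes)) and not has_source
```

A flag raised this way is only a reason to look closer and to ask the mapper for a source. It is not proof that a name is made up.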

I have to add that I have used Russian mappers of this type as an example because I'm personally very familiar with them. It doesn't mean that only Russians do this. So I kindly ask anyone who would like to "restore justice" by posting, here in the comments, an example of someone else doing it, to refrain from doing so and not make a fool of themselves.

It is about (not) following the rules in general, not about blaming someone in particular.

Russian "turizm" and English "tourism": what's the difference?

Posted by BushmanK on 4 February 2017 in Russian (Русский)

Everyone probably knows the tourism= key. But apparently not everyone understands its exact meaning. As often happens, the problem lies in the difference in meaning between the English word tourism and its counterpart in Russian.

Russians and residents of some other former USSR countries are used to applying this word not to recreational tourism, that is, traveling light without special equipment, as the English and Americans do, but to sports tourism: hiking trips, rafting and kayaking trips, mountain trips and even mountaineering expeditions. This is a local peculiarity of the Russian language and of Russian culture (though, of course, not exclusively Russian). If you tell an English speaker that you love tourism, they will never picture you walking with a backpack across the tundra or riding a raft down a rough river. They will picture you, for example, photographing the sights of some city or relatively easily accessible natural attractions. To tell them about your passion for backpacking trips, you will have to mention hiking, and for river trips, rafting and white water rafting.

For exactly this reason, a literal translation of Wiki articles about tags like tourism=attraction makes no sense. An English speaker reads the English article in the context of what they know about the meaning of the word tourism in their native language. A Russian reader, however, needs a linguistic or cultural explanation of the exact meaning of the words tourism and attraction. Otherwise someone may decide that it is only about amusement parks (the Russian word "аттракцион" means an amusement ride), when in fact it is about sights in general. Others may decide it is about points of interest for sports tourists, such as rapids and riffles on rivers. This is not always obvious (especially if you have only studied the language at school or university and have no experience of using it in real life, where mistakes like the ones above show up), but it is so.

Removal of banner_url tags

Posted by BushmanK on 31 January 2017 in English (English)

At least some versions of the Maps.me editor (for example, MAPS.ME iOS 7.0.4) have a bug that makes them add a banner_url= key whose value contains a shortened link to pages related to the Maps.me advertising system.

@Zverik (Ilya Zverev, OSMF Board member, who works for Mail.ru/Maps.me) has assured us in his message on the Russian OSM forum that it's a known bug and so on. However, nobody knows how long some users will keep that buggy version and how long these tags will keep appearing.

Using a simple Overpass query and JOSM, I have already removed about two hundred banner_url= keys from objects recently added or edited by Maps.me users. It doesn't take much time, but I'm curious: is there a way to automate it and run a cleanup script, say, once per day?
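The detection half of such a script could look like the minimal Python sketch below. It assumes the public `overpass-api.de` endpoint and a global `nwr["banner_url"]` query, and the function names are hypothetical. It only prepares tag fixes. Uploading them through the OSM API in a changeset is not shown. Any automated edit would also have to follow the OSM Automated Edits code of conduct, which means discussing it with the community first.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # assumption: public instance

# Every node, way and relation still carrying banner_url, with metadata
# (version, user, changeset) so each hit can be traced back to its edit.
BANNER_QUERY = '[out:json][timeout:120];\nnwr["banner_url"];\nout meta;'


def fetch(query: str) -> list:
    """Run an Overpass query and return its elements (needs network access)."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)["elements"]


def strip_key(elements: list, key: str = "banner_url") -> list:
    """Return fixes: element type, id, version and its tags without `key`.

    The input elements are not modified.
    """
    fixes = []
    for el in elements:
        tags = dict(el.get("tags", {}))
        if tags.pop(key, None) is not None:
            fixes.append({"type": el["type"], "id": el["id"],
                          "version": el.get("version"), "tags": tags})
    return fixes
```

Keeping `version` matters. An upload has to send the current version, so that an object edited in the meantime causes a conflict instead of being overwritten silently. For ways and relations, an upload also needs their node lists and members, which these tag-only fixes do not carry.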


Another question is more of a moral kind. Indeed, the Maps.me developers (@Zverik himself) have provided a supervision tool for changesets created by the Maps.me app. However, it puts the whole burden of supervision on volunteer mappers; since some anomalies cannot be reliably detected automatically, that is understandable. This particular banner_url= case might not have a large impact (about five wrong tags per day), but it has been known for a long time, and for anyone familiar with a scripting language it shouldn't take much time to develop an automated cleanup tool. Yet none of the Maps.me developers did that. That doesn't look like a responsible way of working with OSM data. Keeping in mind how pushy @Zverik (as a moderator) is about being nice and respectful to others on the Russian OSM forum, this situation looks like significant hypocrisy.

Pokemon Go trash

Posted by BushmanK on 28 January 2017 in English (English)

As was recently discovered, Pokemon Go users have found (or at least think they have) that OSM data affects the location of Pokemon "nests". Some of them immediately started adding fake map features. Here are several examples. Washington, Indiana, United States: this user https://www.openstreetmap.org/user/forever2darkness/ added fake residential areas and footways (reported to DWG).

Charlotte, North Carolina, United States: this user https://www.openstreetmap.org/user/Fishboy35 added fake parks with obvious names like "not an actual park" (reported to DWG). And guess what? He got a welcome message for adding these fake features: https://www.openstreetmap.org/changeset/45538011

Hi and welcome to OpenStreetMap! You've been adding some great detail to the map since you've joined. We count on local knowledge to make the map a better place! That said, please make sure you're only mapping real objects. Things you add go into the live database, which is depended upon by businesses, governments, and even humanitarian organizations. If you have any further questions about mapping, don't hesitate to give me a shout! Best, Ethan aka FTA

It was sent automatically by https://www.openstreetmap.org/user/FTA . How nice of him.

Assuming that they are interested in adding fake lines and areas rather than nodes, this Overpass query can be used to load the objects last touched by a specific user and look into their activity: http://overpass-turbo.eu/s/lzL Keep in mind that not all of the changed objects were added by such users; some of them were only edited. Do not delete them all without examining each one.
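The linked query is not reproduced here. Below is a hypothetical Python equivalent built on the Overpass `user:` filter, which selects objects whose latest version was made by that user. It also shows one way to act on the warning above. If an object was last touched by the user and is still at version 1, that user created it. At higher versions the user may have only edited it, so the history has to be checked. The function names are illustrative, and the sketch assumes user names without quotes or backslashes.

```python
def user_lines_areas_query(user: str) -> str:
    """Overpass QL: ways and relations whose latest version was made by `user`."""
    if '"' in user or "\\" in user:
        raise ValueError("user names with quotes/backslashes need escaping")
    return (
        "[out:json][timeout:120];\n"
        f'(way(user:"{user}"); relation(user:"{user}"););\n'
        "out meta;"
    )


def split_created_edited(elements: list) -> tuple:
    """Split `out meta` results into objects the user created and ones to check.

    version == 1: the last editor is also the creator, so the user created it.
    version > 1: the user may have only edited someone else's object;
    look at the object history before deleting anything.
    """
    created = [el for el in elements if el.get("version") == 1]
    to_check = [el for el in elements if el.get("version", 0) > 1]
    return created, to_check
```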

One important thing: I am not saying that all mappers who play Pokemon Go add fake objects. I am also not saying that they cannot add real ones. I'm only saying that, to prevent fake objects from being added in the future, it is important to delete these as soon as possible after they are added. Only after that can anyone try to educate these guys about OSM.

Update: a certain DWG member I contacted preferred to preach about being polite to new users and teaching them nicely instead of helping to remove that trash. It seems very "cooperative".

Update #2: It looks like at least some of the obviously fake objects were removed by the DWG revert bot.

"Crisis of anarchy"

Posted by BushmanK on 25 January 2017 in English (English)

Reading this thread on the Tagging mailing list, I've noticed several posts describing a lack of governance, which leads to endless (even circular) discussions and other negative consequences for the project. I agree with that, because if we really want to create, maintain and improve "the best map of the world", it is counterproductive to rely on natural evolution alone. Obviously, that would take too much time.

But I don't think that governance requires a government, meaning a group of people, as Mark Bradley proposed in that thread. A formal guideline or a declaration of goals could be enough. For example, we can argue endlessly about semicolon-delimited values, presenting nicely polished pros and cons. But it is impossible to reach a consensus if we don't have a goal or a guideline to test a given proposal or statement against.

For example: how important is it to be able to query any new tag with the Overpass API? Is the Overpass API a key part of the OSM infrastructure or not? Should we care about every tag having a universal meaning in every country? Should we care about avoiding non-verifiable, vague or relative tag definitions? What about overlapping definitions? Is it important not to force data consumers to conduct detailed research on every tag before they can use it? Is it acceptable to have a scheme that requires preprocessing with a spatial query?
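The semicolon question shows why such guidelines matter to data consumers. The minimal Python sketch below uses the common `cuisine=*` key as an assumed example. A value like `pizza;burger` defeats an exact-match test, so the consumer, or an Overpass query, needs list splitting or an extra regex. The function names are hypothetical. The Overpass filter string assumes the value contains no regex metacharacters.

```python
def exact_match(tags: dict, key: str, value: str) -> bool:
    """What a naive consumer (or a plain ["key"="value"] filter) does."""
    return tags.get(key) == value


def list_match(tags: dict, key: str, value: str) -> bool:
    """Treat the tag as a semicolon-delimited list and look for one item."""
    return value in (item.strip() for item in tags.get(key, "").split(";"))


def overpass_list_filter(key: str, value: str) -> str:
    """Overpass QL regex filter matching `value` as one item of a ;-list."""
    return f'["{key}"~"(^|;) *{value} *(;|$)"]'
```

For example, with `tags = {"cuisine": "pizza;burger"}`, `exact_match(tags, "cuisine", "pizza")` is `False` while `list_match(tags, "cuisine", "pizza")` is `True`. Whether every consumer should be expected to know this is exactly the kind of question a guideline would answer.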

Without a common goal, any discussion inevitably degrades into a contest of personal views. A special group of people can't fix that, because they also need something to test everything against. They could probably try to replace it with their own wisdom, but that doesn't seem to be a good idea.

Guidelines could be developed as GitHub-hosted document(s), so that they can actually be developed and kept under full control, instead of relying on the somewhat anarchic Wiki style.

There is an idea (also mentioned in that thread) that everything in this project is already governed by implementation. But implementation says nothing about how good or bad a given thing is. For example, there could be an awful tagging scheme, but also a person who doesn't think so and has implemented it in their tool, renderer or editor. Other people, who lack the knowledge to see what's wrong with the scheme, just start using it (tagging objects with it). Eventually we have a lot of objects tagged this way, and it becomes nearly impossible to change the scheme, because nobody cares except those who keep saying that "changing things is the worst thing". That doesn't seem like a productive practice, because it literally translates into a "the first one wins" principle, which obviously makes any further improvement significantly harder.

The bottom line, to say it again: a government as a group of people doesn't seem necessary, but better guidelines do.


Update. The difference between a government group and a guidelines development group is quite simple and obvious.

A government is an executive body. Usually, it looks into every case within its scope and has to decide how to deal with that specific case. Obviously, this takes a lot of time, and it is still impossible without a commonly accepted set of rules; without them, it leans towards an authoritarian system.

Hypothetically, a guidelines development group is a "legislative" body. It has to decide technical questions before they arise, and each decision acquires the force of a rule. So it resolves many similar future questions at once. It saves time, it can't be personal, and it is evidence-based.

Another example of lacking abstraction

Posted by BushmanK on 17 January 2017 in English (English)

I'm not really complaining here, since I haven't stumbled across this problem in several years of mapping, but from the point of view of a better tagging system it is worth mentioning. And it is not a proposal, just an example. Please be wise and treat it accordingly.

Man-made structures are extremely common landscape features, even far away from populated areas, and we have numerous tags for them. However, there is a whole class of man-made objects barely represented in OSM: various cables.

We have at least three tags for very specific cable structures: barriers made of poles and a cable, intercontinental communication cables, and aerial power lines. At the same time, there is no way to indicate that a cable is hanging somewhere without knowing whether it's a power cable or a barrier. Obviously, there are cases where these tags are used to map for the renderer, because someone wanted to show that there is a cable, no matter what kind.

Somehow, we've managed to get a decent tag for pipelines. It implies nothing about purpose or any other property: it's just a pipeline, nothing more. So if I don't know what kind of pipeline this is, it is perfectly acceptable to tag it with man_made=pipeline and location=overground, because that's what I see.

But there is no way to map a cable hanging between poles along a street, since without special knowledge you can't tell whether it's a temporary power cable, a fiber-optic communication cable, an analog CCTV cable or something else. Sometimes it is impossible to tell even if you know a lot about the topic. And it usually doesn't really matter what this particular cable serves, because this kind of data will always be very fragmentary (unless it's imported) and outdated due to lack of public interest.

The current situation, in which power lines get special treatment, seems to be the result of historical circumstances. Many decades ago, street poles carried only power, telephone and telegraph cables. Later, overhead phone cables were moved underground, and telegraph lines were eliminated almost completely. So modern mainstream mapping products, including the OSM database, still have just power lines and leave the rest of the cabling out of scope.

There is an important point here. Some people say that a more abstract scheme is always more complicated. That obviously isn't true, because nothing is simpler than indicating that "there is a cable". If you know the purpose of the cable, it is also very simple to indicate it with a separate tag; if you don't, just leave it as is. Thus, better abstraction could simplify the mapping of cable structures, instead of keeping it impossible, provoking mapping for the renderer, or making it more complicated.

Why isn't every sign a name?

Posted by BushmanK on 22 December 2016 in Russian (Русский)

The debate about what should and shouldn't go into name= is as old as the OSM project itself. One of the arguments in this debate goes like this: "If it's written on the sign, it's a name." Let's figure out why this view is wrong.

First, let me remind you that the name= tag is intended primarily for proper names. "Primarily" rather than "exclusively", because in some cases the project's convention is to add a qualifying word to the proper name. Road names are an example: name= also includes words like "улица" (street), "переулок" (lane) and so on. Why do we include them? Because road names in Russian consist of a proper noun and a common noun (those very "улица", "переулок"), and in general a road can be identified unambiguously only by its full name. But the rule still applies: qualifying words should be kept out of name= whenever possible. And it is always possible when the object is fully described by tags. Why is this so important in OSM? Because OSM is not a map that people look at, but a database from which all sorts of products, including maps, are made. Plus, it is an international project, and only the tagging system provides language independence (provided, of course, that it is clearly defined).

For example, a dovecote should be tagged building=yes man_made=dovecote, not building=yes name=голубятня ("dovecote"); the latter approach is acceptable only in Wikimapia or the Yandex People's Map.

In the example above, everything looks obvious, at least to most contributors. True, a dovecote sometimes has a sign reading "Pigeon nursery No. such-and-such of the city of Moscow, Moscow City Pigeon Breeders' Club", and someone who takes everything literally may be tempted to put some of it into name=.

But here are other examples: a recycling point with a sign saying "Прием стеклотары" (glass container collection), a public toilet with a sign saying "Туалет" (toilet), a workshop with a sign saying "Металлоремонт" (metal repair) or "Изготовление ключей" (key cutting), a grocery store called "Продукты" (groceries). Should these go into name=, are they names? No, they are not names, and no, they should not go there. These signs do not contain the organization's own name; they indicate the service provided or the goods sold in the shop. Why is this not a name? Here is why.

In Russia, of course, many laws and regulations are not followed very strictly, but they do exist. For the names of shops and institutions, and for signs, there are fairly clear rules, mainly in the Civil Code of the Russian Federation.

The Civil Code contains Article 1473, "Corporate name", which states:

  1. A legal entity that is a commercial organization acts in civil transactions under its corporate name, which is set out in its constituent documents and entered in the Unified State Register of Legal Entities upon the state registration of the legal entity.

  2. The corporate name of a legal entity must contain an indication of its legal form and the name of the legal entity proper, which may not consist solely of words denoting the type of activity.

Paragraph 2 of this article alone implies that an organization's corporate name cannot look like "Туалет" (toilet) or "Металлоремонт" (metal repair).

Another article of the Civil Code, Article 1474, "Exclusive right to a corporate name", paragraph 3, states:

A legal entity may not use a corporate name that is identical or confusingly similar to the corporate name of another legal entity if both legal entities carry out similar activities and the corporate name of the second legal entity was entered in the Unified State Register of Legal Entities earlier than that of the first.

That is, a sign reading "Продукты" (groceries) cannot be a corporate name either, since it is obviously not unique and cannot be registered by anyone as a corporate name.

A sign may also display a trademark or a service mark. But Article 1477, "Trademark and service mark", states that such marks are also subject to an exclusive right. So if someone could register "Продукты" as a trademark, other shops would not be able to put it on their signs. (Registering a trademark is, by the way, a rather expensive and complicated process; a small semi-basement courtyard shop would surely go bankrupt if its management ever tried.) Of course, registering exactly this kind of trademark is impossible.

There is one more kind of designation to which a company or an individual entrepreneur may hold exclusive rights and which they can use on signs, letterheads and so on: the so-called "commercial designation". It differs from other marks and names in that it does not need to be included in the constituent documents or the register of legal entities. It is enough to come up with such a designation and start using it. But Article 1539, "Exclusive right to a commercial designation", paragraph 2, states:

The use of a commercial designation capable of misleading as to which person an enterprise belongs to is not permitted, in particular a designation confusingly similar to a corporate name, trademark or exclusively protected commercial designation belonging to another person whose corresponding exclusive right arose earlier.

This means that if some shop started using "Продукты" as a commercial designation before anyone else, others can no longer do so.

But that is not even very important, because a commercial designation is not a name. Neither a commercial designation nor a trademark or service mark counts as a name, so in general none of them should go into name= on their own. At best, they can go into brand=.

So when you see a sign saying "Продукты" (groceries), "Мясо" (meat) or "Изготовление ключей" (key cutting), you should understand that it is not the name of the shop or workshop, and not even its trademark or commercial designation. It is merely an indication of the type of product or service.
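A data check along these lines is easy to automate. The minimal Python sketch below flags name= values that consist only of such generic sign texts. The word list is a hypothetical, hand-picked sample taken from this post. It is not an existing validator rule, and a real check would need a much longer list.

```python
import string

# Hypothetical sample of sign texts from this post that describe a service
# or product type rather than naming anything. Not exhaustive.
GENERIC_SIGN_TEXTS = {
    "прием стеклотары",     # glass container collection
    "туалет",               # toilet
    "металлоремонт",        # metal repair
    "изготовление ключей",  # key cutting
    "продукты",             # groceries
    "мясо",                 # meat
}

QUOTES_AND_PUNCT = set(string.punctuation + "«»„“”")


def normalize(text: str) -> str:
    """Casefold, map ё to е, drop quotes/punctuation, collapse whitespace."""
    text = text.casefold().replace("ё", "е")
    text = "".join(ch for ch in text if ch not in QUOTES_AND_PUNCT)
    return " ".join(text.split())


def is_generic_name(tags: dict) -> bool:
    """True if name= holds nothing but a generic sign text."""
    name = tags.get("name")
    return name is not None and normalize(name) in GENERIC_SIGN_TEXTS
```

A hit means the object's type should be expressed with tags and the name= value reviewed. The check cannot say which tags are right.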

For those who want the database to record what is written on a sign, it would be nice to have a key for storing it verbatim as a string, similar to the one that exists for memorials. But old-timers who are used to breaking the project's rules are practically impossible to convince. So even if a tag for sign text appears (for example, sign_text=), they won't stop putting everything into name just to see it on the Standard style map.

I'll add that a good way to check whether a tag you want to assign to something is correct is to ask yourself two questions. Can you clearly state why this object should get these tags and not others close in meaning? And can you state why these tags should be assigned to this object but not to another, similar one? Obviousness may not be used as the only argument or criterion. If you can answer these questions, you most likely understand the meaning of the tag clearly and, most importantly, where the boundary between one tag and another lies. That means your opinion about that particular tag is well-founded.

True namespace versus Colon-delimited suffix/prefix

Posted by BushmanK on 17 December 2016 in English (English)

Thanks to this diary entry, I've just discovered the Wiki page Date namespace, which has existed since 2014 as an improperly published proposal. Basically, it introduces a syntax that supposedly lets us indicate a date range for virtually any key. The proposed syntax looks like this:

<key>:<year>-<year>=<value>
<key>:<date>--<date>=<value>
<key>:<year>-=<value>
<key>:-<year>=<value>

Unfortunately, I wasn't aware of this until today, but it's never too late to address it.

First of all, storing a variable value as a colon-delimited suffix (or prefix) is not the same as using a namespace, because a namespace always serves the purpose of grouping. So this proposal has nothing to do with namespaces.

The second problem is that a variable colon-delimited suffix makes data processing awfully cumbersome. A proper namespace suffix can easily be compared against a set of known ones in a simple query. The date "namespace" syntax, by contrast, requires a complex regex (including an ISO 8601 date pattern) to find all keys containing it. There is no way to "just" select them all, because the scheme has no qualifier of any kind. Even a slightly improved syntax like <key>:daterange<year>-<year>=<value> would allow much simpler preprocessing, but that didn't happen.
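To show how much preprocessing the proposed syntax forces on a consumer, here is a minimal Python sketch. The regex is our own reading of the four forms quoted above, not something taken from the proposal page. Only the year part of each bound is used, and the function names are hypothetical.

```python
import re

# A bound is a year (YYYY) or an ISO 8601 date (YYYY-MM or YYYY-MM-DD).
DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
# <key>:<start><sep><end>, where sep is "-" (years) or "--" (dates) and
# either bound may be missing (open-ended range).
DATED_KEY = re.compile(rf"^(?P<base>.+?):(?P<start>{DATE})?(?P<sep>--?)(?P<end>{DATE})?$")


def parse_dated_key(key: str):
    """Return (base_key, start_year, end_year), or None for ordinary keys."""
    m = DATED_KEY.match(key)
    if m is None or not (m["start"] or m["end"]):
        return None
    start = int(m["start"][:4]) if m["start"] else None
    end = int(m["end"][:4]) if m["end"] else None
    return m["base"], start, end


def value_for_year(tags: dict, base: str, year: int):
    """Pick the value of `base` valid in `year`, else fall back to the plain key."""
    for key, value in tags.items():
        parsed = parse_dated_key(key)
        if parsed is None or parsed[0] != base:
            continue
        _, start, end = parsed
        if (start is None or start <= year) and (end is None or year <= end):
            return value
    return tags.get(base)
```

Every key of every object has to go through this regex before anything can be selected. With a fixed qualifier such as the `daterange` example above, a plain prefix comparison would find the candidate keys.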

You can read more detailed explanations of these two main issues on Talk:Proposed features/Date namespace page.

I am perfectly aware of at least one web service whose developers have managed to make use of this data salad, but that only means they had enough free time on their hands. Anyone who wants to argue is welcome to start by writing an Overpass Turbo query that shows the names of Irish counties for a specific date (say, 1920) and posting it here in the comments.

Towers and Masts

Posted by BushmanK on 29 November 2016 in English (English)

Browsing through the issues in the Openstreetmap-carto (also known as the OSM Standard style or "Mapnik" style) tracker on GitHub, I came across several issues, both open and closed, about rendering vertical man-made structures such as poles, masts and towers.

Communication engineering was my field for a while, so it always strikes me when at least two of these terms, mast and tower, are used ambiguously. In one of the discussions on GitHub, the difference between masts and towers was called "philosophical". Actually, there is no philosophy involved (at least if you don't look at the wrong and misleading examples in the OSM Wiki). Because of that, I've added an engineering definition to the man_made=tower and man_made=mast pages, both in English and Russian. What the first section of those pages says makes zero sense and contradicts the basic principles of tagging, because it uses comparative terms such as "bigger" and "smaller" to distinguish between these structures. The tags tower:construction=guyed* are obviously redundant: if you need them, the object must be tagged as a mast, not as a tower.

I didn't want to rewrite the whole "definition" without discussing it, and I don't really believe a discussion could succeed, so I just added a clear definition in case someone prefers it. Just for reference:

Mast is a vertical man-made structure, supported by the guy lines and the anchoring system.

Tower is a vertical free-standing man-made structure, supported by its own foundation only.

(Anyone can find this even in Wikipedia, which makes me wonder how ignorant the author of these OSM Wiki articles must have been to write what they did.)

And it doesn't matter that some contractors (and regular people after them) call cellular communication towers "masts". That is as wrong as calling a radio receiver a "transistor" or a copy machine a "Xerox" (which is common in some languages). Worse, it makes it impossible to actually distinguish masts from towers for mapping purposes.

So, getting back to rendering: both "inverted T" and "inverted Y" are completely appropriate tower symbols. An inverted T looks like a tower with a single stem or column standing on its foundation (tower:construction=freestanding). An inverted Y looks more like a rough outline of a steel lattice tower (tower:construction=lattice); more strokes could be added to make it look fancier.

Masts are a bit trickier, but only a bit. The most obvious symbol is an "inverted bird foot": an inverted Y with the central stroke extended all the way to the bottom. It also looks like an upside-down antenna symbol from circuit diagrams. The central stroke represents the mast itself, and the diagonal strokes represent the guy lines.
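The symbol choice described above follows directly from the definitions, as this minimal Python sketch shows. The icon names are hypothetical labels, not openstreetmap-carto's actual symbols. Falling back to the inverted T for towers with unknown construction is an assumption. The sketch also flags the redundant man_made=tower + tower:construction=guyed* combination.

```python
def guyed_tower(tags: dict) -> bool:
    """True for man_made=tower combined with tower:construction=guyed*."""
    return (tags.get("man_made") == "tower"
            and tags.get("tower:construction", "").startswith("guyed"))


def symbol_for(tags: dict):
    """Hypothetical icon choice following the mast/tower definitions."""
    kind = tags.get("man_made")
    construction = tags.get("tower:construction", "")
    # Supported by guy lines: by definition a mast, however it is tagged.
    if kind == "mast" or guyed_tower(tags):
        return "inverted_bird_foot"
    if kind == "tower":
        # Lattice towers get the inverted Y; freestanding and unspecified
        # towers fall back to the inverted T (an assumption).
        return "inverted_y" if construction == "lattice" else "inverted_t"
    return None
```

Because the rule is based only on how the structure is supported, it never needs "bigger" or "smaller" comparisons.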

The bottom line: rendering masts and towers is not solely a question of style and preferred icons; it is also a question of using proper definitions. If the definitions get clarified one day, no philosophy will be involved in rendering and tagging anymore. (Personally, I really doubt that will happen.)

Added from comments: these tags currently have no "OSM-specific meaning"; they are completely mixed into one mess. It is technically impossible to be sure whether an object tagged man_made=mast is actually a mast, and vice versa. So changing anything can't do any harm, because it can't get any messier than it currently is.

People argue about this only because almost everyone has their own tagging tradition and assumes that everyone else has a similar one. But that's not true: different people tag different objects the same way and similar objects differently. The belief that there is any global consistency in tagging masts and towers is just a fallacy.

Buildings vs. man-made structures

Posted by BushmanK on 16 November 2016 in English (English)

Reading a pretty long discussion about tagging the Holocaust memorial in Berlin, I was surprised by how unclear our Wiki documentation still is. The main controversy there is about (not) using building=* for individual parts of the memorial installation. For those who are not familiar with this memorial: it mainly consists of rectangular monoliths (stelae) of different heights, arranged in rows.

It was mentioned in that discussion that, under the current convention, the qualifying feature for using building=* is a room inside the structure that people can enter and stay in, and that this is the structure's purpose. I'd add that this also applies to the "parent structure", since we still have building=entrance in use.

There are extreme cases, like building=roof, where we usually have an open space under the roof instead of an enclosed space forming a room. But unfortunately, both the Buildings article and Key:building have almost zero information on this particular topic.

I mean, come on, guys, building=* is among the top tags by usage and it is often misused, but there is no more or less clear definition of it in the English documentation. In the Russian documentation, due to widespread legal nihilism, the first paragraph of RU:Здания (the Russian version of the Buildings article) has given a definition of the term "building" for a while. The German article has a somewhat shorter explanation as well. The English version says something about a couple of special cases (houseboats, for example), but gives no general picture, as if it were obvious. No, it is not, especially in OSM, where terms quite often have their own special meaning and definition.

Personally, I don't see any issue with adding something like that to the English article myself, except that I'm obviously not a native English speaker and my English is American (I've had complaints from a couple of Britons about that).

I think it is important to explain the qualifying features and the fact that building=* does not really create a contradiction with man_made=*, because, for example, large TV transmission towers or lighthouses often have plenty of space inside, intended as a workspace for people.

Getting back to the Holocaust memorial, those stelae are obviously neither buildings nor man-made structures in terms of current tagging schemes. These are historic=memorial memorial=stele or, better, historic=memorial memorial:type=stele objects, while there is still no definite way to tag complex memorial installations in detail. Personally, I'd propose something like using historic=memorial for the whole boundary of a memorial and the corresponding memorial:type=* for the outline of each particular part of it, since there are memorial complexes consisting of multiple stelae, statues, plaques and obelisks within a certain boundary.

Access restriction data architecture

Posted by BushmanK on 22 October 2016 in English (English)

Currently, we have an acceptable (but not ideal) tagging scheme for access restrictions. I'm talking about the access=* key and its values. Here, I want to say something about applying this scheme in terms of topology, data interpretation, and data architecture.

OSM documentation says that it may be used on nodes, ways and closed ways (use on relations is unspecified, but it's not prohibited). And many people take that literally. Let's review some cases.

access=* is often assigned to a node representing a barrier. Many mappers think it is much easier to add this tag to a gate, liftgate or another kind of barrier to indicate that the access restriction starts there. In such cases, adding access=* to the highway=* itself is often omitted. A common argument for omitting it is that navigation software usually starts route planning from a publicly accessible point. Then, analyzing the road network, it should avoid crossing those points with an access=* restriction, if possible, just as it does with user-defined avoidance marks.

But there is a problem with this method. Given a road divided into two parts by a node with an access=* tag, it is impossible to tell which half the restriction applies to. This problem has only heuristic solutions, involving extensive analysis of the road network. For example, it is a good educated guess that the restricted portion is often a dead end comprised of lower-rank roads, while the unrestricted portion likely connects to higher-rank roads. But this kind of analysis is a problem with no definite depth, and it still doesn't give a 100% accurate result, which is unacceptable.

The argument that a route always starts from a publicly accessible point is obviously false. There are enough cases when you have to calculate a route from a private property or a limited-access property. And the only universal way to deal with that is to disregard restrictions completely if no unrestricted route is possible. This solution is widely used, but it devalues the information about restrictions in the OSM database. Another known approach is to always avoid passing through restricted portions of a road network by counting restriction points, but again, in the case of node-based restrictions, it fails (requiring a fallback to ignoring all restrictions) if both the start and finish points are behind the restriction.

Applying access=* to the road itself eliminates any topological ambiguity, since there is no question as to which part of the road the restriction applies. It is also clear whether a route starts from and/or ends at a restricted portion. Therefore, no fallback is required.
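To illustrate why way-level tags remove the need for a fallback, here is a minimal Python sketch. It assumes a penalty-based router (a common technique, but an assumption here, not something this entry prescribes): restricted ways stay usable but cost far more, so a route that starts or ends on a private way uses only as much of it as it must, while through traffic avoids it. The `PENALTY` factor, the `RESTRICTED` set and the way tuple format are hypothetical.

```python
import heapq

# Hypothetical cost multiplier: restricted ways remain usable, but only
# as a last resort (e.g. the start or the goal lies on a private road).
PENALTY = 1000.0
RESTRICTED = {"private", "no"}


def build_graph(ways):
    """ways: list of (node_a, node_b, length_m, access) tuples, where access
    is the access=* value of the way itself."""
    graph = {}
    for a, b, length, access in ways:
        cost = length * PENALTY if access in RESTRICTED else length
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def shortest_route(graph, start, goal):
    """Plain Dijkstra over the penalized costs; returns a node list or None."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None


ways = [
    ("A", "B", 50, "private"),   # driveway where the route starts
    ("B", "D", 100, "private"),  # private shortcut
    ("B", "C", 200, "yes"),
    ("C", "D", 200, "yes"),
]
print(shortest_route(build_graph(ways), "A", "D"))  # ['A', 'B', 'C', 'D']
```

With these example ways, a route from A (on a private driveway) to D leaves via the driveway but prefers the longer public B-C-D over the private shortcut B-D, and no separate "ignore all restrictions" pass is needed. With node-based barriers, the same router would first have to guess which side of each barrier is restricted.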

Another option is to apply restrictions to a boundary, such as a fence or a land parcel, within which a restriction is in effect. It helps to avoid tagging every road. However, it requires certain pre-processing to make the data actually usable. Information about restrictions should be normalized by applying it to roads within the boundary using a spatial query (just like in the case of simplified address tagging, when addr:city is omitted on buildings but can easily be propagated from a city boundary). To do it right, two additional operations should be performed: splitting roads where they intersect the boundary and obtaining all members of the boundary contour. The latter can be tricky if processing is performed offline.
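The spatial-query step can be sketched in Python with a ray-casting point-in-polygon test. This is a simplified illustration under stated assumptions: coordinates are planar, the boundary is a single closed ring, and roads have already been split at the boundary, so one representative point per road is enough. The function names and data layout are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from (x, y)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def propagate_access(roads, boundary, access_value):
    """Return {road_id: access_value} for roads lying inside the boundary.

    Assumes roads were already split where they cross the boundary, so the
    midpoint of a road's first segment is representative of the whole road.
    """
    result = {}
    for road_id, points in roads.items():
        (x1, y1), (x2, y2) = points[0], points[1]
        mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
        if point_in_polygon(mid_x, mid_y, boundary):
            result[road_id] = access_value
    return result


fence = [(0, 0), (10, 0), (10, 10), (0, 10)]
roads = {"driveway": [(2, 2), (8, 2)], "street": [(12, 5), (20, 5)]}
print(propagate_access(roads, fence, "private"))  # {'driveway': 'private'}
```

A real pipeline would also need the two extra steps named above (splitting roads at the boundary and assembling the boundary contour from its members) and would typically work in projected coordinates or use a geometry library; both are left out here.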

So, this could be a drawback, but it doesn't create any problem of indefinite depth.

The purpose of this diary entry is to explain how access restrictions work when applied to each type of geometry and to demonstrate that existing node-based restrictions cannot be reliably used.

Is RFC stage of proposal procedure just a formality?

Posted by BushmanK on 24 September 2016 in English (English)

The proposal procedure is not required to introduce a new tagging scheme; however, some people are brave enough to start it for the tags they want to introduce. But then, they are not obligated to follow this procedure, not only in the sense of being able to abandon a proposal (which is normal: you don't have to finish it if you don't want to) but also in the sense of disregarding the RFC stage.

RFC stands for Request for Comments. Supposedly, it serves to collect feedback and to correct errors found by reviewers. But currently, a proposal author is not obligated to take any feedback into account, even in the simple form of replies on the Talk: page (let alone actually addressing the issues mentioned there). Since voters do not always read the Talk: page, they could be unaware of those open issues and cast their votes regardless. This makes the RFC stage (and the whole proposal procedure) nothing more than a formality.

My view is that the voting stage should never be started (or allowed to start) without addressing every issue raised by the proposal's reviewers. Otherwise, no improvement of the proposed scheme is possible if the author is lazy enough.

Super-broad "self-explanatory" tags

Posted by BushmanK on 7 September 2016 in English (English)

There is a proposal for the healthcare=midwife tag in its voting stage. This proposal says that "a midwife practice" is something self-explanatory. But it is a good example of bad tag design, and here is why.

I understand that the majority of OSM members are men and only a few of them are medical professionals. So, it's hard to expect them to have a good understanding (especially on a global scale, which is important since OSM is an international project) of specific healthcare services for women and of healthcare services in general.

First of all, "midwife" stands for completely different persons in different countries and medical systems.

  • They have different education levels (ranging from a professional certificate to the equivalent of a Bachelor's degree from a college or a Master's degree from a university).
  • In some countries, they are part of the regular medical system; in others, they represent an "alternative" medical system. There are probably countries where private midwives do not need any formal education and are, practically, witch doctors.
  • In some countries, like Russia, they are basically just a special type of nurse in a maternity hospital. In others, they can work independently and provide an almost full range of maternity-related medical services. In some African countries, midwives are allowed to perform Caesarean sections, which is unimaginable in the Western medical system, where only an MD surgeon can do it.

This means that in one "midwife practice" women can only receive a simple counseling service, while in another they can give birth and so on. All of the above means that there is no common denominator for all these different "midwife practices", except that it's something for women and it is maternity-related. That doesn't seem like a good basis for a single tag. By approving this tag, we'll get another thing we can put on a map but can't really use for any real case without studying different aspects of every national medical system.

Again, I understand that for men a place like that sounds too abstract to think about in detail, but it doesn't mean we should introduce and approve a meaningless, oversimplified tag, using a lack of knowledge as an excuse.

It makes even less sense keeping in mind that we already have the (unfortunately abandoned) Healthcare 2.0 proposal, covering every tiny aspect of medical services. Yes, "it is very complex", but there are several complex tagging schemes in OSM and nobody has died of using them, while this one describes medical services perfectly.

It sounds official: OSM Standard style tiles are for mappers

Posted by BushmanK on 19 August 2016 in English (English)

When someone tries to compare OSM.org with any cartographic web service (usually Google), it is hard to make people trust you when you tell them it's nothing more than a technical website mainly intended for internal use by mappers. The same problem applies to the OSM Standard (aka Mapnik) style. Finally, there is something more or less official to show them as proof.

Andy Allan just gave this reply to a question about expanding the tile distribution infrastructure. (Full message text with added emphasis.)

We should be clear here - we have more than enough capacity to handle all the traffic generated by our mappers, editing software and every website run by the OSMF, local chapters and local mapping groups. Several times over.

We've always allowed other people to use our spare capacity on the tileservers, but recently it's got completely out of hand. Most of the use of our tileservers has become developers looking for free maps, nothing to do with the rest of the project. Often these are commercial companies who are using our tileservers and selling their apps. Subsidising commercial companies isn't the best use of community donations and volunteer sysadmin time, when there are many alternative services (such as those run by CartoDB, Stamen, etc) that provide zero-cost map layers based on OSM data anyway.

We do have plans to scale the tile infrastructure later in the year (cascading an old database server), in addition to the current process making sure that our OSMF tileservers are being mainly used for OpenStreetMap related projects.

I just want to keep it here so I can quote it to any stubborn people insisting on their own view of what OSM infrastructure is for.

Nothing personal, just GPS tracks

Posted by BushmanK on 9 August 2016 in English (English)

GPS tracks contributed by OSM members are still sometimes a very important source of information for mapping. However, the built-in OSM GPS trace database has certain issues, such as the inability to delete obviously harmful data like airborne tracks or huge wandering spots created by receivers in standing cars. Currently, all responsibility for GPS data quality lies with contributors, since they have to get rid of wandering spots, super-generalized tracks, tracks with high GDOP and so on before uploading. Only a few people actually know about all these aspects and care about them. The complexity of the track contribution process (at least, it's not a "one-click procedure") makes fresh tracks rarer and rarer.

The Strava heat map is another (often much denser) source of GPS tracks, but it's limited to running and cycling activities.

For a long time, I've been saying that any more or less popular mobile map/navigation application that uses OSM data can help improve GPS track coverage. If properly developed, it will not require any action from its user except granting permission to record and upload tracks anonymously. It's an ideal case, where a valuable contribution can be fully automated and independent of the user's skills or knowledge. A set of simple filters (GDOP, top speed, location provider) can reduce consumed traffic and improve data quality.
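A minimal Python sketch of such client-side filtering follows. The point format (dicts with `lat`, `lon`, `t` in seconds, `dop`, `provider`) and both thresholds are assumptions chosen for illustration, not values from any real app; `dop` stands for whatever dilution-of-precision figure the receiver reports.

```python
import math

MAX_DOP = 5.0             # hypothetical precision threshold
MAX_SPEED_MS = 70.0       # hypothetical cap (~250 km/h) to drop airborne/glitch points
ALLOWED_PROVIDERS = {"gps"}  # drop network/Wi-Fi based fixes


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def filter_track(points):
    """Keep points from an allowed provider, with acceptable DOP, and whose
    implied speed from the previous kept point is plausible."""
    kept = []
    for p in points:
        if p["provider"] not in ALLOWED_PROVIDERS or p["dop"] > MAX_DOP:
            continue
        if kept:
            prev = kept[-1]
            dt = p["t"] - prev["t"]
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            dist = haversine_m(prev["lat"], prev["lon"], p["lat"], p["lon"])
            if dist / dt > MAX_SPEED_MS:
                continue
        kept.append(p)
    return kept


track = [
    {"lat": 55.0, "lon": 37.0, "t": 0, "dop": 1.0, "provider": "gps"},
    {"lat": 55.0, "lon": 37.0, "t": 5, "dop": 1.0, "provider": "network"},
    {"lat": 55.0, "lon": 37.0, "t": 7, "dop": 12.0, "provider": "gps"},
    {"lat": 55.0001, "lon": 37.0, "t": 10, "dop": 1.2, "provider": "gps"},
    {"lat": 56.0, "lon": 37.0, "t": 20, "dop": 1.0, "provider": "gps"},
]
print([p["t"] for p in filter_track(track)])  # [0, 10]
```

Filtering on the device also means rejected points are never uploaded, which is where the traffic saving comes from.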

And here is an example of tracks collected in only a week by Yandex (a major Russian web/mobile company) through its commercial applications. Note how dense their point cloud is. And they update their track layer, used for their own collaborative mapping project, every week. This screenshot was taken by OSM member luiswoo using Bing satellite imagery and the Yandex GPS point cloud, which cannot be used for OSM mapping.

So, keeping in mind all the recent controversy about contributions made by untrained users of a popular application based on OSM data, I'd like to mention again that voluntary GPS data contribution does not require any training if the track recording application is properly developed.

Tagging: single values vs. value lists vs. separate keys

Posted by BushmanK on 25 July 2016 in English (English)

The ongoing vote on the Education 2.0 proposal has demonstrated that some OSM members don't understand the situation with multiple properties. I was somewhat surprised by this lack of understanding, since it is already explained in the Wiki, so I want to explain it one more time.

Definitions:

  1. Single value stands for key=value case, for example: man_made=tower
  2. Value list stands for semicolon-separated list of values, for example: sport=basketball;volleyball
  3. Separate keys stand for multiple binary keys for each property, for example: sport:basketball=yes, sport:volleyball=yes

Case (1) is the simplest and most traditional in OSM. However, if a certain feature uses this tagging style, its value becomes exclusive, which means there can be only one value. For certain features it's completely acceptable because, for example, a man-made structure can't be a mast and a tower at the same time.

Case (2) was probably the first workaround for non-exclusive values. It's a straightforward approach for cases where properties are non-exclusive, meaning that a certain object can have several properties of one feature. Just like in the example above, a pitch could be used for playing both basketball and volleyball.

Case (3) is, at first glance, similar to (2), and some people think it's only a different tagging style. Indeed, it allows representing the same situation, but with separate keys (tags) instead of a value list in one tag, so semantically it doesn't have any advantage. It even looks more complicated, because its structure actually consists of three elements, not two: namespace:subkey=boolean_value, where namespace equals the key in case (2), subkey equals each value from the list in case (2), and boolean_value is just yes or no (in the case of no, the whole tag is omitted).
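As a minimal sketch (Python, assuming tags are held in a plain dict, as most OSM parsers expose them), the correspondence between methods (2) and (3) can be written as a conversion. The function name `value_list_to_keys` is hypothetical.

```python
def value_list_to_keys(key, value):
    """Convert a method (2) tag, e.g. sport=basketball;volleyball,
    into method (3) tags: namespace:subkey=yes for each list element."""
    tags = {}
    for item in value.split(";"):
        item = item.strip()
        if item:  # skip empty elements, e.g. from a trailing semicolon
            tags[f"{key}:{item}"] = "yes"
    return tags


print(value_list_to_keys("sport", "basketball;volleyball"))
# {'sport:basketball': 'yes', 'sport:volleyball': 'yes'}
```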

Technical side

But the thing is, OSM is a database, which means the tags we use are actually data structures. And data structures should be usable. There is a whole applied science on that topic: data architecture. So, if you've never heard of it or don't know what exactly it stands for, please read the Wikipedia article about it.

From the point of view of the general public, there is no semantic advantage of method (3) over method (2). But there is a huge advantage from the point of view of data architecture. That is why method (2) is strongly discouraged in most cases, except a few, as is more or less clearly described in the Semi-colon value separator article. In simple words, the semicolon separator is usually acceptable where a value list contains strings (portions of free text) used for labels or descriptions, not properties. In more technical words, where these strings are not used for querying objects for any purpose, including subsetting data, applying a rendering style, etc. This also covers complex text structures, such as lane tagging or opening hours, which are meant for deeper parsing by design.

Why are value lists so bad for querying? It's simple: it's all about performance and having known, predictable data structures. Given something like sport=basketball;volleyball from case (2), software needs to break it down first and store the resulting list somewhere in memory. Before it does that, it has to determine how many elements the list contains. In cases (1) and (3), the number of operations is smaller and there is nothing unpredictable, since namespace:subkey always breaks into a known number of strings in a predictable order (even if the subkey is compound, like subkey:sub-subkey, we usually don't have to care about it, since it can be processed as a single string).
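The difference shows up even in a tiny in-memory check. In this Python sketch (function names are hypothetical, tags are assumed to be a plain dict), method (2) has to split and normalize the value before testing it, while method (3) is a fixed number of direct key lookups.

```python
def has_all_list_style(tags, key, wanted):
    # Method (2): the value has to be split into an unknown number
    # of elements (and stripped of stray spaces) before any test.
    values = {v.strip() for v in tags.get(key, "").split(";")}
    return all(w in values for w in wanted)


def has_all_key_style(tags, key, wanted):
    # Method (3): each property is a direct lookup of a known key.
    return all(tags.get(f"{key}:{w}") == "yes" for w in wanted)


list_tags = {"leisure": "pitch", "sport": "basketball;volleyball"}
key_tags = {"leisure": "pitch", "sport:basketball": "yes", "sport:volleyball": "yes"}
wanted = ["basketball", "volleyball"]
print(has_all_list_style(list_tags, "sport", wanted))  # True
print(has_all_key_style(key_tags, "sport", wanted))    # True
```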

There is another technological reason: OSM uses XML-style "attribute=value" constructs, so it's quite logical to want to use tools intended for working with XML, such as query languages, frameworks and so on. Usually, these tools are not intended for dealing with values made up of lists (for the technical reasons explained above), while methods (1) and (3) are perfectly compatible with the general XML ecosystem.

Anyone can easily get some experience and compare methods (2) and (3) by writing an Overpass API query (Overpass being a significant part of the OSM ecosystem) for both cases given in the definitions section above: sport=basketball;volleyball and sport:basketball=yes, sport:volleyball=yes. Imagine that you need to select all pitches with both of these properties. I can guarantee that method (2) will require CPU-hungry regular expressions, which will inevitably reduce query performance.
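The comparison can be tried with two Overpass QL queries. The Python sketch below only builds them as strings (Python is used for consistency with the other examples; the query text is what you would paste into Overpass Turbo). The bounding box, the `nwr` and `out center` choices and the function names are illustrative assumptions.

```python
def overpass_separate_keys(sports, bbox):
    """Method (3): one exact key=value filter per property."""
    filters = "".join(f'["sport:{s}"="yes"]' for s in sports)
    return f'[out:json];\nnwr["leisure"="pitch"]{filters}({bbox});\nout center;'


def overpass_value_list(sports, bbox):
    """Method (2): one regular expression per property, anchored on ';' so
    that a value only matches as a whole list element. It would still miss
    values written with a space after the semicolon."""
    filters = "".join(f'["sport"~"(^|;){s}(;|$)"]' for s in sports)
    return f'[out:json];\nnwr["leisure"="pitch"]{filters}({bbox});\nout center;'


bbox = "52.34,13.09,52.68,13.76"  # south,west,north,east; roughly Berlin, illustrative
sports = ["basketball", "volleyball"]
print(overpass_separate_keys(sports, bbox))
print(overpass_value_list(sports, bbox))
```

Note that even the regex version is only as good as the data's formatting consistency, which is one more argument for separate keys.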

Objections

Voters on Education 2.0 proposal have expressed certain objections to method (3) I want to address directly.

... semicolon-separated lists are much more concise and therefore easier.

Easier for what? For reading - maybe (if such list is very short), but not for any technical purpose, including tagging preset development, editor interface development and so on.

... it's just an aversion to semicolon-separated values. I understand that this is a proposal by Russians, and some Russians such as XXzme have expressed their aversion to semicolon-separated values. I respect their opinion, but I have the opposite opinion, sorry.

No, it's logic, not an aversion. Read everything above. And it's not "another opinion". An opinion is a view not necessarily based on facts or logic; however, the logic behind the preferable use of method (3) is explained both in the Semi-colon value separator article and in this diary entry. If someone finds it false, feel free to point out where. Otherwise, what you have is an aversion based on personal preference, which doesn't have to be respected, since it contradicts reasonable requirements for tagging schemes.

... you cannot model the whole world in a key:*=yes ontology, ...

That's actually classic demagogy: attributing a false claim to your opponent (one he never made) and then proving it false to discredit him. Nobody claimed that it's possible and/or necessary to use method (3) to model the whole world (that's another piece of demagogy: incorrect universal quantification). If someone thinks that a collision of exclusive properties will never occur for a certain feature, it's okay to point that out and explain why method (1) is acceptable. But using demagogy usually reveals a lack of real arguments.

... that ridicules the key=value scheme in OSM

How exactly? Again, method (3) is actually recommended by OSM documentation in Semi-colon value separator article. And it is technologically much closer to method (1) than method (2) is.

Conclusion

The tagging method with separate subkeys and Boolean values is an effective part of the tagging system. It doesn't replace other methods completely, and nobody claims that it does. For cases where any chance of collision between exclusive values exists, it's the only effective method. The value list method is equivalent to it only semantically, but it's inferior in the technical aspects of data architecture, performance and usability. Usability for mappers strongly depends on tools: complex schemes are rarely used in a fully manual manner, since plugins, presets and other tools help people use them; therefore, it doesn't really matter exactly how these tags look.

And the last thing. Being a part of the OSM project for several years, I've heard many references to national features, like "in this country, we do it this way". But I've never seen anyone make general references to a nation with a negative connotation, as happened in one of the comments on an opposing vote on the Education 2.0 proposal. Indeed, there are several members of the Russian community mentioned in that comment, including myself and Xxzme (notoriously famous for his Wiki activity, but who has no connection to this proposal), who are trying to promote the separate keys method for cases where a collision of properties can occur and who from time to time criticize legacy tagging methods.

But shouldn't judgment be based on what is said and its logic, instead of on who said it or other personal features? I really hope the opposite will never happen again.

Experimental publishing of Sentinel 2 satellite data

Posted by BushmanK on 13 July 2016 in English (English)

Recently, I've published several tiles of fresh Sentinel 2 satellite imagery using the Nextgis.com free spatial data hosting service. All tiles are available here. Two tiles were published by request; the others cover two major populated territories in Russia, the Moscow region and the Saint Petersburg region, and one tile was picked at random: it covers the western part of the Republic of Mordovia, famous for its endless forests (one of my goals was to test it as a source of information about logging) and the high concentration of prisons located there. For some tiles, both visible wavelength and visible+NIR composites were made. The workflow for making those composites was quite simple:

  1. Make a dump of the georeferencing data from any 10 m/pix resolution channel image.
  2. Use convert -combine from ImageMagick to merge single channels into a 16-bit per channel RGB image.
  3. Use convert -contrast-stretch to manipulate the image histogram.
  4. Put the georeferencing information back, save the resulting file as an 8-bit per channel RGB image, and compress it with Deflate or LZW.
  5. Upload it to Nextgis.com and set up a web map and WMS service.

(Steps 1 and 4 were done using GlobalMapper, but could be done using GDAL and a Python script.)
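Steps 2 and 3 can be approximated without ImageMagick. The Python sketch below is a stand-in built on assumptions, not a reproduction of `-contrast-stretch` (whose exact clipping rules may differ): it clips each 16-bit band at hypothetical black and white percentiles, rescales to 0..255, and zips three bands into 8-bit RGB tuples. For steps 1 and 4, GDAL's Python bindings provide `GetGeoTransform()`/`SetGeoTransform()` and `GetProjection()`/`SetProjection()` for copying georeferencing, and the GTiff driver accepts a `COMPRESS=DEFLATE` or `COMPRESS=LZW` creation option; those calls are left out so the sketch runs on plain lists.

```python
def percentile(sorted_values, fraction):
    """Value at the given fraction (0..1) of an already sorted list
    (simple floor-index rule, no interpolation)."""
    last = len(sorted_values) - 1
    index = min(last, max(0, int(fraction * last)))
    return sorted_values[index]


def stretch_to_8bit(band, black=0.02, white=0.02):
    """Clip a 16-bit band at the black/white percentiles and rescale to 0..255.
    The 2% defaults are hypothetical."""
    ordered = sorted(band)
    low = percentile(ordered, black)
    high = percentile(ordered, 1.0 - white)
    span = max(high - low, 1)  # avoid division by zero on flat bands
    out = []
    for value in band:
        clipped = min(max(value, low), high)
        out.append(round((clipped - low) * 255 / span))
    return out


def combine_rgb(red, green, blue):
    """Stretch each band independently and interleave them as RGB pixels."""
    return list(zip(stretch_to_8bit(red), stretch_to_8bit(green), stretch_to_8bit(blue)))


# Four-pixel bands with made-up, reflectance-like 16-bit values.
red = [300, 1200, 2500, 4000]
green = [400, 1100, 2300, 3500]
blue = [500, 900, 1800, 3000]
print(combine_rgb(red, green, blue))
```

Stretching each band independently is a design choice of this sketch; it tends to neutralize color casts but can also shift true colors, so a shared stretch across all three bands is a reasonable alternative.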

I asked people to give me some feedback; however, only one person (not counting those who asked me to make two of those tiles) informed me that he used this information to update forest boundaries changed by logging and wildfires. I'm obviously not doing it for any kind of reward or acknowledgement; however, I really don't like doing anything nobody is going to use, for whatever reason.

It also works as a kind of social experiment. Many times, when I mentioned Landsat 8 data in the context of someone's complaints about outdated or missing Bing/MapBox imagery, people said something like: "Oh, I'm not a programmer, it's so hard to make those composites by myself and I don't know how to use them". To be precise, a couple of people managed to learn how to do that on their own. But now I have given everybody easily available data (with ready-to-use WMS definition strings for JOSM), and it doesn't seem like anybody wants to use it (or, for some crazy reason, they don't want to acknowledge that they used it). So, lack of imagery or outdated imagery is just an excuse to do nothing. I'm not blaming anyone: OSM is a volunteer project, nobody has any obligations, and everybody probably has their own life and other things to do. But it would be nice if people just stopped lying to themselves and to others that the only obstacle to keeping something up to date is a lack of fresh imagery.

Just in case someone is interested, according to this, we can legally use Sentinel 2 data. Changesets or particular objects traced from it should have "Copernicus Sentinel data 2016" (or the appropriate year) in the source tag, as required to inform data users about its source.

Let's pretend like Maps.me contribution is an import.

Posted by BushmanK on 28 June 2016 in English (English)

The case of Maps.me is in many aspects similar to the cases of Potlatch and iD (at its early stage of deployment), but in certain aspects it is special, at least in how frequent those edits are. Since it is currently hard to reach out to every Maps.me editor user, the whole situation is somewhat similar to imports, where we have a massive amount of data, originally not suitable for OSM, often of questionable quality (should I remind everybody of TIGER and its consequences for American OSM?), with certain systematic issues.

If this analogy is acceptable, it's logical to apply certain import guidelines to it. Think about paragraphs 2.9, 1.1, 1.2, 1.4, 1.5, 2.1.

So, if these requirements are acceptable for imports (I mean, nobody thinks it's impolite to require them), why shouldn't they be applied to Maps.me, or to any editor that increases the involvement of people unaware of OSM guidelines or provokes systematic mistakes? I've seen comments where people literally opposed paragraph 2.9:

Take great care to avoid damaging the database and don't leave a messy import and assume that nameless OpenStreetMap contributors working in iD and Potlatch and will tirelessly complete your work. JOSM is better at for untangling messy data, but it's still difficult and you should do this work yourself if necessary.

And, by the way, it also says:

If your import does 'go wrong', or you needed to interrupt an upload half way through, then this should be reverted promptly. ... If you don't know how to revert an import, don't do the import in the first place.

When applied to editors, it means that if an editor systematically provokes avoidable wrong edits, measures should be taken to prevent them, and developers should be prepared to do a cleanup.

I hope nobody will read this diary entry as some kind of bullying of Maps.me developers or just another complaint. That wasn't my intention. My intention was to demonstrate that the OSM community does have more or less detailed guidelines for a quite similar situation.

Maps.me is a new evil (instead of Potlatch)?

Posted by BushmanK on 21 June 2016 in English (English)

Some people were quite excited about the recent announcement of the built-in OSM editor in the Maps.me navigation app, but now the first version of it has been in use for many weeks, bringing more and more complaints about Maps.me users doing all kinds of unwanted things.

Since pointing out negative facts makes certain people feel offended, I want to clearly explain my position. I agree that OSM could have more contributions from people not deeply involved in the project. And the general idea of enabling them to contribute using their favorite navigation app is close to perfect. Maps.me is the first widely used app helping its users contribute to OSM. This app does have certain nice features, such as opening hours entry. I also realize that it's only the first version.

However, there are certain flaws that make its future look not that optimistic (at least in the current situation). I won't try to rank these flaws by importance, but I'd like to list some.

One fundamental flaw is that the authors of this app don't seem to care about the demotivating effect of "rubbish edits". There is a common mantra: "We need more OSM contributors". Yes, it's true, but only in an ideal case, where new OSM members are at least thorough and responsible. I mean, if some newbie doesn't know how to do something right, he should learn it after being pointed to the documentation or receiving some explanations in changeset comments or private messages. If he doesn't want to improve the quality of his contributions, all responsibility for data quality automatically shifts to the responsible OSM members overseeing that particular territory. They have to fix every mistake to keep the data tidy. And they have a limited amount of time, goodwill and energy to spend on OSM. Forcing them to deal with an unusually large amount of bad data (including investigations using WhoDidIt, achavi and other tools) simply reduces the amount of contribution they make. Keeping in mind the huge difference between their productivity and that of a typical newbie, it's not an equivalent exchange, because a single amenity wrongly added by a newbie steals time that would be enough for a much larger contribution by an experienced person. I call it a fundamental flaw in the attitude of the Maps.me authors towards the whole OSM community.

Another flaw is that hosting Maps.me on GitHub currently seems more like a gesture. I'm not talking about code, I'm talking about interaction with the community. @Zverik, a member of the Maps.me development team, has confirmed that they have an internal bug tracker, completely separate from GitHub, and that developers rarely check the GitHub issue tracker; however, they do read bug reports sent to a dedicated email address. This situation makes the developers look non-responsive. It might not be completely true, but since there is no reaction from their side on GitHub, it's not unreasonable to conclude that.

Recently, an interview with one of the Maps.me authors was published, and it sounds like the major ideologists are going to leave this project in favor of their own startup. That probably means fewer people will actually work on development planning. I don't believe it will improve the responsiveness of the whole team.

I know very well how sensitive authors can be when it comes to critical feedback from users. But if you can't treat it as useful feedback, the job isn't for you: development is not a comfortable place for a sensitive, unrecognized genius. Indeed, angry OSM members, tired of cleaning up after Maps.me users, could be pretty impolite, but at least some comments contain useful suggestions, such as separating the editor workflow into its own portion of the UI. However, @Zverik recently called all those comments "non-constructive whining". It's typical, and it always leads to greater separation of developers from users. In psychology, it's called "avoidant behavior". I don't know if it's just his thing or the working style of the whole team, but anyway.

From the point of view of UX, there is an issue with the built-in OSM editor. Users of Maps.me, except those who already knew about OSM or learned about it independently, usually have no idea what exactly they are doing when editing the map. Consequently, they have no idea about project guidelines and so on. As I've already mentioned once, you can't make people edit OSM without telling them what it is. But now, they are kind of lured into doing that. Therefore, their ideas of what they are actually doing are often quite imaginary. Judging by many different edits, it's easy to conclude that people think it's just their local copy of the map, or that those edits are some kind of bug reports for a professional mapping team, or that these are just their own custom POIs. The user interface provokes this even more.

Since the UI/UX provokes systematic erroneous edits of certain specific types, it must be fixed instead of blaming OSM members for their "lack of patience". And that is leaving aside genuine programming bugs, such as breaking opening hours in certain complex situations, replacing the "№" sign with "N", and many others (including ones left without any response on the GitHub issue tracker).

My personal view on this (and I expressed it long before the very first version of MapsWithMe/Maps.me) is that the only way to let people who have no idea about OSM contribute is to limit the available actions even more and to introduce more checks (such as a duplicate check). This is a far more complex problem than just developing an editor, since reaching a larger number of people completely unaware of OSM obviously brings more problems to solve before it becomes effective rather than demotivating for OSM members.

We have tracks. Are they any good?

Posted by BushmanK on 18 June 2016 in English (English)

A recent discussion with an OSM contributor who had edited a forest path running along a narrow, straight cutline through fairly dense forest made me think of a good example of how bad tracks can get under foliage. Since that person used Strava's point cloud, I decided it would make a perfect example.

Let's take a look at this place. It's a clearing for a high-voltage power line, about 85 meters wide. A mixed-use pedestrian/bicycle asphalt road (a former service road), about 6 meters wide, runs along it from south to north. Another road of similar type and size goes off to the east. The forest there is mixed (about 40% firs), old-growth, and about 19 meters tall.

Next, let's take a look at the Strava data overlaid on top of high-resolution imagery:

  • The point cloud is quite dense there, and the width of its highest-density portion stays at about 5.5 to 6 meters regardless of foliage cover.
  • The width of the corridor fully covered with points (at least one point every 0.5 m) is about 25 meters with a clear sky view and about 33 meters under foliage.
  • The width of the full-spread corridor is about 45 meters with a clear sky and about 85 meters under foliage.

The spread width does not change immediately when the road goes under foliage. This is caused by the Kalman filter, used in every consumer GPS receiver to suppress random sideways jumps from the user's course line. The filter improves only the appearance of the track, not the quality of the data, since it is based on the assumption that the receiver moves along a more or less straight/smooth trajectory.
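The lag described above can be illustrated with a deliberately reduced, one-dimensional Kalman filter. This is a sketch under stated assumptions, not what any actual receiver runs: real firmware uses multi-state filters (position and velocity, in 2D or 3D), and the noise variances `q` and `r` below, as well as the function name `kalman_1d`, are made-up illustration values. The "smooth trajectory" assumption appears as the motion model: the lateral offset from the course line is expected to stay nearly constant between fixes.

```python
def kalman_1d(measurements, q=0.5, r=25.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter over the lateral offset (m) from the course line.

    Toy model (an assumption, not real receiver firmware): the offset is
    expected to stay roughly constant between fixes (random walk with
    process-noise variance q), and every fix carries measurement-noise
    variance r.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the motion model says "offset unchanged"; uncertainty grows by q.
        p = p + q
        # Update: blend the prediction with the new fix using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# 20 fixes on the open clearing, then the road enters the canopy and the
# raw fixes suddenly jump 20 m to one side (hypothetical numbers).
raw = [0.0] * 20 + [20.0] * 20
filtered = kalman_1d(raw)
print(f"1st fix under canopy:  raw 20.0 m, filtered {filtered[20]:.1f} m")
print(f"20th fix under canopy: raw 20.0 m, filtered {filtered[-1]:.1f} m")
```

The filtered offset creeps toward the raw one over many fixes instead of following it immediately. The jump is only hidden from the drawn line for a while; the fixes themselves are no better.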

I don't know exactly how Strava calculates the color of each pixel in their point cloud layer, but if it's just some simple additive method with clipping, the highest-density area will only grow over time. And it's only a coincidence that its width currently equals the real width of these roads.
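Since Strava's actual rendering method is unknown, here is a minimal sketch of the "simple additive method with clipping" case only. Assumptions, all hypothetical: each GPS point adds 1 to the count of the 0.5 m pixel it falls into, the displayed intensity is `min(count, cap)`, and points scatter across the road as a Gaussian with a 5 m standard deviation. The function name `saturated_width` and all parameter values are illustrative.

```python
import random


def saturated_width(offsets_m, pixel_m=0.5, cap=10):
    """Total width (m) of pixels whose clipped additive count has reached the cap.

    Hypothetical heatmap model: each GPS point adds 1 to the pixel it falls
    into, and the displayed intensity is min(count, cap). Saturated pixels
    are counted whether or not they are contiguous.
    """
    counts = {}
    for off in offsets_m:
        px = int(off // pixel_m)  # pixel index across the corridor
        counts[px] = counts.get(px, 0) + 1
    saturated = [px for px, c in counts.items() if min(c, cap) == cap]
    return len(saturated) * pixel_m


rng = random.Random(42)
# Lateral offsets of GPS points from the road centre line, in metres
# (assumed Gaussian scatter, 5 m standard deviation).
points = [rng.gauss(0.0, 5.0) for _ in range(20000)]

# The same stream of points, viewed after more and more of it has piled up.
for n in (500, 2000, 20000):
    print(n, "points ->", saturated_width(points[:n]), "m of saturated pixels")
```

Because counts can only go up as points accumulate, the set of pixels at the cap can only grow, so the "highest-density" band widens with time even though the road does not.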

Since all tracks in the point cloud are independent, it makes no sense to say that averaging improves precision (the width of the corridor). It actually makes it even worse, because more bad tracks pile up there over time. However, accuracy (the distance between the corridor's median line and the road's median line) improves up to a certain "saturation point", at least until full coverage (when each pixel of the layer contains at least one point at the highest resolution) is reached within the visible corridor.
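Using the definitions above (precision as the width of the corridor, accuracy as the distance between the corridor's median line and the road's median line), a small simulation shows why piling up tracks helps one and not the other. Assumptions, all hypothetical: each independent track crosses the cut at a lateral offset drawn from a Gaussian with a 10 m standard deviation, and "full spread" is taken as max minus min of those offsets. The numbers are illustrative, not Strava data.

```python
import random
import statistics


def corridor_stats(offsets_m):
    """Return (median-line error, full-spread width) for lateral track offsets.

    median-line error: distance of the corridor's median line from the true
    road centre line at 0 m (accuracy).
    full-spread width: max - min of the offsets (precision, as used above).
    """
    return abs(statistics.median(offsets_m)), max(offsets_m) - min(offsets_m)


rng = random.Random(7)
# Assumed: each independent track's offset from the true centre line
# is Gaussian with a 10 m standard deviation (foliage-like conditions).
tracks = [rng.gauss(0.0, 10.0) for _ in range(5000)]

for n in (10, 100, 1000, 5000):
    err, width = corridor_stats(tracks[:n])
    print(f"{n:5d} tracks: median-line error {err:5.2f} m, full spread {width:6.1f} m")
```

The median line settles toward the true centre line as tracks accumulate, while the full spread can only widen, since each new outlier can only push the extremes outward. This toy model has no rendering step, so it does not reproduce the "saturation point"; that part depends on how the heatmap is drawn.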

What should we learn from it?

Random tracks, even several dozen of them, can't be fully reliable under foliage, especially since foliage density varies and certain areas may affect GPS reception systematically (causing jumps in a similar direction). Foliage can potentially increase the spread from about 10 meters to each side to about 20 meters.

Is this value large? If you don't have any other data there, then no, it's acceptable: there are roads in OSM traced from Landsat imagery. But if you're trying to improve the accuracy of paths traced from any source better than Landsat imagery, you probably shouldn't: GPS tracks are not accurate enough to give any improvement, even with accumulated sets of tracks.
