Yesterday I finally ran the first pass of _sevbot.
The goal of this pass was to fix abbreviated toponyms across Ukraine, make Ukrainian names primary where they had been swapped with Russian ones, and add English transliterations. This first pass touched 7998 ways across the country.
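For readers wondering what "fixing abbreviated toponyms" and "making Ukrainian names primary" look like in practice, here is a minimal sketch. It is not _sevbot's actual code: the abbreviation table, the function names and the swap rule (promote `name:uk` when `name` equals `name:ru`) are hypothetical illustrations.

```python
import re

# Hypothetical abbreviation table (illustrative only); a real bot would need a
# longer list reviewed by the local community.
ABBREVIATIONS = {
    "вул": "вулиця",
    "пров": "провулок",
    "просп": "проспект",
    "пл": "площа",
    "бульв": "бульвар",
}

# An abbreviation must start a word and end with a dot; the space after it is optional.
_ABBR_RE = re.compile(r"(?<!\w)(" + "|".join(ABBREVIATIONS) + r")\.\s*", re.IGNORECASE)


def expand_abbreviations(name: str) -> str:
    """Replace abbreviated street types such as 'вул.' with their full forms."""
    expanded = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1).lower()] + " ", name)
    return " ".join(expanded.split())  # collapse doubled or trailing spaces


def make_ukrainian_primary(tags: dict[str, str]) -> dict[str, str]:
    """Hypothetical rule: if `name` holds the Russian form, promote `name:uk`."""
    name = tags.get("name")
    uk = tags.get("name:uk")
    if name and uk and name == tags.get("name:ru") and name != uk:
        return {**tags, "name": uk}
    return tags
```

With this sketch, `expand_abbreviations("Пров.Садовий")` gives `"провулок Садовий"`, and a way tagged `name=улица Ленина`, `name:ru=улица Ленина`, `name:uk=вулиця Леніна` ends up with `name=вулиця Леніна`.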
Currently the bot is not fully automatic: it takes the country dump as input and produces an XML file as output, which I then upload to OSM.
I am looking for suggestions on how to clip ways to the country border (which is defined by a relation) in order to avoid conflicts. My idea is a two-step process: first filter the changesets by the country's bounding box (rough clipping), then clip them precisely against the border outline extracted from the relation. But I do not want to reinvent the wheel if a ready-to-use solution exists.
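The two-step idea can be sketched as a cheap bounding-box rejection followed by an exact point-in-polygon test (even-odd ray casting). This is a minimal sketch under stated assumptions: the border relation's member ways have already been assembled into a single closed ring of `(lon, lat)` points, and a way counts as "inside" only if all of its nodes are inside, so ways crossing the border are skipped. All names here are hypothetical.

```python
from typing import Optional, Sequence

Point = tuple[float, float]  # (lon, lat)
BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(polygon: Sequence[Point]) -> BBox:
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def in_bbox(pt: Point, box: BBox) -> bool:
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= pt[0] <= max_lon and min_lat <= pt[1] <= max_lat


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast towards +lon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude, so y2 != y1
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def way_inside_border(
    nodes: Sequence[Point], polygon: Sequence[Point], box: Optional[BBox] = None
) -> bool:
    box = box or bounding_box(polygon)
    # Step 1 (rough clip): reject cheaply if any node falls outside the bbox.
    if not all(in_bbox(p, box) for p in nodes):
        return False
    # Step 2 (exact clip): every node must lie inside the border ring.
    return all(point_in_polygon(p, polygon) for p in nodes)
```

In a real run the bounding box would be computed once and passed in for every way, so most ways far from the border are rejected by step 1 without touching the polygon. Points lying exactly on the border are not handled specially here.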
Future plans for the bot: put it into cron; make it fix Ukraine-specific typos and errors; try automatically adding missing name translations when ways with the same name exist elsewhere in the country; and put it on duty for auto-transliterating Ukrainian names into English (we have a standard set by the government). The long-term plan is to guard the KOATUU codes of all cities and villages across Ukraine (KOATUU is the government standard that assigns a unique ID to each settlement, down to individual villages).
Comment from Andy Allan on 6 December 2010 at 13:22
Please don't auto-transliterate. That can be done by other pieces of software if required. You shouldn't be inserting data into openstreetmap which is just a processed version of other openstreetmap data.
Comment from _sev on 6 December 2010 at 14:23
This is standard country-wide practice set by the government, and there is no other way. There are a few exceptions where entities in Ukraine have established English names, but those number only a couple of hundred out of tens of thousands, and the bot has safeguards to leave them alone.
The main problem is that not many people can read Cyrillic, and all maps in Ukraine are in Cyrillic only. For Euro 2012 there are plans to produce maps in the Latin alphabet, and rather than have people transliterate names by hand each time, which is error-prone, I wrote this bot. Moreover, in one of the subsequent passes I plan to clean up bad transliterations (a couple of home-grown substandards are sometimes used).
So this is in full agreement with what was decided at the Ukrainian OSM gathering.
Oh, and to add to the reasoning: all commercial maps, both digital and paper, use transliteration.
Comment from Tomash Pilshchik on 7 December 2010 at 21:25
>Please don't auto-transliterate. That can be done by other pieces of software if required. You shouldn't be inserting data into openstreetmap which is just a processed version of other openstreetmap data.
The only other way to do it is to transliterate on the fly. But doing this on a worldwide basis would require a database of transliteration practices (and grandfathered exceptions) for every country in the world. Then any program rendering an English-language map would have to handle the local scripts and apply that database to them correctly. I think it is better to let someone with local knowledge perform this tricky task and store the result in OSM.