OpenStreetMap

Guided Tagging by Wiki-generated, JSON-formatted Rules

Posted by -karlos- on 2 February 2013 in English (English)

In my last post, I presented just an idea. The responses showed that there are already quite a few ongoing activities. We all share the idea of tagging guided by a tag database structured according to an agreed schema. Some projects seem to compete, but that's evolution in OSM. Some could interact with a little glue in between.

I really agree with the presentation by David.earl. I also like the idea, given in his video, of generating the data by parsing the tag pages of the OSM wiki.

The name TagCentral first reminded me of a tag-centralizing Mafia. We do not want central decisions on how to tag. But, as with the central OSM database, we need a central tag database, code to maintain it, and a central server to run it.

At the moment we have two sources of tag data. First, the tags used in the OSM data, which Taginfo and Tagwatch present in their analyses. Second, the OSM wiki with all its tag pages. Both are crowdsourced (no Tag-Mafia :), and both may be used to condense a tag database. Taginfo helps to prioritize and may reveal definitions missing from the wiki. Relations between tags, retrieved statistically, should be present in the wiki too.

Of course, no one wants to write an analyzer for human-written wiki text. There is already a wiki template, "KeyDescription". It looks like this template includes all the data necessary to generate a tag schema. If not, it may be extended. Let's look at the tag highway=trunk. The template includes value=trunk. "Trunk" is also the English word for that kind of road. But there is no option to add the words used in other languages. What about name=trunk and AT:name=Schnellstrasse? This way we would get the localization almost for free. If each localized wiki page has its own word definition in the template, all the templates can be merged into one tag schema with all locally used words and, of course, all local descriptions.
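The merge of localized template data could look roughly like the sketch below. It assumes each localized wiki page yields one small record; the field names (`lang`, `name`, `description`) and the function `merge_localized` are made up for illustration and are not part of the existing KeyDescription template.

```python
def merge_localized(records):
    """Merge per-language records for one tag into a single schema entry."""
    entry = {"key": None, "value": None, "names": {}, "descriptions": {}}
    for rec in records:
        tag = (rec["key"], rec["value"])
        if entry["key"] is None:
            entry["key"], entry["value"] = tag
        elif (entry["key"], entry["value"]) != tag:
            # Every record must describe the same tag.
            raise ValueError("records describe different tags")
        # Each page contributes only the words of its own language.
        entry["names"][rec["lang"]] = rec["name"]
        entry["descriptions"][rec["lang"]] = rec["description"]
    return entry


# Hypothetical records, one per localized wiki page (texts are placeholders).
pages = [
    {"key": "highway", "value": "trunk", "lang": "en",
     "name": "Trunk", "description": "English description text"},
    {"key": "highway", "value": "trunk", "lang": "AT",
     "name": "Schnellstrasse", "description": "Austrian description text"},
]
schema_entry = merge_localized(pages)
```

Here `schema_entry["names"]` ends up holding both the English and the Austrian word for the same tag, which is the "localization almost for free" effect described above.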

Right, what comes next once the template is extended and used accordingly? We need the code, and a server to run it, for the parser/scanner of the tag wiki pages. This is the TagCentral mentioned above, or simply an extension of Taginfo if you like. It looks like the iD project is already doing good parts of this. After all the tag schemata are generated or updated, there should be a lot of checking and cross-checking against the tag data statistics. There will be a list of bad templates. We could have things like "The 10 most used tags without a wiki page", and so on.
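A report such as "the most used tags without a wiki page" is a simple cross-check between usage statistics and the set of documented tags. The following is only a sketch: the function name, the input shapes and the numbers are invented for illustration and do not come from Taginfo.

```python
def most_used_undocumented(usage_counts, documented, limit=10):
    """Return the `limit` most used tags that have no wiki page."""
    undocumented = [
        (tag, count)
        for tag, count in usage_counts.items()
        if tag not in documented
    ]
    # Highest usage first; ties are ordered alphabetically for stable output.
    undocumented.sort(key=lambda item: (-item[1], item[0]))
    return undocumented[:limit]


# Made-up usage numbers and wiki coverage, purely for illustration.
usage_counts = {"highway=trunk": 500, "amenity=foo": 120, "shop=bar": 300}
documented = {"highway=trunk"}
report = most_used_undocumented(usage_counts, documented)
```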

The next step will need a lot of willingness to work as a team: the format of the schema. All the main editor teams should agree on it (and use it). The use of JSON already seems to be agreed. Potlatch and iD share some developers. JOSM uses XML at the moment. My solution to that: there could be a converter to generate XML from the JSON tag database. Or the tag database could be a common real database that generates the JSON files, the XML files and the files any other editor would like to use.
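Such a JSON-to-XML converter can be small. The sketch below uses only the Python standard library. Both the JSON layout and the XML element names (`tagschema`, `tag`, `name`) are assumptions made for this example; they are not the real JOSM preset format or any agreed schema.

```python
import json
import xml.etree.ElementTree as ET


def schema_json_to_xml(json_text):
    """Convert a JSON list of tag entries into an XML string."""
    tags = json.loads(json_text)
    root = ET.Element("tagschema")  # hypothetical root element name
    for tag in tags:
        item = ET.SubElement(root, "tag", key=tag["key"], value=tag["value"])
        # One <name> child per language, in a stable order.
        for lang, name in sorted(tag.get("names", {}).items()):
            name_el = ET.SubElement(item, "name", lang=lang)
            name_el.text = name
    return ET.tostring(root, encoding="unicode")


example = '[{"key": "highway", "value": "trunk", "names": {"en": "Trunk"}}]'
xml_text = schema_json_to_xml(example)
```

With a single shared source, each editor could get its preferred file format from a generator like this instead of maintaining its own copy by hand.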

How many different projects are developing a tag schema?

  • David.earl gave his talk but has no code, is that right?
  • iD, in teamwork with Taginfo, seems to be the project creating and using a schema. And they are in contact with JOSM.
  • I read the Russian page of Ilya Zverev via Google Translate. It seems to be much the same. Is anyone in contact with him?
  • AndrewBuck is using OWL to define the schema. That's quite scientific. Could it help to define the schema more systematically?

The data retrieved so far may help to offer menus in several styles: text-oriented, selection by graphic symbols, etc. (see my last post). But it may be a problem to create the complex dialogs some tags will need. There could be an extra editor. But I would prefer an extended or newly created template in the wiki. Last time I mentioned a public schema editor, but I have dropped that now. Instead, there should be a web service to show the schema, including the menus and dialogs an editor would offer if it used the schema and the tag database. A changelog and some human eyes may be good before the editors take over a new version.

Who and what?

  • First, there has to be a discussion about how far we agree to go down the path described above. I will motivate you.
  • Next is fixing the schema. This blog isn't the right place for that. The forum and a wiki page will do better. TagCenter or TaginfoPlus may be a good name.
  • There is code to write. I don't have that much time to help. There will be help from others, I am sure. AndrewBuck, Ilya Zverev?
  • There is code to run on a server. As the function is close to or part of Taginfo, I think it could run on the same server. Because Jochen Topf is German like me, we may have a chat soon. (The same with the JOSM team.)

Comment from bryceco on 3 February 2013 at 05:50

JOSM, Potlatch and other editors already have schemas they are using that could be automatically translated into something like the TagCentral schema, and that in turn could be semi-automatically imported into the wiki where it could be easily extended by users and indexed by TagInfo.

The editor schemas tend to be "top-down": they say a node can be an amenity, amenity can be restaurant, restaurant contains cuisine. TagCentral is bottom-up: cuisine belongs to restaurant, restaurant belongs to amenity, etc. With the top-down approach you end up duplicating tag info that gets shared amongst many types of objects ("wifi=yes/no/free" for restaurants, cafes, pubs, etc.). But the bottom-up approach in some cases simply breaks: maxspeed belongs to highway, but not for highway=stop. TC doesn't have a mechanism for these exceptions.

In any case the current wiki metadata is pretty close as it is. The main issue I see is that it uses a "combinations" field instead of TC's "qualifies", so it does not sufficiently document the direction of relationships. It says that cuisine and restaurant are related, but not that restaurant is the primary tag and cuisine is an auxiliary tag.

Comment from Zverik on 3 February 2013 at 09:50

My schema (I'm Ilya Zverev) is quite ready and parts of it are implemented in openstreetmap.ru's POI catalog: http://lists.openstreetmap.org/pipermail/talk/2013-January/065950.html

The plan is to build a separate site with that schema, populate it using OSM users (no automatic parsing) and make it the alternative to the "Howto map a" and "Map features" wiki pages. That would solve most of the current wiki problems and allow generation of presets for editors and directories for POI visualizers.

Comment from -karlos- on 3 February 2013 at 11:21

@Ilya Zverev: Is your approach the reverse of mine, generating wiki pages from a schema? So all the human-written explanatory texts are part of the schema? Or is it schema-controlled editing? Is your schema only about POIs, or also about ways, areas and all OSM content?

In the end, we both intend to give the output to the editors and to the renderers.

I love that dynamic POI visualization! I think all rendering is moving to vector graphics. Your schema is used for translation, a quite useful part of the schema idea. It would help your project if one could switch to the original texts and to other languages too. What about other countries? Do you use your own local POI database?

Comment from Tordanik on 3 February 2013 at 22:47

I think what you describe in your post is indeed something that should be implemented and would be very helpful for OSM.

Maybe I can help. I'm actually using a bot to extract template data from the wiki already. Right now I'm doing this for the software templates. The software catalogues have been kept up to date using this automated process for years.

Rewriting this as a more general framework for template extraction (preferably in a programming language that is more popular among OSM developers) has been on my todo-list for quite a while - there are several unrelated ongoing projects I would need this for. But once that tool exists, it could be easily used to expose the data from key/tag description sites as JSON, too.
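To make the idea of exposing template data as JSON concrete, here is a deliberately naive sketch of template extraction. It is not the code of the bot mentioned above. The function name `extract_template` and the sample parameters are illustrative assumptions, and a real tool would want a proper wikitext parser, since this one only tracks nested `{{...}}` and `[[...]]` pairs.

```python
import json


def extract_template(wikitext, name):
    """Return the named parameters of the first {{name|...}} call, or None."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth, i, parts, current = 0, start, [], []
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:  # keep nested markup as part of the value
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:  # the template itself is closed
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            # A top-level pipe separates two parameters.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    else:
        return None  # the template was never closed
    params = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


# Made-up page text; parameter names are for illustration only.
page = """Some intro text.
{{ValueDescription
|key=highway
|value=trunk
|description=A road, see [[Highways|the guide]]
}}
More text."""
params = extract_template(page, "ValueDescription")
as_json = json.dumps(params, sort_keys=True)
```

The pipe inside the wiki link stays part of the `description` value because it sits one nesting level deeper than the template's own separators.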

Comment from Zverik on 4 February 2013 at 08:18

Karlos, I don't intend to generate wiki pages, but to create an independent infrastructure. Because otherwise I'm just duplicating both data and problems of wiki, and there's no point to it.

Openstreetmap.ru POI database is restricted to Russia and Ukraine at the moment, and as far as I know there are no plans to extend coverage. But sources are open, and anyone can install their local POI database.

Comment from -karlos- on 5 February 2013 at 21:47

@bryceco: Sorry for my late answer, I was so busy, and so happy to discover your iOS app "GO MAP!!". Your comment certainly has to be considered. But I don't see it as an obstacle. See more below.

@Tordanik: "If you want TTTBot to create a...". Yes, that seems to be a good step. You would extend the bot beyond "Software", eventually to ValueDescription and KeyDescription, although it will be a bit more than 1:1 (Template:Column) processing. (You are German too? That could make things easier sometimes.)

@both and all:

Egg or chicken first?

I did not think about updating the wiki from the editor schemata. Certainly all existing tag data should be cross-checked. As the "TagCenter" is intended to work from the wiki to the editors and renderers, the wiki update would be code that is used only once. I imagine it the other way round: if the TagCenter is coded and able to generate schemata for all editors, these schemata can be compared with the editors' existing ones. The differences will not be large, I think. So some edits in the wiki will do, maybe with some small semi-automatic scripts.

Upstairs Downstairs

The schemata are top-down, the wiki is more or less bottom-up, and the TagCenter will be both. Between the wiki scan and the schema generation there will be a database (or simply a lot of arrays of structures in the programming language used). The references between tags will be cross-inserted in both the upper and the lower tag. But it has to be clarified which one is up. This is clear for values. "implies" points further down. But the chain ends with "railway", for example: there is no transport=railway, not yet. What we also need is a wiki template KeyGroup. Even if there is no "group=" in the KeyDescription, the TagCenter will work fine with an entry in the KeyGroup. And there you get a menu tree ending (or starting) with root= or trunk=transport. That tree may be on a wiki page "tag menu" containing a lot of KeyGroup templates.
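Cross-inserting the references can be sketched with two lookup tables built from the same list of directed pairs. The pair format `(lower, upper)` and the function name `cross_link` are assumptions made for this example; the sample pairs reuse the restaurant, cuisine and wifi cases from the comments above.

```python
from collections import defaultdict


def cross_link(belongs_to):
    """Build upward and downward references from (lower, upper) tag pairs."""
    up = defaultdict(set)    # lower tag -> the tags it belongs to
    down = defaultdict(set)  # upper tag -> the tags it can contain
    for lower, upper in belongs_to:
        up[lower].add(upper)
        down[upper].add(lower)
    return up, down


# Directed pairs as a bottom-up source (for example the wiki) might give them.
pairs = [
    ("cuisine", "amenity=restaurant"),
    ("amenity=restaurant", "amenity"),
    ("wifi", "amenity=restaurant"),
    ("wifi", "amenity=cafe"),
]
up, down = cross_link(pairs)
```

The `down` table gives the top-down view an editor menu needs, while `up` keeps the bottom-up view, so a shared tag like wifi is stored once and still appears under every feature it belongs to.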

(R)evolution

Are the wiki KeyDescription and ValueDescription carved in stone? Or may we add a "group=" to KeyDescription? Will it be a big revolution to change the references in the templates to directed ones? May we add the info needed for language localization? The funny thing is, if each local wiki page inserts only the info for its own language, the TagCenter will accumulate it in a common schema entry. What about a KeyDescription2?

Exclusion

highway=stop is a good example of an exception. maxspeed is not for nodes and doesn't make much sense for tracks. Will it work with implies and excludes? Does it make sense to edit all exceptions in the wiki or, as it is now, directly in the schema? We won't know until we try. A proposal wiki page may be the next step.
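One way an "excludes" list could handle such exceptions is sketched below. The rule layout (`belongs_to`, `element_types`, `excludes`) and the function `key_applies` are hypothetical, and the rule content only mirrors the examples in this discussion; it is not an agreed tagging rule.

```python
def key_applies(key, feature_tag, element_type, rules):
    """Decide whether `key` should be offered for a feature of the given type."""
    rule = rules.get(key)
    if rule is None:
        return False
    if element_type not in rule["element_types"]:
        return False  # e.g. maxspeed is not offered on nodes
    if feature_tag in rule.get("excludes", set()):
        return False  # explicit exception for one key=value pair
    feature_key = feature_tag.split("=", 1)[0]
    return feature_key in rule["belongs_to"]


# Illustrative rule only, following the examples discussed above.
rules = {
    "maxspeed": {
        "belongs_to": {"highway"},
        "element_types": {"way"},
        "excludes": {"highway=track"},
    }
}
on_trunk = key_applies("maxspeed", "highway=trunk", "way", rules)
on_stop = key_applies("maxspeed", "highway=stop", "node", rules)
on_track = key_applies("maxspeed", "highway=track", "way", rules)
```

In this sketch highway=stop is ruled out by the element type alone, while highway=track needs the explicit exclusion, which suggests that both mechanisms may be needed.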

Nomen est Omen

Today, each editor (and renderer?) uses its own schema. But the content will be about the same. And maybe the editors can agree on one "Unified-Tag-Scheme", to give it an arbitrary name.
