OpenStreetMap

Time to clean up the wikipedia:xx tags?

Posted by PlaneMad on 21 November 2016 in English.

There are over 40,000 features with old-style wikipedia:lang=* tags that need to be migrated to the more popular wikipedia=lang:* format. Should this just be a mechanical edit? Or should we have a worldwide campaign of community members checking each and every one?

[Image: screenshot, 2016-11-21 11:48:22]

[Image: taghistory graph]

UPDATE

I looked into the features with wikipedia:ru, and the majority seem to be place nodes in Ukraine. Around 28k of them already have a wikipedia (mostly Ukrainian) tag, so migrating them may not add any benefit.

So the question now is whether it's better to discard the wikipedia and wikipedia:lang tags and just have a wikidata tag. What information would we lose?

[Image: screenshot, 2016-11-21 14:47:22] http://overpass-turbo.eu/s/kdw

Discussion

Comment from ff5722 on 21 November 2016 at 08:33

In case there is more than one “wikipedia:lang” tag, you will lose data when removing all but one, unless there is also a “wikidata” entry linked.

Also, from the graph, it seems like the “wikipedia:de” tag was already converted at some point in 2015.

Comment from tbicr on 21 November 2016 at 08:55

What about territories with several languages? Which language's wiki link should we keep in this case?

Wikipedia links have issues and also duplicate the stricter wikidata tag (http://taginfo.openstreetmap.org/keys/wikidata). Maybe better move wikipedia tags to wikidata?

Comment from PlaneMad on 21 November 2016 at 11:42

ff5722, tbicr good catch. Just added an update.

Comment from SomeoneElse on 21 November 2016 at 12:08

I’d ask the question “what problem are you trying to solve?”. If it’s not broken, don’t try to fix it.

More importantly, each wikipedia community is pretty distinct - I can imagine why a community might want to first link to a particular wikipedia language. Here, for example, are the wikipedia pages for the country of Serbia in Serbian and Serbo-Croatian

https://sr.wikipedia.org/wiki/%D0%A1%D1%80%D0%B1%D0%B8%D1%98%D0%B0

https://sh.wikipedia.org/wiki/Srbija

as you can see, the maps are different - one contains Kosovo, one does not. You can’t assume that all wikipedias agree with each other, and if a wikidata entry was created from just one wikipedia entry, then it’ll naturally reflect the biases of that wikipedia entry.

In Kosovo’s case the wikidata entry https://www.wikidata.org/wiki/Q1246 says “not fully-recognised state in Southeast Europe” (which is correct) and “said to be the same as” “Autonomous Province of Kosovo and Metohija (https://www.wikidata.org/wiki/Q1255)”, which is more problematic, because it doesn’t say “said to be the same as by whom”.

Comment from PlaneMad on 21 November 2016 at 12:48

> I’d ask the question “what problem are you trying to solve?”. If it’s not broken, don’t try to fix it.

The problem is the inconsistency caused by having multiple keys from OSM to another database, when it could easily be eliminated. Having two wikipedia tags on one feature goes against the best practice documented in the wiki.

Kosovo is an interesting edge case, and thank you for bringing it up. There is no clear answer to that one, but solving the 99.9% trivial cases now is just good data gardening.

Comment from SomeoneElse on 21 November 2016 at 13:02

Alas, the real world has inconsistency in it, and we have to deal with that. Wikipedia isn’t “a database”; it is “lots of different databases, with some coherence between them”. OSM’s wiki is notoriously a haven for people who want to tell the real world how to behave rather than describe it, so sometimes you have to take what’s written there with a pinch of salt.

Sure, there will be examples where people have added multiple wikipedia keys by accident, but some may be deliberate, and the way to find out is to ask.

Blindly converting wikipedia entries to wikidata entries would have the same problem as adding wikidata without putting much thought into it (see https://www.openstreetmap.org/changeset/43749373 for a discussion of some of the issues that can occur there - a wikipedian with the best of intentions but no local knowledge missed things that would be obvious to a local).

Unfortunately, there have been so many “mechanical turk” additions of wikidata links to OSM recently by people unfamiliar with the things that they are linking that it’s effectively devalued the ones that existed previously. That’s sad (and is obviously a completely different issue to what you’re talking about here), but it does mean that “wikidata is not the (only) answer”.

Comment from BushmanK on 21 November 2016 at 17:11

Linked Wikipedia articles in different languages are not always talking about the same entity, and it’s not just about those unrecognized or partially recognized regions. It means that you can’t always be sure that by switching language in Wikipedia you will see an article about the same thing. Usually you will, but it is never guaranteed. Sometimes it will lead you to an article about a class of entities the original entity belongs to (especially when certain terms in two or more languages have a different meaning).

So the wikipedia=lang:* syntax, even if “popular”, is just another example of an exclusive tagging scheme that prevents several properties from being assigned to an object even when they can actually coexist. That is obviously bad.

Same thing, technically, applies to wikidata: if there is any real-world controversy about an entity it links to, it creates an incompleteness or inconsistency.

Comment from Vanuan on 21 November 2016 at 20:59

> Same thing, technically, applies to wikidata

Same thing applies to OSM itself, right? For example, on the ground, you can’t distinguish between administrative entity and settlement. So there will always be confusion, unless you synchronize databases automatically. Wikidata does a great job of disambiguating entities that differ by even the smallest degree of meaning.

> if there is any real-world controversy about an entity

The idea behind wikidata is that there’s no controversy. There are subtle differences which are enough to establish different entities.

Comment from SomeoneElse on 21 November 2016 at 21:33

> For example, on the ground, you can’t distinguish between administrative entity and settlement.

I think you can. To take an example that I used in this “talk” list post earlier today:

https://lists.openstreetmap.org/pipermail/talk/2016-November/077139.html

“Rossnowlagh” is a settlement. It’s got a hotel and a cafe, and lots of other settlementy stuff. “Rossnowlagh Upper” and “Rossnowlagh Lower” don’t have any of these - in fact when I was there there was nothing to indicate they exist at all on the ground.

Comment from BushmanK on 22 November 2016 at 00:24

@Vanuan, > Same thing applies to OSM itself, right?

Yes, but your example is not about that - it’s about being unable to distinguish two entities only in given circumstances.

But in OSM, there are things like shop=kiosk, while “kiosk” means completely different things in different countries. However, sharing the same problem doesn’t mean that we have to be more tolerant of it or promote changes that make it even worse.

> The idea behind wikidata is that there’s no controversy.

Unfortunately, there is a gap between the idea and the real world situation. It is relatively small, but it still exists.

Comment from escada on 22 November 2016 at 06:02

As someone who lives in a country with 3 official languages, I would prefer that you leave the wikipedia:xx tags as they are right now. It’s not because there is only a wikipedia:fr tag at the moment that the feature cannot be updated in the future to also have a wikipedia:nl tag.

Perhaps it would be better to retag all the wikipedia=xx:… to wikipedia:xx=… ? :-)

Anyway, this can easily be done while importing the data. So perhaps, people should create a github repository that contains all kinds of scripts that make the data more uniform in their opinion. The people that want to import data locally, can then choose which scripts they want to run.
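As a rough illustration of the kind of import-time script mentioned above, here is a minimal Python sketch. It assumes a feature's tags are available as a plain dictionary; the function name `normalize_wikipedia_tags` and the language-code pattern are hypothetical and not taken from any existing tool. To avoid the data loss raised earlier in the thread, it only rewrites a feature that has exactly one old-style tag and no wikipedia tag.

```python
import re

# Matches old-style keys such as "wikipedia:fr".
# The language-code pattern is an assumption made for this sketch.
OLD_KEY = re.compile(r"^wikipedia:([a-z][a-z0-9-]*)$")


def normalize_wikipedia_tags(tags):
    """Return a copy of tags with a lone wikipedia:xx=Title rewritten as wikipedia=xx:Title."""
    old_keys = [key for key in tags if OLD_KEY.match(key)]
    # Leave the feature alone if it already has a wikipedia tag or carries
    # several old-style tags: picking one would silently drop information.
    if "wikipedia" in tags or len(old_keys) != 1:
        return dict(tags)
    key = old_keys[0]
    lang = OLD_KEY.match(key).group(1)
    result = {k: v for k, v in tags.items() if k != key}
    result["wikipedia"] = f"{lang}:{tags[key]}"
    return result
```

Because the function returns a new dictionary and never touches the OSM database, each data consumer could decide locally whether to run it.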

Comment from Minh Nguyen on 23 November 2016 at 09:17

@escada, what’s the practical benefit of having wikipedia:fr and wikipedia:nl on the same feature? If the intent is to practice language neutrality, wikidata is the ultimate language-neutral tag. I don’t think mappers should feel any obligation to pair each name:* tag with a corresponding wikipedia:* tag.

For context, a great many of the original wikipedia:* tags were added with WIWOSM in mind. WIWOSM, which predates Wikidata, happily takes a wikipedia:de tag and displays a link to an English Wikipedia article that it discovered via an interwiki link. What wikipedia:*-consuming tool doesn’t do that? Would unlocalized Wikipedia links even be useful to an end user? (WIWOSM now recognizes wikipedia and wikidata tags as well, by the way.)

Personally, when I map places in Vietnam, I’m perfectly comfortable with seeing wikipedia=en:* tags – or even wikipedia=zh: tags in the disputed South China Sea – because I know the data consumers will localize as needed, and it takes only a couple clicks to verify that the tag is correct in terms of providing access to the local language’s Wikipedia article on the subject.

Comment from escada on 23 November 2016 at 09:28

@Minh Nguyen: Will I automatically end up with the Dutch version of the article when I click “a link” containing the French wikipedia article? If so, there is no problem. It is a problem if I have to click another link to get to the Dutch wikipedia page.

I doubt everybody is comfortable that there is a direct link to the Dutch wikipedia and not to the French (or vice versa). I fear that it might start edit wars among some groups of “language-fanatics”.

Comment from Minh Nguyen on 23 November 2016 at 16:02

Ideally, only mappers would ever see the original, unlocalized link in their editors or perhaps on osm.org, whereas people using OSM-based tools would always see a link to an article at the Wikipedia of their choice (based on a language preference and interwiki links). I suspect that’s already the case, but if there are tools that link users to Wikipedia without taking advantage of interwiki links, it would benefit everyone to have that bug fixed – not only speakers of the three official languages, but everyone else too.

I haven’t witnessed edit wars over languages on OSM, but I’d imagine that they’d be edit wars over name tags rather than Wikipedia tags, which are a bit more obscure. If things get too tense, it seems to me that the solution would be removing the Wikipedia tags in favor of a language-neutral wikidata tag, which many data consumers can handle just as well. Wikidata has the benefit of being more immune to article title edit wars that actually do happen at Wikipedia (Gdansk/Danzig, anyone?).
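Before a data consumer can localize a link, it first has to split the wikipedia=lang:Title value into its language and title. The Python sketch below shows only that first step and builds the unlocalized article URL; the function name `wikipedia_url` is hypothetical, and the interwiki lookup that would pick the reader's preferred language is deliberately left out.

```python
from urllib.parse import quote


def wikipedia_url(tag_value):
    """Turn a wikipedia=lang:Title value into an article URL, or None if it has no language prefix."""
    # Split on the first colon only, since titles may contain colons themselves.
    lang, sep, title = tag_value.partition(":")
    if not sep or not lang or not title:
        return None
    # Wikipedia URLs use underscores for spaces; other characters are percent-encoded.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
```

For example, `wikipedia_url("de:Berlin")` gives `https://de.wikipedia.org/wiki/Berlin`, while a value without a language prefix gives `None`.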

Comment from d1g on 5 December 2016 at 04:53

Glad to hear that everyone seems to agree that we need a reference to Wikipedia/Wikidata in OpenStreetMap.

Dear all, please, use wikidata, not wikipedia and not wikipedia:xx.

@tbicr > Maybe better move wikipedia tags to wikidata? Yes

@escada > Perhaps it would be better to retag all the wikipedia=xx:… to wikipedia:xx=… ? :-)

The name=* or name:ru=* or name:uk=* debate in Ukraine was horribly long; wikipedia:xx tags repeat the mistakes of name:xx.

The best approach is not to rely on the current languages in the area being mapped, and not to rely on users' presence or local language in wikipedia=*.

There is at least one objective reason for wikidata=* over wikipedia=*:

Wikidata IDs are more stable: they are not going to change if the title of the Wikipedia article is changed from “A” to “A (in C)” or “A (geographical name)”.

If you don’t trust me in that, please listen to @pigsonthewing: https://forum.openstreetmap.org/viewtopic.php?pid=619459#p619459

Comment from d1g on 5 December 2016 at 05:06

> The problem is the inconsistency caused by having multiple keys from OSM to another database, when it could easily be eliminated.

Absolutely agree with @PlaneMad, we shouldn’t have 2 keys at the same time:

  • wikipedia and wikidata
  • wikipedia:xx and wikidata
  • wikipedia:xx and wikipedia

It is simpler to update single wikidata tag, not multiple tags with unclear (undocumented) preference between each other.

Furthermore, it creates complexity (which key to trust/read/update first) when a single wikidata tag can be sufficient.
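The three combinations listed above can be detected mechanically. A minimal Python sketch, assuming a feature's tags are given as a dictionary (the function name `overlapping_link_keys` is hypothetical), reports which of them are present on a feature:

```python
def overlapping_link_keys(tags):
    """Return the key combinations from the list above that are present on one feature."""
    has_wikipedia = "wikipedia" in tags
    has_wikidata = "wikidata" in tags
    # Any key of the form wikipedia:xx counts as an old-style language tag.
    has_lang = any(key.startswith("wikipedia:") for key in tags)

    overlaps = []
    if has_wikipedia and has_wikidata:
        overlaps.append("wikipedia and wikidata")
    if has_lang and has_wikidata:
        overlaps.append("wikipedia:xx and wikidata")
    if has_lang and has_wikipedia:
        overlaps.append("wikipedia:xx and wikipedia")
    return overlaps
```

The sketch only reports the overlaps. Whether any of them should be removed is the question debated in this thread.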

Comment from Reino Baptista on 18 February 2017 at 18:42

Greetings,

Please don’t forget to consider the OpenStreetMap Wiki. Someone already asked this question and left us a good approach in the OSM Wiki. I think it addresses all the questions. I would like to draw your attention to item 2 of the reference.

Key:wikipedia

## Secondary languages

  1. In almost all cases, a single wikipedia tag as described above is sufficient. Data users can access articles in other languages where available using Wikipedia’s interlanguage links. If interlanguage links are missing, this should usually be fixed within wikipedia.

  2. One example where it is appropriate to provide additional explicit links to articles in secondary languages is where the subject is included in an article on a broader subject in the secondary language, for example to wikipedia:en=List of museums in Paris to the English article which provides the best article for the particular museum in France, or wikipedia:fr=Monuments et sites de Paris which is the best article in French for a particular church in London. In another example the structure of subjects in articles cannot be matched 1:1 with interlanguage links (or maybe there are several articles for the same object). In these circumstances use the format wikipedia:lang=page title for the secondary languages.

RB

Comment from Mateusz Konieczny on 1 June 2021 at 16:23

wikipedia:lang tags are not needed, wikipedia tag is sufficient

I have some code for automatic cleanup that was used in Poland to get rid of such duplicated tags.

Why? It was making links harder to maintain: fixing one sometimes required changing several tags instead of one, it was confusing newbies, and it was blocking other tags. In some silly cases people added several links in various languages.

In Ukraine I suspect that this could have been done to avoid fights over which wikipedia language should be linked.

And wikipedia tags should be kept, as wikidata is an alphanumeric soup that is not human-readable: to distinguish a likely correct tag from vandalism, one needs to visit Wikidata or make an API call.

Also, I suspect that https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/ may be interesting for people caring about such tagging.

And https://matkoniecz.github.io/OSM-wikipedia-tag-validator-reports/%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D0%B0%20(Ukraine)%20-%20obvious.html#wikipedia%20tag%20in%20an%20outdated%20form%20for%20removal listing cases where wikipedia:lang tag can be automatically removed

> It is simpler to update single wikidata tag, not multiple tags with unclear (undocumented) preference between each other.

Which part is supposedly undocumented?

Comment from Mateusz Konieczny on 1 June 2021 at 16:24

> Dear all, please, use wikidata, not wikipedia

Both have their place. wikidata can be useful in the case of a badly made page move (one that leaves no redirect).

The wikipedia tag is human-readable, unlike the wikidata tag, and definitely should be present.

Comment from Mateusz Konieczny on 1 June 2021 at 16:31

“listing cases where wikipedia:lang tag can be automatically removed” - obviously, only with agreement of a local community

> Should this just be a mechanical edit? Or should we have a worldwide campaign of community members checking each and every one?

If the local community wants it, it would be possible to switch to wikipedia where there is no conflict between the wikipedia/wikipedia:lang/wikidata tags, and to investigate the remaining cases manually.
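The split between automatic and manual cases could be sketched roughly as follows in Python. This is not the cleanup code mentioned above. The function name `triage` is hypothetical, and "no conflict" is narrowed here to the one case that can be decided without any lookup: every wikipedia:lang tag exactly repeats the main wikipedia tag. A real check would also have to compare cross-language links, for instance through Wikidata.

```python
def triage(tags):
    """Classify a feature as 'nothing to do', 'automatic' or 'manual'."""
    lang_keys = [key for key in tags if key.startswith("wikipedia:")]
    if not lang_keys:
        return "nothing to do"

    main = tags.get("wikipedia")
    for key in lang_keys:
        lang = key[len("wikipedia:"):]
        # A wikipedia:lang tag is only treated as safe to drop when it
        # duplicates the main tag; everything else needs human review.
        if main != f"{lang}:{tags[key]}":
            return "manual"
    return "automatic"
```

With this narrow definition, a feature tagged wikipedia=uk:Title and wikipedia:uk=Title is classified as automatic, while a feature whose wikipedia:ru tag points to a differently titled article is left for manual review.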
