The Goal

While working through some edits in Indonesia, I noticed objects with the key “nama”. A quick search revealed that this is Indonesian for “name”, so these objects can very likely be retagged to use the standard English key. I wondered: how common is this, and is it easy to track down?

The Plan

As a test run I picked four keys: name, building, source and type. These show up in TagInfo in abundance, though I’m sure there are plenty of other good candidates.

The next step was to get usable translations. It turns out Google Sheets has a GOOGLETRANSLATE function that translates a word from one language into another. I pulled in the list of two-letter language codes and built my sheet. After eliminating the languages that Google Translate didn’t support and those written in non-Latin scripts, I was left with about 80 languages to check.
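The filtering here was done in a spreadsheet, but the same idea can be sketched in code. This is a minimal sketch, assuming the translations have been exported from the sheet as a mapping from language code to translated word; the `translations` dict and the `candidate_keys` helper are hypothetical. It drops any word containing non-Latin letters, removes duplicates, and adds a capitalized variant of each word, since capitalized keys such as “Nom” and “Source” turn up in the results.

```python
from __future__ import annotations

import unicodedata


def is_latin(word: str) -> bool:
    """Return True if every letter in `word` is a Latin-script character."""
    for ch in word:
        # unicodedata.name gives e.g. "LATIN SMALL LETTER E WITH ACUTE" for "é".
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return False
    return True


def candidate_keys(english_key: str, translations: dict[str, str]) -> list[str]:
    """Build the list of keys to look up in TagInfo for one English key.

    `translations` is a hypothetical export of the spreadsheet:
    language code -> translated word.
    """
    seen: set[str] = set()
    result: list[str] = []
    # Check the English key itself too, so its capitalized form is included.
    for word in [english_key, *translations.values()]:
        word = word.strip()
        if not word or not is_latin(word):
            continue  # skip empty cells and non-Latin scripts
        for variant in (word.lower(), word.capitalize()):
            if variant not in seen:
                seen.add(variant)
                result.append(variant)
    return result
```

For example, `candidate_keys("name", {"pt": "nome", "fr": "nom", "ru": "имя"})` keeps the Portuguese and French words with their capitalized forms and skips the Cyrillic one.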

The last step was to pull usage information. Fortunately, TagInfo has an exceptionally well-documented REST API. Fifty lines of C# later, I had my results.
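The C# code isn’t shown in the post, so here is a rough Python sketch of the lookup, not the author’s implementation. It assumes the TagInfo `/api/4/key/stats` endpoint and a JSON response whose `data` list holds one entry per object type, with `type` and `count` fields and a `type` of `"all"` for the overall total. Those field names are assumptions worth checking against the TagInfo API documentation.

```python
import json
import urllib.parse
import urllib.request

TAGINFO_API = "https://taginfo.openstreetmap.org/api/4"


def key_stats_url(key: str) -> str:
    """Build the TagInfo key statistics URL for `key` (URL-encoded)."""
    return f"{TAGINFO_API}/key/stats?" + urllib.parse.urlencode({"key": key})


def total_count(response_text: str) -> int:
    """Extract the usage count over all object types from a key/stats response.

    Assumes a "data" list with one row per object type, where the row
    with type "all" holds the total. Returns 0 if no such row exists.
    """
    payload = json.loads(response_text)
    for row in payload.get("data", []):
        if row.get("type") == "all":
            return int(row.get("count", 0))
    return 0


def fetch_count(key: str) -> int:
    """Query TagInfo over the network and return the total usage of `key`."""
    with urllib.request.urlopen(key_stats_url(key), timeout=30) as resp:
        return total_count(resp.read().decode("utf-8"))
```

Calling `fetch_count` for each candidate key yields key/count pairs. When looping over a few hundred keys, it would be polite to pause between requests.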

The Results

Each line below gives a key and its usage count in TagInfo. Clicking through a few of these in TagInfo reveals some likely candidates for cleanup.

name 92528930
  nome 166
  Name 7
  Nom 62
  Nome 31
  non 6
  név 1
  nama 133
  nombre 207

building 537924316
  bangunan 4
  Bangunan 1
  budynek 2

source 242170152
  bron 13
  fonte 16
  Source 66914
  fuente 57
  kaynak 8

type 10603620
  tip 382
  tipo 375
  typ 65
  Typ 6
  genus 902468
  tipas 2
  Type 283
  tur 8

Comment from n76 on 22 November 2022 at 23:29

Sounds like you have found some good things to clean up.

I see you have “genus” as a translation of “type”, but you should be aware that “genus” is a valid tag for tagging plants, so you may want to refine that one a bit more. Perhaps genus=* tags without natural=* and/or other associated tags such as species=* could be candidates.
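The heuristic suggested in this comment can be written as a simple check on an object’s tags. This is a minimal sketch, assuming tags are available as a Python dict; the `is_suspicious_genus` name is hypothetical, and the list of companion keys is only the two the comment names, so it would likely need extending.

```python
from __future__ import annotations


def is_suspicious_genus(
    tags: dict[str, str],
    companions: tuple[str, ...] = ("natural", "species"),
) -> bool:
    """Flag genus=* used without the tags that usually accompany it on plants.

    A heuristic, not a rule: genus=* with none of the `companions` keys
    might be a mistranslated type=* and is worth a manual look.
    """
    if "genus" not in tags:
        return False
    return not any(key in tags for key in companions)
```

Objects flagged this way would still need checking by hand before any retagging.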

Comment from watmildon on 23 November 2022 at 03:26

Oh absolutely. I was curious about “genus”, and it’s only in the list because Latin was one of the “languages” that happened to survive my filtering. I highly doubt anyone is actually accidentally submitting Latin into the database. Casting a wide net means you’ll very often find false positives!

Comment from Mateusz Konieczny on 26 May 2023 at 08:15

Note that, based on my own experience, it is easy to fall into the trap of finding a lot of things to fix and then not managing to fix even a small part of them.

So I would encourage balancing fixing things against finding things to fix.

(BTW, I have and )

Comment from Mateusz Konieczny on 26 May 2023 at 08:20

Oh, and there is also the danger of being too edit-happy and damaging data more than improving it (or ending up in a situation where others think that happened, due to lacking or missing communication). This happened to me recently and I am still fixing it.

See also

Comment from watmildon on 26 May 2023 at 20:56

Oh absolutely! It was definitely meant more as a demonstration of “there’s work out there that’s easy to go get at” than as a recommendation to “go mass retag things”. As always, the tools are powerful but must be used cautiously.

We have an infinite sea of work; we just need to find the right little inspirations for folks to go to it (in a collaborative and cooperative manner, of course!).
