The Goal

While working through some edits in Indonesia I noticed objects with the key “nama”. A quick search revealed that this is Indonesian for “name”, so these objects can very likely be changed to use the standard English key. I wondered: how common is this? Is it easy enough to track down?

The Plan

As a test run I picked four keys: name, building, source, and type. These show up in TagInfo in abundance, and I’m sure there are lots of other good candidates.

The next step was to get usable translations. It turns out Google Sheets has a GOOGLETRANSLATE function that takes a word and returns its translation into a given language. I pulled in the list of two-letter language codes and built my sheet. After eliminating all languages that Google Translate didn’t support and all languages with non-Latin characters, I was left with ~80 languages to check.
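The filtering can also be sketched outside the spreadsheet. The Python below is only an illustration, not what I actually ran: the helper names `is_latin_script` and `candidate_keys` are made up, the sample translations are typed in by hand, and "Latin" is judged by whether each letter's Unicode name starts with "LATIN".

```python
import unicodedata


def is_latin_script(word: str) -> bool:
    """Return True if every alphabetic character in `word` is a Latin letter."""
    letters = [ch for ch in word if ch.isalpha()]
    if not letters:
        return False
    # Unicode names of Latin letters begin with "LATIN", e.g.
    # "LATIN SMALL LETTER E WITH ACUTE" for "é".
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


def candidate_keys(english: str, translations: dict[str, str]) -> list[str]:
    """Keep Latin-script translations that differ from the English key."""
    seen = set()
    result = []
    for word in translations.values():
        word = word.strip()
        if word and word != english and is_latin_script(word) and word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Hypothetical sample: language code -> translation of "name".
sample = {"id": "nama", "hu": "név", "ru": "имя", "es": "nombre", "ms": "nama", "en": "name"}
print(candidate_keys("name", sample))  # ['nama', 'név', 'nombre']
```

Duplicates (here the Malay and Indonesian “nama”) collapse into one candidate, and anything identical to the English key is dropped, since it would not be a cleanup target.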

The last step was to pull usage information. Fortunately for me, TagInfo has an exceptionally well-documented REST API. Fifty lines of C# later, I had my results.
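For anyone who wants to try something similar, here is a rough Python sketch of the lookup (my own code was C# and is not shown here). It assumes the TagInfo v4 `key/stats` endpoint on the main public instance and assumes the response carries a "data" list whose entry with type "all" holds the combined count; check the TagInfo API documentation before relying on either. The function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: TagInfo API v4 "key/stats" on the main public instance.
TAGINFO_STATS_URL = "https://taginfo.openstreetmap.org/api/4/key/stats"


def total_count(payload: dict) -> int:
    """Extract the overall object count from a key/stats response.

    Assumes a "data" list with one entry per object type, where the
    entry with type "all" holds the combined count.
    """
    for entry in payload.get("data", []):
        if entry.get("type") == "all":
            return int(entry.get("count", 0))
    return 0


def fetch_key_count(key: str) -> int:
    """Ask TagInfo how many objects carry `key` (performs a network request)."""
    url = TAGINFO_STATS_URL + "?" + urllib.parse.urlencode({"key": key})
    request = urllib.request.Request(
        url, headers={"User-Agent": "key-translation-check (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return total_count(json.load(response))


# Made-up payload in the assumed shape, so the parsing can be checked offline.
example_payload = {"data": [{"type": "all", "count": 42}, {"type": "nodes", "count": 40}]}
print(total_count(example_payload))  # 42
```

Calling `fetch_key_count` once per candidate key, with a short pause between requests to be polite to the server, would produce a list of key and count pairs.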

The Results

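Clicking through a few of these in TagInfo reveals some likely candidates for cleanup. Each English key is listed with its usage count, followed by the translated keys that turned up and their counts.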

name 92528930
    nome 166
    Name 7
    Nom 62
    Nome 31
    non 6
    név 1
    nama 133
    nombre 207

building 537924316
    bangunan 4
    Bangunan 1
    budynek 2

source 242170152
    bron 13
    fonte 16
    Source 66914
    fuente 57
    kaynak 8

type 10603620
    tip 382
    tipo 375
    typ 65
    Typ 6
    genus 902468
    tipas 2
    Type 283
    tur 8

Comment from n76 on 22 November 2022 at 23:29

Sounds like you have found some good things to clean up.

I see you have “genus” as a translation of “type”, but you should be aware that “genus” is a valid key for tagging plants, so you may want to refine that one a bit more. Objects with a genus=* tag but no natural=* or other associated tags like species=* might be the real candidates.

Comment from watmildon on 23 November 2022 at 03:26

Oh, absolutely. I was curious about “genus”; it’s only in the list because Latin was one of the “languages” that happened to survive my sorting. I highly doubt anyone is actually accidentally submitting Latin into the database. Casting a wide net means you’ll very often find false positives!
