
Since the recent update to version 1.7.1 of our editor iD (released an estimated two days ago; release note: “Add basic browser and platform info to changeset tags (#2559, #2449)”), iD publicly, permanently and silently logs operating system, browser and language details (and more) for every user and every edit by adding them as tags to the changeset. Example values follow; alternatively, open /history and pick changesets at random until you find one made with iD:

browser = Chrome 37.0
locale = it-IT
platform = Linux
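
To check which of these tags a given changeset carries, one can read its tags programmatically. The following is a minimal sketch in Python, assuming the changeset is available as XML shaped like the output of the OSM API 0.6 changeset call. The sample document, its id, user name and tag values are made up for illustration, and `changeset_tags` is a hypothetical helper, not part of any OSM library.

```python
import xml.etree.ElementTree as ET

# Hypothetical changeset document; id, user and values are invented.
SAMPLE = """<osm version="0.6">
  <changeset id="1" user="example_user" uid="1">
    <tag k="created_by" v="iD 1.7.1"/>
    <tag k="comment" v="Added a footpath"/>
    <tag k="browser" v="Chrome 37.0"/>
    <tag k="locale" v="it-IT"/>
    <tag k="platform" v="Linux"/>
  </changeset>
</osm>"""

# The three tag keys discussed in this post.
ENVIRONMENT_KEYS = ("browser", "locale", "platform")


def changeset_tags(xml_text):
    """Return all tags of the first changeset in the document as a dict."""
    root = ET.fromstring(xml_text)
    changeset = root.find("changeset")
    return {tag.get("k"): tag.get("v") for tag in changeset.findall("tag")}


tags = changeset_tags(SAMPLE)
# Keep only the tags that describe the contributor's environment.
environment_tags = {k: v for k, v in tags.items() if k in ENVIRONMENT_KEYS}
print(environment_tags)
```

Anyone with access to the public changeset data can run the same kind of filter, which is the point of the concern raised here.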
  • I can imagine good uses for this big pile of data, e.g.:
      * it may help in debugging the editor,
      * one could compile nice statistics about our whole user base from this data (from a dump), or
      * it could be used for quality assurance heuristics (e.g. it may be more suspicious if a foreign-language user edits at a specific place).
  • But I can also imagine bad uses for this big pile of data, and I see further problems:
      * It enables anybody to compile detailed statistics about a single user’s browser, browser update habits or operating system switches over time. None of this is why people contribute to OSM.
      * Language, browser name, exact version and operating system name may make a contributor identifiable within a large group of people, especially if some of those details are unusual. Think of someone speaking Lithuanian, using Epiphany under Linux and editing in an Argentinian city: the expectation of contributing only under a pseudonymous user name is quickly broken.
      * Furthermore, users are neither asked nor even notified whether they agree to publish this private data permanently and publicly. The iD editor only asks for the changeset comment. All other tags are added automatically and silently. This breaks the Privacy Policy (“All edits made to the map are recorded in the database with the user ID of the user making the change, and a timestamp at the time of change upload.”), unless we assume that users deliberately choose this editor in full awareness of everything it does. Assuming that every contributor finds the small entry in the iD release notes on GitHub is totally unreasonable. I only noticed the problem because I was viewing other contributors’ changesets.
      * The users have practically no chance to ever remove this information about them.
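
The identifiability concern can be made concrete by counting how many users share the same combination of locale, browser and platform. The sketch below is in Python and uses entirely made-up records. The user names, the values and the helper `anonymity_set_sizes` are illustrative assumptions, not real OSM data or an existing tool.

```python
from collections import Counter

# Hypothetical (user, locale, browser, platform) records for one city.
records = [
    ("user_a", "es-AR", "Chrome 37.0", "Windows"),
    ("user_b", "es-AR", "Chrome 37.0", "Windows"),
    ("user_c", "es-AR", "Firefox 37.0", "Windows"),
    ("user_d", "lt", "Epiphany 3.14", "Linux"),
]


def anonymity_set_sizes(rows):
    """Map each user to the number of users sharing the same tag combination."""
    fingerprint_of = {
        user: (locale, browser, platform)
        for user, locale, browser, platform in rows
    }
    counts = Counter(fingerprint_of.values())
    return {user: counts[fp] for user, fp in fingerprint_of.items()}


sizes = anonymity_set_sizes(records)
# Users whose combination is shared with nobody else stand out.
unique_users = sorted(user for user, n in sizes.items() if n == 1)
print(sizes)
print(unique_users)
```

A user whose combination occurs only once is singled out by the tags alone, independent of the chosen user name.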

In the linked issues 2559 and 2449 (found via the release comments) I see no rationale at all for why all this data needs to be saved 1. publicly, 2. permanently and 3. silently. Only reasons why the data could be useful are mentioned (similar to my ideas above), but not why the privacy and trust of our contributors need to be hurt to this extent. Note: I have messaged the three involved developers/issue reporters via OSM mail about this post.

I think this recent change is really over the top and is doing harm, because to outsiders our project may seem as if it does not care about our contributors’ privacy and fools new users by silently publishing information about them. I would hate it if, in the future, I had to pass along a big warning about privacy whenever I try to attract new contributors.

Of course a simple workaround is to use another editor, e.g. JOSM, which I suggest doing for other reasons anyway.

Please, let’s quickly remove this personal data cannon before even more data is collected. By the way, I am intentionally not writing in a hidden bug tracker, in order to make everybody aware of the problem and hopefully sensitise the developers a bit.

Update: on 16 May (15 days after I wrote this diary entry) iD’s main code was modified and browser (browser name), version (browser version) and platform (operating system) were removed again. Still, the locale (user’s language setting) and host (the website on which iD is running) are silently saved into the changeset tags. See https://github.com/openstreetmap/iD/pull/2643

Likely it will take some days until this new, partly fixed iD version appears on osm.org.

Discussion

Comment from ImreSamu on 1 May 2015 at 17:55

I made the original proposal, so I am a little sad.

For me this is very useful information for helping with translations, because we currently have a lot of translation problems.

> think of someone speaking Lithuanian

See “id_presets_translation_duplicates_lt” (13 February 2015): https://github.com/ImreSamu/ideditor_translation_test_reports/blob/master/qadata/lt/id_presets_translation_duplicates_lt.md

Or the Italian duplicates (13 February 2015): https://github.com/ImreSamu/ideditor_translation_test_reports/blob/master/qadata/it/id_presets_translation_duplicates_it.md

> Of course a simple workaround is to use another editor, e.g. JOSM, which I suggest doing for other reasons anyway.

JOSM also saves the locale setting, e.g.: created_by=JOSM/1.5 (7724 hu)

hu = Hungarian.
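
Reading the language code out of such a value can be sketched in a few lines of Python. The pattern below is an assumption based only on the two `created_by` examples quoted in this discussion (`JOSM/1.5 (7724 hu)` and `JOSM/1.5 (8291 de)`). Other JOSM versions may format the value differently, and `josm_language` is a hypothetical helper name.

```python
import re

# Assumed shape: "JOSM/<api version> (<build number> <language code>)".
# The language part is treated as optional.
CREATED_BY = re.compile(
    r"^JOSM/(?P<api>[\d.]+) \((?P<build>\d+)(?: (?P<lang>[A-Za-z_-]+))?\)$"
)


def josm_language(created_by):
    """Return the language code from a JOSM created_by value, or None."""
    match = CREATED_BY.match(created_by)
    return match.group("lang") if match else None


print(josm_language("JOSM/1.5 (7724 hu)"))
print(josm_language("JOSM/1.5 (8291 de)"))
```

Values that do not match the assumed shape, such as those written by other editors, yield `None` rather than a guess.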

Comment from ImreSamu on 1 May 2015 at 18:08

And these are the German iD editor duplicates (13 February 2015):

https://github.com/ImreSamu/ideditor_translation_test_reports/blob/master/qadata/de/id_presets_translation_duplicates_de.md

We need to effectively balance protecting privacy and data quality.

Comment from woodpeck on 1 May 2015 at 18:14

Related discussion on the JOSM tracker: http://josm.openstreetmap.de/ticket/8701 (two years old). The compromise for JOSM was to continue uploading language settings and version number to the changeset metadata, but not operating system.

Comment from aseerel4c26 on 1 May 2015 at 18:47

Thank you for your comments!

ImreSamu, yes, I know, JOSM also sends the language as a changeset tag by default, but there is a difference: JOSM shows you the changeset tags (at least if you look at the other tabs in the changeset upload dialog), so the OSM contributor at least has a chance to be aware of them before uploading. The contributor could even modify or completely remove this changeset tag. I guess a simple JOSM plugin could easily switch this language tag off.

However, yes, I think the editor users should be made aware of this data, or at least we need to update the privacy policy to mention that our usual editors send their version and the language in use as a changeset tag.

Comment from aseerel4c26 on 1 May 2015 at 21:57

@ImreSamu: if you need it for “data quality”, don’t do it publicly, permanently and silently.

  1. E.g. set up an external logging server which can only be accessed by some iD devs. Is it needed to log it publicly along with the OSM data in our planet which really everyone can access?
  2. delete the data as soon as it is not needed any more
  3. ask the users, or at least tell them what will be logged, why, and for how long.

Please ask yourself for each single point: why is it necessary to hurt our contributors’ privacy? And, of course, if you think you need to do so: ask!

Still, I do not really understand why you need every single user’s language info for the translation “duplicates”. Okay, I have now read your text in issue 2449 again. Yes, for this use, logging the data to the planet is in fact the easiest way for you, but is it justified? Really not. Not the way it is done now. You could ask your users to voluntarily disclose more info than is really needed to edit OSM. However, rather fix the translations before you release them, and do not underestimate the power of our wiki-like contributor system: e.g. tagging errors (due to a bad translation) will be found and corrected, without all those downsides.
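
The opt-in idea from point 3 above can be sketched as follows. This is an illustrative Python sketch, not iD’s actual code (iD is written in JavaScript). The function `build_changeset_tags`, its parameters and the sample values are hypothetical.

```python
def build_changeset_tags(comment, created_by, environment, consented_keys=()):
    """Return changeset tags; environment details are added only by opt-in.

    `environment` holds details the editor could detect; only the keys
    the user explicitly agreed to publish are copied into the tags.
    """
    tags = {"comment": comment, "created_by": created_by}
    for key in consented_keys:
        if key in environment:
            tags[key] = environment[key]
    return tags


# Hypothetical detected environment, using the example values from the post.
environment = {"browser": "Chrome 37.0", "locale": "it-IT", "platform": "Linux"}

# Default: nothing beyond the comment and the editor name is published.
default_tags = build_changeset_tags("Added a footpath", "iD 1.7.1", environment)

# The user agreed to share the language only.
opted_in_tags = build_changeset_tags(
    "Added a footpath", "iD 1.7.1", environment, consented_keys=("locale",)
)
print(default_tags)
print(opted_in_tags)
```

With this design the default upload discloses nothing extra, and each additional tag corresponds to an explicit choice by the user.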

Comment from ImreSamu on 2 May 2015 at 01:36

For me the JOSM compromise, with the language code, is acceptable.

> However, rather fix the translations before you release them

:) And fix the mapping errors before releasing OSM maps…

Seriously, we don’t have the resources to translate the wiki documentation into Hungarian (see the taginfo wiki statistics: http://taginfo.openstreetmap.org/reports/languages).

And as you can see from the iD Editor translation status (https://www.transifex.com/projects/p/id-editor/), others have the same problems:

  • iD Editor - POLISH translations : 82%
  • iD Editor - LITHUANIAN : 81%
  • iD Editor - TURKISH : 39%
  • iD Editor - BULGARIAN : 34%
  • iD Editor - ESTONIAN: 28%
  • iD Editor - ROMANIAN: 12%

Translation completeness is very low for some languages. But at the moment I don’t know how many users use iD with Turkish language settings.

And we don’t have translation quality tools to detect the problems (like duplicated translations), so we need to create them ourselves, like this.

> and do not underestimate the power of our wiki-like contributor system: e.g. tagging errors (due to a bad translation) will be found and corrected – without all those downsides.

The real problem for the small languages is that experienced mappers use JOSM and beginners use the iD editor. But beginners can’t catch translation errors :( Only experienced mappers can. We have found some big translation errors, and we can’t determine their impact.

Another problem is the HOT tasks and the beginners. Data quality and good communication are important in times of crisis. From the Nepal HOT statistics: “At least 2,498 of these mappers are new to OpenStreetMap.” They are motivated, but some of them don’t speak English. And sometimes we can’t help them because we don’t know which language they understand.

And there are the new “Changeset Discussions”, but which language should I choose to communicate with the other OSM mappers? May I use the local languages? I can’t read or write Nepali. But if the mapper is a JOSM user, I can see the language code in the changeset (created_by="JOSM/1.5 (8291 de)").

Comment from Richard on 2 May 2015 at 10:49

I think you’re extending your own values to other people here.

By definition, people who believe that any piece of identifying information is a “privacy breach” are more likely to use JOSM. That’s fine. That’s their choice.

For a different userbase this is really helpful information. The language means other users are more likely to be able to assist them via changeset comments, for example.

Please don’t assume that everyone shares your beliefs, especially given that you come from a country and culture known to be much more sensitive to these things than the average. For example, my home address is readily viewable by a whois lookup, and a picture of my front window has just been retweeted 40 times. ;)

Comment from aseerel4c26 on 2 May 2015 at 12:34

Thanks for your comment, Richard.

Yes, I know that there are different standards/beliefs. Everybody who wants to is free to add more tags to a changeset (e.g. their exact current location, mood, last meal and room temperature). It may help in QA; at least someone will find all that data useful.

Yes, I know that the tendency differs among countries/cultures. However, we explicitly offer the option to contribute pseudonymously, and our privacy policy makes it quite clear that this is in fact the case. Please let’s adjust the privacy policy then, so as not to fool people who have other expectations. And, of course, I would then need to warn new users that privacy is a value which is not respected by default in our project. And even that privacy policies are not worth the bytes they are written with. :-(

Once we even had anonymous contributions. The privacy policy mentions this and explains that the new pseudonymous mode is not much worse (“because revealing the user ID of a user making a change, is not deemed to present a serious privacy problem (a user ID does not necessarily need to reveal a user’s real name or any other personal information)”). That is wrong now, because much more info plus dynamic context is disclosed.

In your words: Please don’t assume that everyone shares your beliefs, especially given that you come from a country and culture known to be much less sensitive to these things than some other countries/cultures.

We want to be a project for everyone/everywhere, right? There might be people for whom pseudonymity is important – e.g. (but not only) if contributing to OSM is forbidden by law (China).

Comment from aseerel4c26 on 2 May 2015 at 20:32

this just came to my mind: if a user’s language is that important for contacting the user, why don’t we make it a voluntary detail on the user profile (asked during registration and listed in the user’s settings page)? That is not really something we need to extract from the editor UI, which may not be the preferred contact language. For example:

Mapper since: August 14, 2005
Contributor terms: Accepted over 4 years ago
Languages: en, fr

Comment from ImreSamu on 4 May 2015 at 09:57

for greater transparency - I have created this issue:

https://github.com/openstreetmap/iD/issues/2633

Comment from SomeoneElse on 6 May 2015 at 08:16

As someone who occasionally contacts mappers new to an area, I find JOSM’s language tags on the changeset really useful (e.g. when deciding what language to use when contacting them). Having the same information logged by iD will be similarly useful.

What you do on the Internet is essentially public (especially when you’re updating a public map!). Lots of browser information is logged by every other internet site out there, and glossing over that fact doesn’t “help privacy” in any way at all. According to https://panopticlick.eff.org the browser that I’m typing this into right now is unique among those that site has tested - there are real privacy concerns about what we do on the Internet, but storing the browser and language against iD edits in OSM isn’t one of them.

Comment from aseerel4c26 on 7 May 2015 at 23:22

@SomeoneElse: if some internet site (possibly illegally) logs my browser details plus IP/user account(!), that is one thing, but logging the data in public is another. Logging the data in public while stating something else in a so-called privacy policy is yet another. Note: the website logs (if they exist) of example.com are usually not public. Also, user X does not frequently visit example.com because he wants to contribute there, nor does he trust example.com as much as he trusts osm.org.

That others are doing harm is no excuse to do so too.

Yes, the language information seems to be the most useful info (and JOSM logs it too). So could we please leave it out of the discussion, given that we suppose it is “okay” to log it (for whatever reason this is not mentioned in our privacy policy)?
