OpenStreetMap

Will the DWG block us all one day?

Posted by SunCobalt on 23 May 2018 in English (English)

[Chart: cumulative number of user blocks over time]

I have noticed a trend: user blocks are increasingly "requested" for things that were accepted in the past (e.g. not responding to messages, empty changeset comments, etc.).

Although OSM claims to have 4,607,162 users [1], only 16,431 were editing during the past week [1]. By comparison, I have counted roughly 1,800 user blocks [2]. I know that one user can receive multiple blocks. Still, the proportion seems odd. Although the DWG publishes regular reports, they say nothing about this. Looking at the chart, I have two questions in mind.

  • Is it wrong to assume that the threshold for blocking users has been lowered over time? And is there a list that says, for example, that ignoring changeset comments is still tolerated but ignoring user messages is not?

  • What is the reason for publishing the stigma of a user block on the user's profile page, as well as in the "list of shame" [2], for more than 8 years now? Does the GDPR's purpose limitation for stored data apply?

Thanks a lot

[1] https://www.openstreetmap.org/stats/data_stats.html

[2] Manually counted from http://www.openstreetmap.org/user_blocks, adjusted by ~130 for the Berlin train station user.

Comment from RobJN on 23 May 2018 at 12:33

A different angle is: Look at how much time this is taking. Assuming 5 mins work per block that's 150 hours of work per year, and increasing. I'd guess that the real number is much higher.

Can we look at some other solutions that would prevent blocks from ever having to be applied? Preferably without simply shifting the time from the DWG to another group of people. Perhaps being more supportive of people who want to make edits that are more controversial (e.g. helping people get good quality imports with simple documentation and useful tools). Perhaps automated changeset comments when no text is added by the user. Perhaps a simpler wiki. Perhaps...

What are the main reasons for blocks this year?

Comment from escada on 23 May 2018 at 14:31

Perhaps SEO spam? An SEO company can make a lot of different accounts, one for each business they want to "add". See e.g. this discussion on the Australian mailing list.

Those people might not be interested in learning how one properly adds a business with advertising.

Comment from RicoElectrico on 23 May 2018 at 14:42

I have noticed a trend that user blocks are more and more "requested" for things that were accepted in past (i.e. not responding messages/empty changeset comments etc).

I am fine with this. OSM is a social project, after all. Keep in mind these are probably 0-day (blocked until message is read) blocks.

What is the reason for publishing the stigma of a received user block on his/her profile page as well as in the "list of shame" [2] over a period of more than 8 years now?

Practice shows that unreasonable people tend to stay unreasonable. You can always search the changesets or the forum for discussion about said user.

I, for one, am fine with stricter DWG policy. To earn a block you have to make some mess, anyway.

There are some quite active mappers who feel discouraged from editing because misbehaving users are not dealt with in a timely manner.

Though, I have to agree that notifications should be made harder to miss e.g. by listing them on the website and not relying solely on e-mail.

Comment from Pieter Vander Vennet on 23 May 2018 at 15:37

The DWG is doing a good job. I don't think they block if they don't have to, and newbies who mess up often respond well to changeset comments.

Comment from SimonPoole on 23 May 2018 at 16:53

If you have been paying attention you will know that there was a large influx of directed editors recently that not only didn't respond to messages (and made bad edits), but created larger numbers of sock puppets that were in turn blocked (IMHO reason enough to never ever let anybody from the companies involved near OSM again, but I digress). The blocking of a fair number of SEO accounts last and this year has already been mentioned as an additional larger source of blocks.

The other point of note is that cumulative graphs always point upwards and scale can be used to show whatever you want. The other distortion is comparing to random, as small as possible, numbers. We have had a bit over 1'000'000 contributors to the map data and that is the correct number to compare with a cumulative number over the lifetime of the project. So we are actually talking about less than 0.2% ever receiving a block (and naturally the larger portion of these being harmless "read the message" blocks). If we want to compare annual numbers, last year we had roughly 310'000 active contributors vs < 1'000 blocks, so still a very small number.
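The proportions argued over in this thread can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the figures are the approximate ones quoted in the comments above, and the function name `block_share` is made up for illustration. It shows how much the result depends on which population the block count is divided by.

```python
def block_share(blocks: int, population: int) -> float:
    """Return blocks as a percentage of the chosen comparison population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * blocks / population


# Approximate figures quoted in this thread, not fresh measurements.
LIFETIME_BLOCKS = 1_800            # manual count from the diary entry
WEEKLY_EDITORS = 16_431            # editors during one week in May 2018
LIFETIME_CONTRIBUTORS = 1_000_000  # "a bit over 1'000'000"
ANNUAL_BLOCKS = 1_000              # upper bound, "< 1'000 blocks"
ANNUAL_CONTRIBUTORS = 310_000      # "roughly 310'000 active contributors"

if __name__ == "__main__":
    comparisons = [
        ("lifetime blocks vs weekly editors", LIFETIME_BLOCKS, WEEKLY_EDITORS),
        ("lifetime blocks vs lifetime contributors", LIFETIME_BLOCKS, LIFETIME_CONTRIBUTORS),
        ("annual blocks vs annual contributors", ANNUAL_BLOCKS, ANNUAL_CONTRIBUTORS),
    ]
    for label, blocks, population in comparisons:
        print(f"{label}: {block_share(blocks, population):.2f}%")
```

Note that none of these ratios is a share of blocked people, since one person can receive several blocks or operate several accounts.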

Comment from SunCobalt on 23 May 2018 at 18:14

If you have been paying attention you will know that there was a large influx of directed editors recently that not only didn't respond to messages (and made bad edits), but created larger numbers of sock puppets that were in turn blocked (IMHO reason enough to never ever let anybody from the companies involved near OSM again, but I digress). The blocking of a fair number of SEO accounts last and this year has already been mentioned as an additional larger source of blocks.

I have heard something here and there, but I was unable to find out even the scale of the issues you mentioned. There was nothing in the DWG reports that I could find. I have made an adjustment for the one special situation that I was able to quantify.

The other point of note is that cumulative graphs always point upwards and scale can be used to show whatever you want. The other distortion is comparing to random, as small as possible, numbers. We have had a bit over 1'000'000 contributors to the map data and that is the correct number to compare with a cumulative number over the lifetime of the project.

My apologies. It seems I did not get my point across with the chart. Let me explain three things:

  1. I am not pointing to the year-over-year increase in user blocks. As you mentioned, a cumulative number always increases. I am pointing to the fact that the yearly increase is itself growing, a trend that has held for years. That is called exponential growth.

  2. I am not sure what you mean by "other distortion is comparing to random, as small as possible, numbers". The chart starts with the first user block in the system, not with a random number. If you need help getting the data, I can explain how I did it.

  3. The scale is plain vanilla linear and was not specially selected. It was simply the default scale Excel gave me.

I agree with your last part on how to put the blocks into proportion. I just wanted to point out that we do not have an endless supply of mappers, and 1 million might overstate the potential as well.

So I am still left with the questions of whether user blocks are handed out more readily than before and whether it is necessary to publish a block for more than 8 years.
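The difference between "a cumulative chart always rises" and "the yearly increase is itself growing" can be checked from end-of-year totals. The sketch below is not the spreadsheet used for the chart; the two series are invented purely to show the contrast, and the function names are hypothetical.

```python
from __future__ import annotations


def yearly_increments(cumulative: list[int]) -> list[int]:
    """Convert end-of-year cumulative totals into new blocks per year."""
    previous = 0
    increments = []
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


def growth_factors(increments: list[int]) -> list[float | None]:
    """Ratio of each year's new blocks to the previous year's.

    None marks a year whose predecessor had no new blocks.
    """
    factors: list[float | None] = []
    for prev, cur in zip(increments, increments[1:]):
        factors.append(cur / prev if prev > 0 else None)
    return factors


# Invented series, NOT real OSM data: both cumulative totals rise every year.
STEADY = [10, 20, 30, 40]     # same number of new blocks each year
DOUBLING = [10, 30, 70, 150]  # new blocks double each year

if __name__ == "__main__":
    for name, series in (("steady", STEADY), ("doubling", DOUBLING)):
        print(name, growth_factors(yearly_increments(series)))
```

Factors that stay near 1 mean a steady yearly number of blocks; factors that stay clearly above 1 over many years are what the chart would need to show for the growth to be called exponential.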

Comment from Dzertanoj on 24 May 2018 at 06:27

This is the case where presenting any kind of global statistics as an argument is fundamentally fallacious, regardless of cumulative or not, what scale was used for a chart and so on.

It is fallacious because there is no evidence that the increasing number results from a tendency on the DWG side. There is no "too many" or "too few"; there are multiple individual cases that lead to as many blocks as needed (excluding the undiscovered ones). There might, however, be some local clusters of reasons why blocks have been issued, such as spam, Pokemon Go vandalism (something totally new, isn't it?), use of illegal sources to improve the data for commercial purposes, and edit wars over disputed territories. If there is an increase in vandalism related to any third-party services, it is actually indirect proof that the number of data users is growing. (Yes, it happens. No, I don't know how many of these cases led to a block or have been found.)

So, I suggest abstaining from any negative assumptions and presenting it as a false dichotomy of "getting more users" versus "blocking more users". If it is necessary to block someone to maintain data integrity and quality as well as project reputation - it's totally fine. By the way, data quality degradation caused by systematic vandalism is among the reasons why loyal users might become discouraged and lose their motivation.

If you are aware of any case when a user has been blocked without any significant reason - let everybody know about it. If you think that the practice of blocking users is not transparent enough - let everybody know about your concern. But saying something as vague as "oh, maybe it's too much", even in a context of growing number of blocks per unit of time, is, again, fallacious and counter-productive.

I really hope that there is no post-modernist ideology involved here, such as "any user, even one who has effectively and systematically demonstrated uncooperative and even hostile behavior together with harmful actions, can be transformed into a valuable community member". But if there is something like that behind this diary entry, I suggest presenting accurate evidence of such a possibility. Even several anecdotes could be sufficient, since it makes no sense to expect a scientifically correct proof.

Comment from mikelmaron on 24 May 2018 at 17:48

Greater fidelity in the stats, on an ongoing basis, would be interesting. Pivot by length of block and by who applied the block, and parse the text for some indication of the purpose of the block. That would be useful for spotting patterns in the problems the DWG is dealing with, without placing an additional burden on them to report on their activities.
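A pivot like that could be sketched as below, assuming the block list has already been extracted into simple records. The `Block` type, its field names, the duration buckets and the sample rows are all hypothetical; nothing here reflects the actual OSM data model or real moderators.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Block:
    issued_by: str       # hypothetical field: who applied the block
    duration_hours: int  # hypothetical field: 0 means "read the message"


def duration_bucket(hours: int) -> str:
    """Group block lengths into a few coarse, arbitrary buckets."""
    if hours == 0:
        return "0-hour"
    if hours <= 24:
        return "up to 1 day"
    if hours <= 24 * 7:
        return "up to 1 week"
    return "longer than 1 week"


def pivot(blocks: Iterable[Block], key: Callable[[Block], Hashable]) -> Counter:
    """Count blocks per value of the given key function."""
    return Counter(key(block) for block in blocks)


# Made-up sample rows for illustration only.
SAMPLE_BLOCKS = [
    Block("mod_a", 0),
    Block("mod_a", 0),
    Block("mod_b", 24),
    Block("mod_b", 240),
]

if __name__ == "__main__":
    print(pivot(SAMPLE_BLOCKS, lambda b: b.issued_by))
    print(pivot(SAMPLE_BLOCKS, lambda b: duration_bucket(b.duration_hours)))
```

Passing the key as a function keeps one counting routine for every pivot, so a purpose category could be added later as just another key.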

Comment from SimonPoole on 24 May 2018 at 21:56

@SunCobalt By "small as possible" number I meant using the 16.5k weekly editors figure as a comparison. I take back the "as possible", because you could have used the daily number to show that the DWG has blocked, over the lifetime of the project, nearly as many as EDIT DAILY!!!!

Number of blocks in absolute numbers increasing: big secret, OSM is growing and it is growing faster every year. It would be really weird if the number of blocks was not increasing (even without the blips that have already been pointed out). And there is no indication that the relative block rate is changing in any larger way.

Comment from SomeoneElse on 25 May 2018 at 11:10

First things first, obviously Betteridge's law of headlines applies here. :)

To be clear about one thing - the user blocks list is not a "list of shame". As described on the DWG's wiki page, blocks 'don't imply that users have done anything wrong, and often contain friendly language to try and communicate that fact. Usually before any block is applied (even a "0-hour message that has to be read") attempts will be made to contact the mapper, such as via changeset discussion comments'.

It's also meaningless to try and equate blocks (or OSM accounts) with actual human users. Where one user (or a group of users) has created multiple accounts to try and "make a particular edit" in spite of there being problems with it (real examples: changing all the tracks in an area to roads so that their preferred Garmin map shows them; changing their school's name to something like "please don't give us any homework"), the number of blocks per user or even per OSM object might be very high; that does not mean that proportionately more OSM users are being blocked.

As Mikel is I think trying to suggest, it would be possible to obtain details of the actual incident from the block(s) associated with it (and "incident" will most often be "hello and welcome to OSM, some people are trying to help you"). Excluding data from deleted users, OSM changeset, object and block data is public, so any OSM user can do that (I'm assuming here that OSM's eventual GDPR implementation will somewhat resemble what Geofabrik and HDYC have already done).

Best Regards Andy (from the DWG, but writing in a personal capacity)

Comment from SunCobalt on 25 May 2018 at 11:16

As Mikel is I think trying to suggest, it would be possible to obtain details of the actual incident from the block(s) associated with it (and "incident" will most often be "hello and welcome to OSM, some people are trying to help you")

I am doing this investigation right now and will come back with results soon.

To be clear about one thing - the user blocks list is not a "list of shame". As described on the DWG's wiki page, blocks 'don't imply that users have done anything wrong, and often contain friendly language to try and communicate that fact. Usually before any block is applied (even a "0-hour message that has to be read") attempts will be made to contact the mapper, such as via changeset discussion comments'.

I have never met anyone who sees it this way, except people from the OSMF or Working Group environment.

Comment from mikelmaron on 25 May 2018 at 14:11

it would be possible to obtain details of the actual incident from the block(s) associated with it

Something to consider. We could add a category field to blocks, that could capture the general type of issue(s) the block addresses. Would make it easier than trying to parse the text of the block

the user blocks list is not a "list of shame"

I have never met someone seeing it this way, except from the OSMF or Working Group environment.

Perhaps simply adjusting the labels to differentiate between zero hour blocks, and more serious blocks, could help. The zero hour blocks could even be separated into a distinct list, with another label and description.

Comment from SunCobalt on 25 May 2018 at 14:28

Something to consider. We could add a category field to blocks, that could capture the general type of issue(s) the block addresses. Would make it easier than trying to parse the text of the block

I am not sure I can parse the text block. While I get all other items extracted, the free text is in different languages. See for example https://www.openstreetmap.org/user_blocks/1995

So it is probably much easier to add keywords to the block text or to add a category field to blocks.

Comment from Mateusz Konieczny on 3 June 2018 at 06:28

"only 16,431 were editing during the past week [1]. On the other hand, I have counted roughly 1,800 user blocks [2]. "

This is a useless comparison - you seem to be unaware that vandals tend to create multiple accounts.

Some projects encountered trolls creating hundreds or thousands of accounts, some people vandalise projects for decades.

Without excluding sockpuppet accounts, such a comparison is utterly useless.

Comment from Mateusz Konieczny on 3 June 2018 at 06:30

BTW, at a local meeting of the Polish community, one of the problems mentioned was that obvious vandals would receive permanent blocks immediately.

Comment from Mateusz Konieczny on 3 June 2018 at 06:37

would -> should in my previous comment, sorry

Comment from gormo on 5 June 2018 at 20:20

I am not sure I can parse the text block. While I get all other items extracted, the free text is in different languages. See for example https://www.openstreetmap.org/user_blocks/1995

Is there a free tier of a Google Translate API? Maybe you could use that, but don't forget to make an Auftragsdatenverarbeitungsvereinbarung (data processing agreement) with Google first. All Hail the GDPR!
