I only recently realised that HOT contributors need to mark at least one task as “done” to be listed as a project contributor in the tasking manager. This made me wonder: how many people start contributing to a HOT project but never finish their first task? What proportion of all HOT edits are contributed in this manner?

Summary: about half of all HOT contributors never complete their first task on a project, although they do contribute to the map. These “partial” contributions account for 10-20% of all HOT edits.

Here’s a timeline of the number of monthly HOT contributors, compared with the number of those who completed at least one task:

[Figure: HOT contributors with completed tasks]

And here the corresponding timeline of the number of edits contributed by both groups of people:

[Figure: HOT contributions and completed tasks]

Expressed as percentages:

[Figure: Share of completed work]

We don’t know why these contributors never completed their task. We can speculate, but really we would need to ask them. Some may have forgotten to close it after they were done, some may not have had the confidence to mark it as “complete” and wanted someone else to have a second look, and some may have gotten distracted or lost motivation.

It’s also worth bearing in mind that we can always expect some proportion of tasks to be abandoned early: not everyone is interested in contributing to HOT in the long term. Many people are likely simply curious and try it out for a bit. Many may have come across HOT because a friend sent them a link, or because it was in the news, and we can’t expect all of them to stick around.

However, we should also be mindful of these early experiences. On one hand we can improve our understanding of what makes people stop early. On the other hand we should also consider the impact these contributions have on our map, and on validation and QA efforts. Where should we send absolute newcomers the next time we’re in the news?

Some background info on the analysis…

I’m identifying HOT contributions in the OSM edit history as follows:

  • The contribution needs to fall within the geographic boundaries of a HOT project
  • The contribution needs to happen within the activity period of the HOT project
  • And then…
    • EITHER the user is a listed project contributor (they marked at least one task as done),
    • OR the changeset is tagged with a valid HOT project ID (the contributor never marked a task as done, but likely did start a task in the tasking manager before contributing edits.)
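
The rules above can be sketched in Python roughly as follows. This is a minimal illustration and not the actual analysis code: the `Project` and `Edit` types and the `classify_edit` function are hypothetical names, the project boundary is simplified to a bounding box (real project areas may be arbitrary polygons), and a changeset tag is treated as valid only if it matches the project being tested.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Project:
    # Hypothetical record for one HOT project, as taken from the tasking manager.
    project_id: int
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat), a simplification
    start: datetime                          # start of the project's activity period
    end: datetime                            # end of the project's activity period
    listed_contributors: FrozenSet[str]      # users who marked at least one task as done


@dataclass(frozen=True)
class Edit:
    # Hypothetical record for one edit from the OSM edit history.
    user: str
    lon: float
    lat: float
    timestamp: datetime
    tagged_project_id: Optional[int]         # project ID from the changeset tags, if any


def classify_edit(edit: Edit, project: Project) -> Optional[str]:
    """Return "completed", "partial", or None if the edit is not a HOT contribution."""
    min_lon, min_lat, max_lon, max_lat = project.bbox
    in_area = min_lon <= edit.lon <= max_lon and min_lat <= edit.lat <= max_lat
    in_period = project.start <= edit.timestamp <= project.end
    if not (in_area and in_period):
        return None
    # EITHER: the user is a listed project contributor.
    if edit.user in project.listed_contributors:
        return "completed"
    # OR: the changeset carries this project's ID (assumed matching rule).
    if edit.tagged_project_id == project.project_id:
        return "partial"
    return None
```

Note that the first branch labels every edit by a listed contributor as “completed”, regardless of which task or changeset it belongs to.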

There are some caveats with this data:

  • In this analysis, one completed task by a contributor is enough for all of their contributions to the same project to count as “done”. The simple heuristics above do not allow me to distinguish task completion states for the individual changesets a contributor makes to a project.
  • We can’t distinguish contributors who never mark a task as “done” from validators, or expert contributors who manually tag changesets with a project ID. We don’t have the data to distinguish these cases, e.g. there is no published list of validators to compare against.
  • We can only reliably track this from Aug 2014 when iD started carrying over project-specific changeset tags from the tasking manager. We won’t be able to identify “unsubmitted” contributions before then.

By Martin Dittus (@dekstop) in 2015.

Comment from Alan Bragg on 24 August 2015 at 20:55

I’ll bet you have a lot of new contributors who clicked on “done” because they were done, not because the task was done. I was one of them. Now I rarely mark a task as done because it would take too much time to scroll through the entire area at a high zoom level.

Comment from Pattersonavh on 25 August 2015 at 12:48

The first time I began to work on a task on a HOT project, I was not the first on the site, and kept on coming across other entries the quality of which did not impress me - unsquared buildings - roads or rivers obviously tracked at a zoom level much lower than I was using for buildings etc. Being new, I didn’t feel justified in challenging these, since I wasn’t sure of the quality of my own work. (Still not absolutely sure, since only two of some 40 “done” tasks have been validated so far).

Now, I don’t necessarily mark as done a task that I feel to be completed, since I like to go around the adjacent task areas to pick up roads or features which cross the boundary. I am therefore conscious that there are some tasks which I have left for a number of weeks, but need to come back to complete.

Comment from dekstop on 25 August 2015 at 12:59

Thanks both for your comments! It’s interesting to hear the many stories behind this, here and on the HOT mailing list. In some cases a lot of consideration goes into the decision not to mark a task as done… and sometimes people simply run out of time before they can finish.

Here’s a particularly detailed comment by Jarmo Kivekäs on the HOT list:

Comment from rayKiddy on 25 August 2015 at 20:03

What kind of tools did you use to pull this data out? I am a HOT mapper, and one of those who has never marked a tile as “Done”. There is information about the tile as a whole that just does not seem to be visible and I can never seem to have the confidence to so mark a tile.

I am becoming familiar with accessing OSM data via, for example, the overpy library in Python. I think that one could get meta-information from a tile and that would help mappers know whether they could or should mark it as “Done”.

BTW, I am also familiar with R and other tools for data analysis. I would be curious about what you use. Thanx.

Comment from dekstop on 25 August 2015 at 20:24

Hallo rayKiddy! I’m getting the data from the OpenStreetMap edit history [1], that’s a ~50GB compressed XML or PBF file… it takes a substantial amount of work to get data out, but since that’s part of my job [2] I have plenty of experience with that and do it on a regular basis :)

The short answer: custom Osmium parsers, a PostgreSQL database, and an import process that matches OSM edits with HOT projects based on information taken from the tasking manager.

I intend to write more about my process in a future post, but it’ll take a while to get there. Part of the problem is that there are no general-purpose tools for it, so it involves a range of different technologies, and the process changes depending on your needs…
