Maximum number of hours spent editing OSM on any day by a single user

Figure 1: The maximum number of hours spent editing OSM in a single day by any user, depending on the total number of days they have ever mapped.

A curious question was raised at SotM this past weekend in the discussion following a talk on developing an Automated approach to identifying corporate editing activity. In that work, Veniamin Veselovsky has found an ingenious method of time-shifting a mapper’s editing pattern to determine a remote mapper’s local timezone. Then, doing this for many mappers, he is able to determine specific temporal signatures that have proven helpful in training models to identify paid editors in OSM. Temporal signatures are very powerful in OSM analysis: I used them to characterize editing in North America and found that Amazon’s temporal mapping signature is out of sync with local US mappers because they are primarily mapping during business hours in SE Asia.

The question posed after this talk, however, was not about corporate editing temporal patterns. Instead, the conference-goer was curious whether temporal mapping signatures could be used to identify unhealthy mapping behavior. This had never occurred to me before: what would constitute unhealthy mapping behavior? The question then became, are there mappers that spend too many hours a day mapping? My curiosity was piqued.

Here I will explain one approach to calculating these values—or at least a decent proxy for them—and then I will share more figures and interpretations below.

First, how do we quantify the number of hours that a user may spend editing on a given day?

One approach is to convert a number of edits into an “editing session” as done by Geiger and Halfaker in 2013 on Wikipedia and then ported to OSM by Jacob Thiebault-Spieker in related research. This is a fairly complex conversion to obtain the number of “volunteer hours” that have gone into crowd-sourced projects, but it has proved fruitful in the past, so I wanted to make sure to mention it.

For this analysis, however, I took a different, simpler approach that does not correlate as robustly with volunteer time, but instead gives a general estimate of how many hours a day a mapper might be engaging with OSM. I do this by simply counting the number of distinct hours during which a user submits changesets on any given day. For example, if a mapper submits 5 changesets on a given day at 10:52am, 11:05am, 11:15am, 12:05pm, and 12:35pm, then they have submitted changesets during the hours: {10,11,12} = 3 distinct hours. This first query calculates these values for every mapper for every day they have mapped:

with hours_by_user as (
  -- one row per mapper per UTC day, with the number of distinct clock hours that had a changeset
  SELECT  uid,
          date(created_at) as _day,
          count(distinct(hour(created_at))) as _hours
  FROM changesets
  GROUP BY uid, date(created_at)
),
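To make the counting rule concrete, here is a minimal Python sketch of the same idea applied to the five example changesets above. It is only an illustration, not the code behind the figures: the function name, the uid of 1, and the date are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def distinct_hours_by_user_day(changesets):
    """Map (uid, date) -> number of distinct clock hours with at least one changeset."""
    hours = defaultdict(set)
    for uid, created_at in changesets:
        # A set keeps each clock hour once, no matter how many changesets fall in it.
        hours[(uid, created_at.date())].add(created_at.hour)
    return {key: len(value) for key, value in hours.items()}


# Hypothetical mapper (uid 1) submitting the five changesets from the example.
example = [
    (1, datetime(2021, 11, 1, 10, 52)),
    (1, datetime(2021, 11, 1, 11, 5)),
    (1, datetime(2021, 11, 1, 11, 15)),
    (1, datetime(2021, 11, 1, 12, 5)),
    (1, datetime(2021, 11, 1, 12, 35)),
]

hours_by_user = distinct_hours_by_user_day(example)
print(hours_by_user)
```

The single entry for this mapper and day has the value 3, matching the {10,11,12} count in the text.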

Now that we have counts per mapper, per day, we need to aggregate these into per-mapper counts. Let’s consider both the maximum number of hours that a mapper has mapped on any given day as well as their average number of hours over all of the days they have mapped.

Going back to our previous example of changesets submitted at [10:52 am, 11:05am, 11:15am, 12:05pm, 12:35pm] = {10,11,12}, we notice that it is quite exaggerated to consider this as 3 hours of mapping activity. This is really closer to just 2 hours of mapping activity, assuming the user was logged in from 10:52am through 12:35pm.

Additionally, our query currently credits any mapper submitting a single changeset on any day as 1 hour of mapping activity. To adjust for this, we will subtract 1 hour from all of the counts. A mapper submitting only 1 changeset or only changesets within the same hour will be counted as 0 hours of mapping activity, while a mapper submitting changesets in any 2 distinct hours will be counted as 1 hour. Over all of the changesets, these errors should converge into something slightly more accurate.

We also count the number of days that a mapper has ever mapped because this will become our independent variable to create different classes of mappers. Since the majority of mappers in OSM have only ever submitted 1 changeset (and therefore mapped on 1 day), we need an independent variable that can appropriately distinguish between more active and less active mappers, so we will choose days, similar to the thresholds for active contributors.

A safe assumption is that mappers who have mapped more (more days) will have higher max hours and likely higher average hours values.

max_hour_per_user AS (
  -- one row per mapper; subtract 1 so that a single active hour counts as 0 hours of mapping
  SELECT uid,
         max(_hours) - 1 as max_hours,
         cast( avg(_hours) as int) - 1 as mean_hours, --avoid false sense of sub-hour precision with int
         count(distinct(_day)) as num_days
  FROM hours_by_user
  GROUP BY uid
)

Finally, we can aggregate all of these mapping stats by our independent variable, the number of days that a mapper has been active.

SELECT num_days,
       array_agg(mean_hours) as mean_hours_array,
       array_agg(max_hours) as max_hours_array,
       count(uid) as num_users
FROM max_hour_per_user
GROUP BY num_days
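The two aggregation steps can also be sketched in plain Python, which may make the subtract-one adjustment easier to follow. This is an illustration under my own assumptions rather than the pipeline used for the figures: the function names and the sample counts are hypothetical, and `round` stands in for the SQL cast, whose exact rounding depends on the query engine.

```python
from collections import defaultdict


def summarize_users(hours_by_user):
    """{(uid, day): distinct_hours} -> {uid: (max_hours, mean_hours, num_days)}."""
    per_user = defaultdict(list)
    for (uid, _day), hours in hours_by_user.items():
        per_user[uid].append(hours)
    summary = {}
    for uid, values in per_user.items():
        max_hours = max(values) - 1  # one active hour counts as 0 hours of mapping
        mean_hours = round(sum(values) / len(values)) - 1  # whole hours only
        summary[uid] = (max_hours, mean_hours, len(values))
    return summary


def group_by_num_days(summary):
    """Collect per-user stats into one bucket per number of active days."""
    groups = defaultdict(lambda: {"max_hours": [], "mean_hours": [], "num_users": 0})
    for max_hours, mean_hours, num_days in summary.values():
        bucket = groups[num_days]
        bucket["max_hours"].append(max_hours)
        bucket["mean_hours"].append(mean_hours)
        bucket["num_users"] += 1
    return dict(groups)


# Hypothetical distinct-hour counts for three mappers.
sample = {
    (1, "2021-11-01"): 3, (1, "2021-11-02"): 1,
    (2, "2021-11-01"): 1,
    (3, "2021-11-01"): 5, (3, "2021-11-03"): 3,
}
summary = summarize_users(sample)
groups = group_by_num_days(summary)
print(summary)
print(groups)
```

In this sample, mapper 2 (a single changeset hour on a single day) ends up with 0 for both the maximum and the mean, which is the behavior the adjustment is meant to produce.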

To reiterate, this query is calculating the number of distinct hours in which an editor submitted a changeset within consecutive (not rolling) 24-hour periods, i.e., UTC days. For this work, I think it is a decent proxy for the number of hours that a mapper might be active in OSM, but it does not account for the exact number of minutes that a mapper spends sitting at their computer. I do think it is fairly close, though, especially for the more active mappers. On a related note, recent discussion on the OSM-talk list has also brought up the idea of measuring and crediting volunteer hours in OSM via Rovas.

So how many hours do mappers spend mapping each day?

Figure 2. Histogram of hours (max/mean) that mappers spend editing on any given day (note the log scale).

Figure 2 shows the general distribution of hours per mapper per day. The majority of mappers (>1M) have not spent 1 hour editing OSM. We knew this, but it’s always good to confirm. As we get into the 8-12 hours per day range, the mean and max really begin to drift apart. For example, about 10,000 mappers have edited at least 8 hours in 1 day. Fewer than 1,000 mappers, however, average 8 hours of mapping for every day that they map.

After 14 hours per day, the bars get a little less predictable. I think we are now viewing bot activity. There are a handful of bot accounts that average 22-24 hours of editing per day. Not that they are active every day, but when they are, these bots run long editing jobs, submitting changesets during every hour of a day.

Figure 3. Maximum & Mean hours of editing based on how many days a mapper has been active (for mappers active more than 7 days)

Figure 3 shows the breakdown for the 200,000 mappers that have been active on 7 days or more. The mean (shown on the right in orange) remains around 0-1 hours for nearly all mappers, but does creep up for the more active mappers. The maximum hours, however, increase considerably as mappers have more days of mapping experience. Notice that all of the tails go up to at least 17 hours, meaning that at least 1 mapper in every bin has mapped for at least 17 hours in a given day.

At first, this number seems too high. Is it possible to map for 17-18 hours in a 24-hour period? I looked into a few of the mappers that landed in these high-intensity categories in 2021 and found that yes, it certainly happens. One mapper I looked at joined OSM one afternoon and mapped until 9am the next morning, submitting changesets every 20-30 minutes for 18 hours. They took a break for 6 hours, then resumed mapping from 3pm to 1am the next morning (another 10 hours). I can also say with certainty that this was a volunteer contributor, and I have time-zone adjusted these figures based on the location of the changesets that were indicated as local.

This, of course, raised more questions. How many mappers have similar mapping streaks in which they continue to submit changesets in consecutive hours?

Editing Streaks

Figure 4. Editing streaks in OSM. More than 10,000 mappers have submitted OSM changesets for at least 5 consecutive hours.

Some editing streaks extend more than 50 hours, but we have to assume these are bots (the majority of the accounts say as much). I calculated these mapping/editing streaks based on the time between consecutive changesets, at hourly granularity. For each user, I then calculated their longest streak, that is, the number of consecutive hours in which they edited OSM. Unlike the previous charts, these are not based on distinct 24-hour periods. We can change the time unit to days instead, and we find the number of mappers that have mapped for a given number of days in a row:

Figure 5. Editing streaks (in days) in OSM. Hundreds of mappers have mapped on exactly 10 days in a row, and 9,614 users have mapped for at least 10 days in a row.
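For readers who want to reproduce something like these streak counts, one way to compute a longest streak for a single mapper is sketched below. It is my own illustrative interpretation, not the query behind Figures 4 and 5: the function name and sample timestamps are hypothetical, timestamps are assumed to be naive UTC values, and a streak is counted as the number of back-to-back hour (or day) buckets that each contain at least one changeset.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def longest_streak(timestamps, unit_seconds=3600):
    """Longest run of consecutive time buckets (hours by default) with a changeset."""
    # Truncate every timestamp to its bucket index and drop duplicates.
    buckets = sorted(
        {int((ts - EPOCH).total_seconds() // unit_seconds) for ts in timestamps}
    )
    longest = current = 0
    previous = None
    for bucket in buckets:
        # Extend the run if this bucket directly follows the previous one.
        if previous is not None and bucket == previous + 1:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous = bucket
    return longest


# Hypothetical changeset times for one mapper.
example = [
    datetime(2021, 11, 1, 22, 10),
    datetime(2021, 11, 1, 23, 40),
    datetime(2021, 11, 2, 0, 5),
    datetime(2021, 11, 2, 1, 30),
    datetime(2021, 11, 2, 5, 0),
    datetime(2021, 11, 4, 9, 0),
]

print(longest_streak(example))         # streak in hours
print(longest_streak(example, 86400))  # streak in UTC days
```

Because the buckets are counted from a fixed epoch rather than reset at midnight, an hourly streak can run across a day boundary, which is the difference from the per-day counts earlier in the post. Passing 86400 as the unit switches the same logic to streaks of days.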

Returning to hours mapping per day (non-consecutive), here is another visualization of the mean hours spent by mappers. I would have expected this to be closer to 0-1 hours per day, but it does seem to stay consistently above 1 for mappers that have been active more than 250 days in OSM, even up to 3. This means that frequently recurring mappers do not just log on and make a few changes; they commit to regular editing sessions in which they are active across multiple hours.

Figure 7. The median number of hours mappers spend editing OSM.

It’s been said before, but it’s worth reiterating: OSM is a vibrant community of mappers. There are many types of contributors, and while the majority of mappers might only ever make a single edit to the map, there are plenty of ardent, dedicated editors that map like it’s their job—and for some of these mappers it might be, but definitely not all.

Location: Last Chance Gulch, Helena, Lewis and Clark County, Montana, 59624, United States

Comment from Mateusz Konieczny on 24 November 2021 at 08:11

they commit to regular editing sessions in which they are active for across multiple hours

I would expect StreetComplete mappers to end here often, despite low number of edits and low impact.

For example I likely register as very long mapping session nearly every time I leave home.

(I am also likely ending there due to my habit of making small edits in rotation with other things, even though these edits are tiny tag-fiddling tasks, like removing a descriptive name - the equivalent of landuse=garages name=garages)
