OpenStreetMap

In 2018, researchers Daniel Bégin, Rodolphe Devillers, and Stéphane Roche published a paper titled "The life cycle of contributors in collaborative online communities - the case of OpenStreetMap". A key takeaway from this paper was this density plot of each contributor's first and last edit dates:

Contributor Lifecycles from Bégin et al.

Plotted this way, temporal trends emerge as vertical lines (days when many users started mapping) or horizontal lines (days when many users stopped mapping). The paper also published this table describing the events in OSM history that these patterns capture:

Table 2 from Bégin et al.

At the time of publication, the authors used data from mid-2005 through mid-2014.
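To make the vertical/horizontal-line reading concrete, here is a minimal pandas sketch (not the method used by Bégin et al.). It assumes a DataFrame with one row per mapper and day-resolution datetime columns `_first` and `_latest`, the shape produced by the Athena query in the next section. It counts mappers per calendar day and flags days far above the median day; the default `threshold` of 3 times the median is an arbitrary, illustrative choice.

```python
import pandas as pd


def spike_days(dates: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return the calendar days whose mapper count exceeds `threshold` times the median day.

    Applied to `_first`, spikes are candidate vertical lines (many mappers
    starting on the same day); applied to `_latest`, they are candidate
    horizontal lines (many mappers stopping on the same day).
    """
    daily = dates.dt.normalize().value_counts().sort_index()
    return daily[daily > threshold * daily.median()]


# Synthetic example: five of eight mappers make their first edit on the same day
df = pd.DataFrame({
    "_first": pd.to_datetime(["2015-03-02", "2015-03-09", "2015-03-16"] + ["2015-05-02"] * 5),
    "_latest": pd.date_range("2016-01-01", periods=8, freq="MS"),
})
print(spike_days(df["_first"]))   # one candidate vertical line: 2015-05-02
print(spike_days(df["_latest"]))  # no candidate horizontal lines
```

On real data the median day is much busier, so the threshold would need tuning; a rolling baseline instead of a global median is another reasonable choice.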

Adding New Data

I find this density plot to be one of the best visualizations of OSM contributor patterns, so I recently remade the figure with data through 2021. In this post, I will share the new figures and the code I used to generate them.

First, I used the OSM public dataset on Amazon Athena to query the OSM changeset history (registry.opendata.aws/osm/). What once involved downloading and parsing >100M changesets can now be reduced to a 5-line SQL query:

SELECT uid,
       min(date(created_at)) as _first,
       max(date(created_at)) as _latest
FROM   changesets
GROUP BY uid
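For readers without Athena access, the same aggregation can be sketched in pandas. This assumes the changesets are already in a DataFrame with `uid` and `created_at` columns (loading a changeset dump is not shown). Like the SQL, it truncates timestamps to dates and takes the earliest and latest date per user.

```python
import pandas as pd


def contributor_lifespans(changesets: pd.DataFrame) -> pd.DataFrame:
    """One row per uid with the dates of its first and latest changeset (mirrors the SQL)."""
    days = changesets.assign(_day=changesets["created_at"].dt.normalize())  # like date(created_at)
    return (
        days.groupby("uid")["_day"]
        .agg(_first="min", _latest="max")
        .reset_index()
    )


# Synthetic changesets for two mappers
changesets = pd.DataFrame({
    "uid": [1, 1, 1, 2],
    "created_at": pd.to_datetime([
        "2016-03-01 10:00", "2016-03-01 18:30", "2017-03-01 09:15", "2020-07-04 12:00",
    ]),
})
print(contributor_lifespans(changesets))
```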

Next, using Pandas and Matplotlib, we read in the CSV and create the following plot:

import pandas as pd
import matplotlib.pyplot as plt

# Read in the CSV of query results exported from Athena
df = pd.read_csv('~/Downloads/05b0fce8-8318-4c9e-b658-a8677cbed877.csv',
                 parse_dates=['_first', '_latest'])

# Create plot: one small, translucent dot per mapper (x = first edit, y = latest edit)
fig, ax = plt.subplots(1, figsize=(15, 15))
df.plot.scatter(x='_first', y='_latest', s=0.1, color='k', alpha=0.2, ax=ax)

# Add labels
ax.set_title("OSM Contributor Lifespans (Remake of Bégin et al. 2018)\n({:,} mappers)".format(len(df)), fontsize=20)
ax.set_ylabel("Latest Edit", fontsize=18)
ax.set_xlabel("First Edit", fontsize=18)
plt.show()

For all of OSM:

Remake of Bégin et al. 2018 with all data

We see the same features highlighted in the 2018 paper (so we know it worked!), but also many new vertical lines. Most notably, starting in mid-2016, the density of the plot increases considerably. Recall that each dot represents a single mapper, so this denser upper-right corner represents users who made their first edit in 2016 or later. Looking at when mappers made their first edit, the average number of new mappers per day jumped from about 300 in 2015 to nearly 550 in 2016:

New Mappers in OSM Each Day
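A minimal sketch of how this yearly average of daily new mappers might be computed from the CSV, assuming the `_first` column from the query above; the exact method behind the figure is not shown. First edits are counted per day, days without any new mappers are filled with zeros, and the counts are averaged within each calendar year.

```python
import pandas as pd


def mean_daily_new_mappers(first_edits: pd.Series) -> pd.Series:
    """Average number of new mappers per calendar day, by year.

    `first_edits` holds each mapper's first-edit date (the `_first` column).
    Resampling to daily frequency fills days with no new mappers with 0, so
    quiet days count toward the average. Partial first and last years are
    averaged only over the days covered by the data.
    """
    ones = pd.Series(1, index=pd.DatetimeIndex(first_edits).normalize()).sort_index()
    daily = ones.resample("D").sum()
    return daily.groupby(daily.index.year).mean()


# Synthetic example: 3 new mappers in 2015 (two on the same day), 1 on 2016-01-01
first = pd.Series(pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-03", "2016-01-01"]))
print(mean_daily_new_mappers(first))
```

On the real CSV this would be called as `mean_daily_new_mappers(df['_first'])`.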

What caused this spike?

Breaking these first edits down by the software used to make them shows that the spike was due to the launch of editing within Maps.ME:

Software used by new mappers*

*Shows the 95% of first edits to OSM made with the most popular mapping tools; the remaining 5% were made with more than 750 other software libraries.

The query:

WITH mappers AS (
    SELECT uid,
           min(id) as _first_changeset,  -- changeset ids increase over time, so the lowest id is the first changeset
           min(date(created_at)) as _first
    FROM changesets
    GROUP BY uid
)
SELECT mappers.uid,
       _first,
       split(tags['created_by'], ' ')[1] as _editor  -- first word of the created_by tag
FROM mappers
LEFT JOIN changesets ON mappers._first_changeset = changesets.id
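Presto/Athena arrays are 1-indexed, so `split(...)[1]` keeps the first word of the `created_by` tag and drops any version string after the editor name (for example, `iD 2.20.1` becomes `iD`). The figure's footnote describes keeping the tools that account for 95% of first edits; below is a minimal pandas sketch of that grouping, assuming the query results are loaded with `_first` parsed as dates. The author's exact grouping is not shown, so treat the cutoff rule as an assumption.

```python
import pandas as pd


def editor_table(first_edits: pd.DataFrame, coverage: float = 0.95) -> pd.DataFrame:
    """Count first edits per year and editor.

    Editors are kept, most popular first, until together they cover `coverage`
    of all first edits; everything else is lumped into "other".
    """
    editors = first_edits["_editor"].fillna("unknown")
    totals = editors.value_counts()                   # most popular first
    cumulative = totals.cumsum() / totals.sum()
    n_keep = int((cumulative < coverage).sum()) + 1   # include the editor that crosses the cutoff
    keep = totals.index[:n_keep]
    grouped = editors.where(editors.isin(keep), "other")
    return pd.crosstab(first_edits["_first"].dt.year, grouped)


# Synthetic example (not real OSM counts)
first_edits = pd.DataFrame({
    "_first": pd.to_datetime(["2015-05-01", "2015-06-01", "2015-07-01",
                              "2016-05-01", "2016-06-01", "2016-07-01"]),
    "_editor": ["iD", "iD", "JOSM", "MAPS.ME", "MAPS.ME", "MAPS.ME"],
})
print(editor_table(first_edits, coverage=0.8))
```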

Recent Years

Given the density of the plot in recent years, we can discern more if we focus only on mappers who started in 2015 or later:

Remake of Bégin et al. 2018 zoomed in

A few observations:
  1. The thick diagonal line at y=x shows that for most mappers, the first and last days of editing are very close, if not the same day. This could result from attending a single mapathon, for example.
  2. The diagonal stripes indicate that for some mappers, the last day of editing is exactly 1 year after the first.
  3. The darker horizontal line at the top of the plot shows the thousands of mappers who started in the last 7 years and are still active.
  4. The vertical lines represent specific days when many new mappers started, such as the vertical line in spring 2015, which likely marks mappers who started in response to the April 25, 2015 Nepal earthquake.
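The first two observations can be quantified directly from the lifespan CSV. A minimal sketch, assuming day-resolution `_first` and `_latest` columns: "same day" means the two dates are equal (the y=x line), and "one year" means the latest edit falls on the calendar anniversary of the first (the stripe just above it).

```python
import pandas as pd


def lifespan_shares(df: pd.DataFrame) -> dict:
    """Share of mappers on the y=x line and on the one-year diagonal stripe.

    Expects `_first` and `_latest` as day-resolution datetimes (the SQL uses date()).
    """
    same_day = df["_latest"] == df["_first"]
    # Calendar anniversary; a 29 February first edit maps to 28 February
    one_year = df["_latest"] == df["_first"] + pd.DateOffset(years=1)
    return {"same_day": float(same_day.mean()), "one_year": float(one_year.mean())}


# Synthetic example: two same-day mappers, one exact one-year mapper, one other
df = pd.DataFrame({
    "_first": pd.to_datetime(["2016-03-01", "2016-03-01", "2017-06-10", "2018-01-01"]),
    "_latest": pd.to_datetime(["2016-03-01", "2017-03-01", "2017-06-10", "2018-06-01"]),
})
print(lifespan_shares(df))  # {'same_day': 0.5, 'one_year': 0.25}
```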

Incorporating Color

While the diagonal stripes in the previous scatterplot show mappers whose first and last editing days were 1 year apart, we do not know on how many days they mapped in between. If we add count(distinct(date(created_at))) as mapping_days to our query, we can use that attribute to color the dots:

Since 2015 with color

If the mappers along these diagonal lines had been active for much of the year, we would expect their dots to appear pink to orange. Instead, most of the dots forming these diagonal lines are purple, meaning that these mappers were active on only a few days during their first year of mapping, but returned on the one-year anniversary of their first edit to make their last edit.
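A sketch of how the colored version might be drawn, assuming the query gained a `count(distinct(date(created_at))) as mapping_days` column. The colormap and scale are guesses, not the author's settings: Matplotlib's `plasma` runs from dark purple through pink and orange to yellow, matching the colors described here, and a log scale keeps the handful of near-daily mappers from washing everyone else out.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import LogNorm


def plot_lifespans_colored(df: pd.DataFrame):
    """One dot per mapper (first vs latest edit), colored by distinct mapping days."""
    # Draw the least active mappers first so the rarer, brighter dots end up on top
    df = df.sort_values("mapping_days", kind="stable")
    fig, ax = plt.subplots(figsize=(15, 15))
    points = ax.scatter(
        df["_first"], df["_latest"], c=df["mapping_days"], s=0.5,
        cmap="plasma", norm=LogNorm(vmin=1, vmax=df["mapping_days"].max()),
    )
    fig.colorbar(points, ax=ax, label="Distinct mapping days")
    ax.set_xlabel("First Edit")
    ax.set_ylabel("Latest Edit")
    return fig, points


# Synthetic example
df = pd.DataFrame({
    "_first": pd.to_datetime(["2016-03-01", "2017-01-01", "2018-05-05"]),
    "_latest": pd.to_datetime(["2017-03-01", "2021-10-01", "2018-05-05"]),
    "mapping_days": [3, 900, 1],
})
fig, points = plot_lifespans_colored(df)
```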

Another View - Humanitarian Mapping

As a whole, this density plot exhibits interesting patterns, but subsetting it further highlights other distinct behaviors. For example, if we look at only the 236k mappers who included the text #hotosm in the comment of their first OSM changeset (perhaps implying that they were introduced to OSM via humanitarian mapping), we see a different pattern:

HOT

One thing to note is the many clusters of dots in November. These are likely mappers who joined during an OSM Geo Week event and then contributed again (for the last time) during OSM Geo Week in November of a later year. Note also the orange and yellow dots at the top of the plot: the many mappers who started mapping in OSM via a HOT task and have continued to map consistently since.
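A minimal sketch of both subsets, assuming a DataFrame with one row per mapper holding `_first`, `_latest`, and a hypothetical `first_comment` column (the `comment` tag of each mapper's first changeset, which could be pulled with the same join as the editor query above by selecting `tags['comment']`). Whether the original `#hotosm` match was case-sensitive is not stated; this sketch ignores case.

```python
import pandas as pd


def hot_subset(mappers: pd.DataFrame) -> pd.DataFrame:
    """Mappers whose first changeset comment contains '#hotosm' (case-insensitive)."""
    mask = mappers["first_comment"].str.contains("#hotosm", case=False, regex=False, na=False)
    return mappers[mask]


def november_returners(mappers: pd.DataFrame) -> pd.DataFrame:
    """Mappers whose first and latest edits both fall in November, in different years."""
    first, latest = mappers["_first"], mappers["_latest"]
    mask = (first.dt.month == 11) & (latest.dt.month == 11) & (latest.dt.year > first.dt.year)
    return mappers[mask]


# Synthetic example (made-up uids and comments)
mappers = pd.DataFrame({
    "uid": [1, 2, 3, 4],
    "first_comment": ["#hotosm-project-1234 #missingmaps", "Added buildings #HOTOSM",
                      "fixed road name", None],
    "_first": pd.to_datetime(["2016-11-16", "2017-03-02", "2015-11-10", "2019-01-01"]),
    "_latest": pd.to_datetime(["2018-11-14", "2017-03-02", "2019-11-12", "2021-06-01"]),
})
hot = hot_subset(mappers)
print(hot["uid"].tolist())                      # [1, 2]
print(november_returners(hot)["uid"].tolist())  # [1]
```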

These density plots offer a convenient, interpretable visualization of hundreds of thousands of OSM contributors. This conversation on the OpenStreetMap US Slack prompted me to recreate these figures (and finally solve a longstanding question about the bump in new mappers since 2016). The thread also sparked interest in visualizing daily mapping activity to see whether new density patterns might emerge.

Daily Mapping Activity

The previous density plots use one dot per mapper. If we instead focus on a subset of top contributors, say mappers who have mapped on more than 100 days since 2018, we can dig deeper into their temporal patterns. In the following figures, each dot represents one mapper mapping on one day, so each row represents a single mapper.

To find which mappers were active on which days, we use the following query:

SELECT uid,
       date(changesets.created_at) as _day,
       sum(num_changes) as _edits
FROM changesets
WHERE changesets.created_at > date '2018-01-01'
GROUP BY uid, date(changesets.created_at)
ORDER BY uid DESC, _day DESC
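One way to turn those results into the rug/quilt layout, sketched under assumptions: the results are loaded as `daily` with `uid` and `_day` columns, mappers with more than `min_days` active days are kept, and rows are ordered so the most active mappers sit at the bottom. The `highlight` argument is a hypothetical hook for coloring a set of uids orange, as in the paid-editor figure below.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def quilt_rows(daily: pd.DataFrame, min_days: int = 100) -> pd.DataFrame:
    """Keep mappers active on more than `min_days` days and assign each a plot row.

    Row 0 (bottom of the plot) is the mapper with the most active days.
    """
    active_days = daily.groupby("uid")["_day"].nunique()
    kept = active_days[active_days > min_days].sort_values(ascending=False, kind="stable")
    row_of = pd.Series(range(len(kept)), index=kept.index, name="row")
    return daily[daily["uid"].isin(kept.index)].join(row_of, on="uid")


def plot_quilt(rows: pd.DataFrame, highlight=()):
    """One square dot per mapper per active day; uids in `highlight` are drawn in orange."""
    fig, ax = plt.subplots(figsize=(15, 15))
    is_highlighted = rows["uid"].isin(highlight)
    for subset, color in ((rows[~is_highlighted], "k"), (rows[is_highlighted], "tab:orange")):
        if not subset.empty:
            ax.scatter(subset["_day"], subset["row"], s=0.1, marker="s", color=color)
    ax.set_xlabel("Day")
    ax.set_ylabel("Mapper (most active at bottom)")
    return fig, ax


# Synthetic example with a tiny threshold
daily = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2, 3],
    "_day": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03",
                            "2018-01-01", "2018-01-05", "2018-02-01"]),
})
rows = quilt_rows(daily, min_days=1)
fig, ax = plot_quilt(rows, highlight={2})
```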

Rug / Quilt plot of mapping activity

This plot highlights a few light spots around the holidays, when even the most ardent mappers are less active. We also see many very active mappers joining or picking up activity in 2021. What if we subset this data one more level?

Daily Mapping Activity with Paid Editors

If we relax our criteria to include mappers active on more than 50 days since 2018, we find 23k mappers (23k rows). The rows are sorted by number of active days: mappers at the very bottom were active on up to 1,385 days (nearly every day), and the count decreases steadily toward the top rows, whose mappers were active on just over 50 days. I have highlighted known paid editors in orange on this plot (known because they disclose their affiliation in their OSM profile). Notice the heavy concentration of paid editors between 300 and 750 active days, especially in the 700s (consistent with starting in mid-2018), the 500s (mid-2019), and the 400s (early 2020). For reference, there are about 250 working days in a calendar year. Someone mapping consistently on working days since mid-2018 would have mapped on more than 750 days by late 2021; likewise, someone mapping consistently during the work week since mid-2019 would have mapped on more than 500 days by late 2021. It is subtle, but I think this pattern is discernible in the graph:

Rug / Quilt plot of mapping activity with paid editors highlighted
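The working-day arithmetic above can be checked with NumPy's `busday_count`, which counts Monday-to-Friday days in a half-open date range. The specific start and end dates standing in for "mid-2018", "mid-2019", "early 2020", and "late 2021" are assumptions, and the 400-day threshold for early 2020 is inferred from the "(400s)" cluster rather than stated in the text. Public holidays are not subtracted; `busday_count` accepts a `holidays=` list for that, and removing roughly ten days per year would still leave each count above its threshold.

```python
import numpy as np

# Assumed stand-ins for the dates named in the text (not taken from the data)
END = "2021-11-01"  # "late 2021"
STARTS = {"mid-2018": "2018-07-01", "mid-2019": "2019-07-01", "early 2020": "2020-01-01"}
THRESHOLDS = {"mid-2018": 750, "mid-2019": 500, "early 2020": 400}

for label, start in STARTS.items():
    # Weekdays (Mon-Fri) in [start, END); public holidays are not subtracted
    weekdays = int(np.busday_count(start, END))
    print(f"{label}: {weekdays} weekdays vs. {THRESHOLDS[label]}-day threshold")
```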

Conclusion

These density plots make it possible to quickly visualize thousands of OSM contributors and their daily editing patterns. The lifecycle plots show platform-wide trends, such as many mappers starting or stopping at once, while the daily mapping plots elucidate nuanced temporal patterns of continuous editing. Visualizing all of OSM is always a tedious task, but finding ways to subset the data (say, by hashtag or known paid editors) adds new dimensions to these plots.

Leave a comment with any questions or other visualizations you’d like to see and I will try to post more examples.

Cheers! Jennings

Location: Grant, Salem, Marion County, Oregon, 97301, United States

Comment from SimonPoole on 14 November 2021 at 10:59

There hasn’t been a long-standing question about the mapper bump in 2016, except for people who willfully ignore what is going on outside of their bubble; see https://www.openstreetmap.org/user/SimonPoole/diary/43093

Comment from PierZen on 16 November 2021 at 01:23

Hi Jennings, in your diary's gray boxes I cannot see the tables and graphs, and there are no links to the data.

About prior discussion, note that there is a lot more than the study you refer to. Other than academic publications, you should look at the OSM community, where there have been many discussions and analyses of contributor lifespans and contributor profiles.

Like Simon, I illustrated the Maps.me bump effect in 2016 and presented lifecycle plots. Look at my diary OSM Contributors Outlook - The Pulse of OpenStreetMap Contributors. The first graph shows the survival curve of yearly cohorts for their first 12 months. For each yearly cohort, we first observe a dramatic decrease after the first month (often after the first day of contribution), followed by a steady decrease in the following months.

You will also find analysis of contributors vs. contribution share for various OSM contributor profiles. For example, graph 2 shows a steady increase in «Discover» contributors (1-2 days of participation), who account for a minimal share of contributions (i.e., objects edited).

My publications on SlideShare about major OSM humanitarian responses also include various analyses of contributor profiles, where we see the problem with mapathons: massive numbers of contributors who participate only once. Their weight in number of objects edited is a lot smaller. More difficult is analysing the quality of their contributions and how effective teams at such mapathons are at integrating newcomers and assuring quality.

Comment from Zaneo on 21 November 2021 at 13:26

Can you check how many days passed between receiving a comment on a changeset and the last mapping day?

Comment from Mateusz Konieczny on 24 November 2021 at 08:16

How would this look if only more productive mappers were shown?

Excluding people who made fewer than, say, 25 edits.

Maybe also excluding accounts that are blocked and/or whose number of edits is low compared to their total block length (number of edits smaller than blocked days * 10). But that likely would not have much impact.

And maybe plotting blocks would also be interesting.

