When we talk about OSM Contributors, we often see only the Big Numbers. In this Diary, my objective is to focus on OSM Contributor profiles and to try to measure the impact of various groups on OSM Edit Contributions.
Since 2005, the number of registered OSM members has grown explosively, from 500,000 in 2012 to 1 million in 2013 and 4.2 million at the end of September 2017.
A 2012 study by Pascal Neis and Alexander Zipf showed that only 38% of the members registered at the end of 2011 had started editing the database, and that only 5% (24,000) of all members actively contributed to the project in a more productive way.
Activityworkshop.ne published in July 2013 an interesting analysis of contributors «Joining and leaving» as participants. It shows the volatility of OSM contributors, with a high volume of contributors starting and then stopping shortly after. As we will see below, a high percentage of the people who start to contribute stop on the first day or after a short period.
Various studies also show the inequality of contributions, with most of the data produced by a minority (see Anran Yang, Hongchao Fan, Alexander Zipf, 2016 and Ding Ma, Mats Sandberg and Bin Jiang, 2015). Statistics and analyses presented by Pascal Neis and Simon Poole over the last years have also shown various aspects of the contributions, including the concentration of Contributions in a minority and the volatility of contributor participation.
The OSM Changesets Dump File contains metadata about each changeset. Like others, our analysis comes from this file. If we dig in and analyze the OSM changesets database, we see that over the 13 years from 2004 to the end of September 2017, 953,200 contributors edited at least one object and 108,800 edited more than 1,000 objects (i.e. nodes, ways or relations). This is an indication that there are massive inflows of new participants who contribute minimally. The analysis below will confirm this hypothesis.
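As a rough illustration of how such counts can be derived, here is a minimal Python sketch. It is not the script used for this Diary. It assumes changeset metadata in the XML layout of the changesets dump, where each `<changeset>` element carries `uid` and `num_changes` attributes, and it uses a tiny inline sample (the `SAMPLE` data and the function names are hypothetical). Note that `num_changes` counts edits per changeset, so the same object edited twice is counted twice.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def edits_per_contributor(xml_source):
    """Sum num_changes per uid while streaming over <changeset> elements."""
    totals = Counter()
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "changeset":
            uid = elem.get("uid")
            if uid is not None:  # skip changesets without a user id
                totals[uid] += int(elem.get("num_changes", "0"))
            elem.clear()  # free memory, the real dump is very large
    return totals


def count_above(totals, threshold):
    """Number of contributors whose total edits exceed the threshold."""
    return sum(1 for n in totals.values() if n > threshold)


# Hypothetical sample data, for illustration only.
SAMPLE = b"""<osm>
  <changeset id="1" created_at="2017-01-03T10:00:00Z" uid="10" user="a" num_changes="5"/>
  <changeset id="2" created_at="2017-02-03T10:00:00Z" uid="10" user="a" num_changes="1200"/>
  <changeset id="3" created_at="2017-02-04T10:00:00Z" uid="11" user="b" num_changes="2"/>
  <changeset id="4" created_at="2017-02-05T10:00:00Z" uid="12" user="c" num_changes="0"/>
</osm>"""

totals = edits_per_contributor(io.BytesIO(SAMPLE))
at_least_one = count_above(totals, 0)       # edited at least one object
more_than_1000 = count_above(totals, 1000)  # edited more than 1,000 objects
print(at_least_one, more_than_1000)
```

On the real dump, the same loop could be fed the decompressed file object instead of the inline sample.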
The Pulse of OpenStreetMap Contributors
Cohort analysis lets us break a dataset into related groups that share common characteristics or experiences. For OSM, we can group contributors by the year they started to contribute and compare the various cohorts to see patterns of contribution.
Graph 1 reveals what I call the «Pulse of OpenStreetMap Contributors». Rodolphe Quiedeville's OSMPulse website also illustrated the beat of contributions. While his real-time graphs (last updated in 2014) focused on the number of objects edited per minute, we focus on contributors with the same year of experience. For each calendar year, it is as if we organized a marathon with all the contributors aligned on the same start line and watched their progression month by month. We could also follow them for even longer periods and compare their long-term behavior.
Graph 1. Note that this graph does not show one long time series. These are individual graphs for each yearly cohort. For each year, we follow for 12 months the new contributors who start editing.
Here, for each calendar year, we follow new OSM contributors and their participation from month 1 to month 12 of contribution (it does not matter whether one started in January, February, etc.). With such a cohort analysis, we can see all the new entries for the year. This reveals what I call the Pulse of «Discovery contributors», the great majority of whom do not participate for more than 1 month. The high rate of departure at month 1 confirms the volatility of contributor participation. It shows what is called the lower tail of Contribution: a high number of Contributors with a minimal impact on OSM edits. There is a lot more to say from such an analysis, and I will come back in another Diary with more profile analysis based on the cohort trajectory statistics.
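The alignment of cohorts on a common start line can be sketched as follows. This is only an illustration under stated assumptions, not the method used to produce Graph 1: the function name `cohort_activity`, the `(uid, date)` input format and the sample edits are hypothetical, and «month 1» is taken here to be the calendar month of the first edit.

```python
from collections import defaultdict
from datetime import date


def cohort_activity(edits, horizon=12):
    """Count active contributors per cohort year and month of experience.

    edits: iterable of (uid, date) pairs, one per edit or changeset.
    Returns {cohort_year: [active in month 1, ..., active in month horizon]}.
    """
    edits = list(edits)
    # First edit date of every contributor defines the cohort and the start line.
    first = {}
    for uid, day in edits:
        if uid not in first or day < first[uid]:
            first[uid] = day

    # Distinct contributors active in each (cohort year, month of experience).
    active = defaultdict(set)
    for uid, day in edits:
        start = first[uid]
        month_index = (day.year - start.year) * 12 + (day.month - start.month) + 1
        if month_index <= horizon:
            active[(start.year, month_index)].add(uid)

    table = defaultdict(lambda: [0] * horizon)
    for (year, month_index), uids in active.items():
        table[year][month_index - 1] = len(uids)
    return dict(table)


# Hypothetical sample: "a" and "b" start in 2016, "c" starts in 2017.
edits = [
    ("a", date(2016, 1, 5)), ("a", date(2016, 1, 20)), ("a", date(2016, 3, 2)),
    ("b", date(2016, 6, 1)),
    ("c", date(2017, 2, 10)), ("c", date(2017, 3, 1)),
]
pulse = cohort_activity(edits)
print(pulse[2016])
print(pulse[2017])
```

In this toy sample, the 2016 cohort has two contributors at month 1 and only one still active at month 3, which is the kind of drop the «Pulse» makes visible at a much larger scale.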
Simon Poole has published in his OSM Diary various examples that show the variability of the inflows of new contributors and how they are not always related to significant edits. The sudden increase of contributors in early 2016 that we can observe on Graph 2 below comes from Maps.Me editors, many of whom mapped personal information. At the end of 2016, thousands of fake accounts were created in the USA by SEO companies. OSMstat for 2017-09-30 also shows indications of various profiles, with 5,413 active contributors and 3,301 with node edits > 15.
Monthly Statistics – Let’s color with Contributors Profiles
Let’s now add Days profiles to the monthly statistics to better see the heterogeneity between the Contributors and the Contributions. Pascal Neis’ OSMstat website and Simon Poole’s stats on the OSM wiki let us observe monthly statistics of contributions. Since the beginning of 2016, we observe an average of 25,000 to 50,000 active contributors per month.
Graph 2 combines the monthly statistics of Contributors (i.e. those who have edited in the month) and Contributions (i.e. the number of objects edited, whether node, way or relation) from the OSM stats wiki. We color these charts with the Contributor profiles, based on the cumulative days of participation since the first edit to OSM (Pascal Neis classification). Comparing the two graphs lets us observe the concentration of Contributors in the first two classes and the concentration of Contributions in the last class.
The profiles of Contributors and Contributions by month for 2017 up to September, on Graph 3, let us measure the respective percentage of each class based on the cumulative days of contribution. For 2017-08, the first two classes, «Discover 1-2 days» (19,186 contributors) and «Rarely Active 3-14 days» (12,845 contributors), represent 66% of the share of Contributors. In comparison, their share of Contributions (13%) is relatively minimal. The «Discover» class with 3.5% of Contributors corresponds more or less to the «Pulse» we observe on the cohort analysis.
The other tail of the distribution is represented by the 4,000 contributors who are part of the «Mega Active» class (271 days and more). They represent 8% of Contributors and 37.6% of Contributions.
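To make the comparison between the share of Contributors and the share of Contributions concrete, here is a minimal Python sketch. It is an illustration only: the function names and sample rows are hypothetical, and only the three class boundaries quoted above come from this Diary. The classes between 15 and 270 days are not detailed here, so they are lumped into one placeholder group labeled «Intermediate».

```python
from collections import defaultdict

# (min days, max days or None for open-ended, class name)
CLASSES = [
    (1, 2, "Discover"),
    (3, 14, "Rarely Active"),
    (15, 270, "Intermediate"),  # placeholder, not a class named in the text
    (271, None, "Mega Active"),
]


def classify(cumulative_days):
    """Map cumulative days of participation to a profile class."""
    for low, high, name in CLASSES:
        if cumulative_days >= low and (high is None or cumulative_days <= high):
            return name
    raise ValueError("cumulative_days must be at least 1")


def shares(rows):
    """rows: (cumulative_days, objects_edited_in_month), one per contributor.

    Returns {class: (share of contributors, share of contributions)}.
    """
    people = defaultdict(int)
    objects = defaultdict(int)
    for days, edited in rows:
        name = classify(days)
        people[name] += 1
        objects[name] += edited
    total_people = sum(people.values())
    total_objects = sum(objects.values())
    return {
        name: (people[name] / total_people, objects[name] / total_objects)
        for name in people
    }


# Hypothetical month: four contributors, 1,000 objects edited in total.
rows = [(1, 10), (2, 5), (7, 35), (300, 950)]
result = shares(rows)
for name, (contributors, contributions) in result.items():
    print(f"{name}: {contributors:.1%} of Contributors, {contributions:.1%} of Contributions")
```

Even in this toy month, the pattern discussed above appears: the «Discover» class holds half of the Contributors but a tiny share of the Contributions, while a single «Mega Active» contributor produces most of the edits.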
The Contributions of the yearly cohorts graph that Pascal Neis published in his 2016 yearly Statistic Blog shows the respective importance of each yearly cohort in the level of monthly Contributors. In this case, the cohorts are not aligned from month 1, but the colors let us see strata of contributors by the year they started to edit. With this representation, the peak of the yearly new contributors is less acute, since it is spread over the months in which they started editing. The top of Graph 4 reproduces Pascal's chart. Every year, we observe a jump in the number of contributors, and their relative importance then reduces gradually over the following years. Again, we see the rise of Contributors from 2016.
The Contributions Profile at the bottom of the Graph (i.e. the percentage of Contributions by month) reveals that in its first year of participation, the yearly cohort of new contributors represents nearly 40% of contributions, and that this share reduces in the following years. With the rise of Contributors in 2016 and 2017, we also observe a rise in the share of Contributions. Simon has measured that the rise of Maps.Me Contributors had a minimal impact on the share of Contributions. More analysis will be necessary to explain which categories are responsible for this jump.
I hope that this different angle on the Contributors data will help to better understand the various contributions to OSM. Do not hesitate to comment. I plan to continue this kind of analysis in other Diaries.