dekstop’s Diary

If you’re subscribed to the HOT mailing list you will have seen a recent invitation to help develop a funding application for the Knight Prototype Fund, coordinated by Russ and Blake. The intention was to discuss project proposals that might be suitable for this grant. The initial IRC meeting then developed into a larger conversation about current HOT needs for better tools: the resulting Google Doc with meeting notes lists six project ideas.

The strongest candidate was a proposal to develop a HOT/OSM tool to support Quality Assurance (QA). You can read some details in the grant proposal writeup; however, it’s a fairly high-level text. Informed by our discussion I also developed a draft specification, with a more detailed list of considerations and potential features.

I’m posting this draft specification here to get your feedback, and to hopefully stimulate some debate about what a good QA support tool might look like. The proposal is a result of conversations with HOT practitioners, and based on my own use of HOT and OSM data. However there are likely many community members with further ideas, and some may even have worked on HOT QA initiatives. We would love to hear from you! In particular we would love to hear from validators, and from existing users of HOT data. What specific data quality concerns arise in practice?

(I should also state that I don’t have a deep understanding of the Humanitarian Data Model – there are likely some useful concepts in there that could be given more emphasis in the spec.)

Considerations

Our general ambition is to make HOT progress more visible. More specifically, the proposal aims to support our existing QA processes around HOT validation. Crucially, it also aspires to provide a means of demonstrating HOT data quality to prospective users of the maps.

Aims of the proposed QA support tool:

  1. Impact analysis of HOT coordination efforts: to describe our outputs in ways that are meaningful to the HOT community, to prospective data users, and to a wider public.
  2. Evaluating fitness for specific purposes: to assess the quality of the data in relation to the specific concerns of data users.
  3. Integration support: to assess the structure of the data in relation to the Humanitarian Data Model (HDM).

Target audiences

The design of the QA support tool should be informed by the needs of existing users of HOT data: most importantly HOT activation partners, and requesting organisations with specific information needs. This also includes prospective users in aid organisations who still need to be convinced that the data can be useful.

It should also be informed by the needs and experiences of HOT validators: they are the best informed about HOT data quality concerns, and they are likely going to be its most active users. The QA support tool should integrate well with HOT validator workflows; however, it is not meant as a replacement for existing tools. I imagine its most useful function will be as a final check: a summary report of the outcomes of a particular mapping initiative.

The design could further consider the needs of other potential users of HOT data: people who want to report on current issues, or who as part of their work can make use of geospatial data. This includes local communities, local and international journalists, engaged citizens, and supporters of aid organisations.

What are their needs?

(This is a bit speculative. Please share your thoughts on this.)

“Which data sets are available?” Which regions are covered? What kind of information is captured?

“What is the quality of the data?” An assessment of map completeness (coverage), consistency (e.g. of annotations), and various measures of accuracy. An assessment of the age of the data, and of its provenance: which imagery sources were used to produce these maps?

“How can we access the data?”

“How can we integrate it with our information systems?” For example, how well does it map to the Humanitarian Data Model, or other standard data models?

The QA process: tests and reports

I. Basic report (derived from OSM edit history; a rough code sketch covering reports I and III follows this list):

  • How much data is there?
  • How many people contributed?
  • How old is the data?

II. Coordination report (derived from edit history and TM2 data):

  • HOT project identifiers: links to the projects that produced this data
  • Have contributions been reviewed (validated)? Where? What changes were made?

III. Automated QA (basic validation):

  • Untagged objects
  • Overlapping objects

IV. Annotations report: which annotations are available?

  • Geospatial information: road names, place names, …
  • Data provenance: description of imagery source
  • Data management: review-related annotations (e.g. ‘typhoon:reviewed’)

V. Humanitarian data report (derived from OSM edit history, HDM):

  • What map object types have been mapped? How many objects are there?
  • E.g. “150 buildings, 15 hospitals, 3 helipads”

VI. “Fitness for purpose” reports: assessing the availability and completeness of data in relation to specific needs:

  • Availability of building data needed for population density models
  • Availability of road data for transport planning
  • Availability of infrastructure data (hospitals, schools, helipads, …) for aid coordination and logistics
  • Others
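
To make this a little more concrete: below is a minimal sketch of how parts of reports I and III could be derived directly from an OSM extract, using the pyosmium library. The file name and the choice of statistics and checks are purely illustrative, not part of the proposal; an actual tool would also need to draw on the full edit history and TM2 data.

```python
# Minimal sketch: counts for report I and an "untagged objects" check for
# report III, computed from an OSM extract with pyosmium. Illustrative only.
import osmium

class BasicReport(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.num_objects = 0       # how much data is there?
        self.contributors = set()  # how many people contributed?
        self.oldest = None         # how old is the data?
        self.newest = None
        self.untagged_ways = 0     # simple automated QA check (report III)

    def _record(self, obj):
        self.num_objects += 1
        self.contributors.add(obj.uid)
        ts = obj.timestamp
        if self.oldest is None or ts < self.oldest:
            self.oldest = ts
        if self.newest is None or ts > self.newest:
            self.newest = ts

    def node(self, n):
        self._record(n)

    def way(self, w):
        self._record(w)
        if len(w.tags) == 0:
            self.untagged_ways += 1

report = BasicReport()
report.apply_file("hot_project_extract.osm.pbf")  # hypothetical extract name
print(f"{report.num_objects} objects by {len(report.contributors)} contributors")
print(f"data ranges from {report.oldest} to {report.newest}")
print(f"{report.untagged_ways} untagged ways")
```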

Other considerations

Should a QA support tool also include its own workflows to address specific issues, or focus on descriptive reports as outlined here? Will our existing validator workflows remain sufficient as we grow?

Who should be doing QA work? How much of QA requires “expert” knowledge? Can we consider QA a general community activity that’s open to all? E.g. by using guided workflows with good documentation. (This is also a discussion about HOT validation practices.)

HOT validation as prerequisite for community growth?

Posted by dekstop on 6 August 2015 in English. Last updated on 9 August 2015.

Missing Maps London mapathon in August 2015

In response to an acute shortage of validators, Missing Maps in London are now training people up at their monthly events: first to learn JOSM, then validation. I think that’s great! It’s particularly fitting that validators are trained from the same volunteer pool as new HOT contributors. That way, at least in principle, their numbers can grow together. While validation currently often rests on the shoulders of a few expert insiders, in this new model it can instead become an important training aspect for larger numbers of highly engaged HOT contributors. Becoming a validator could be an important rite of passage for certain new contributors.

It’s good that validation in particular is now being taken so seriously. It enacts an important process by which the OSM community can manage the growing flood of incoming contributions. Unchecked floods of contributions are likely harmful in the long run: to the quality of the map, and to OSM maintainer morale. As Elinor Ostrom demonstrated, such shared limited resources need to be managed and defended by their beneficiaries.

I think validation also fulfils an important social role for newcomers: it can provide encouraging feedback and useful training experiences, and in the beginning these exchanges may be quite impactful. Maybe most importantly, validation provides a rare opportunity for a contextual social encounter. It’s a perfect opportunity to catch first-time contributors for a chat, without requiring them to subscribe to a mailing list or join IRC first. It’s a form of socialisation: the teaching of techniques and community norms.

Any such validation process must however acknowledge the scale of the challenge: most HOT contributors will likely only ever contribute a little, and may never return. The first encounter should be brief yet impactful, and most importantly it must be repeatable at large scale. Thousands, and maybe soon millions, of times.

It must also be noted that validation is not (yet) a well-defined practice, and instead often depends on the interests and skills of the individual validator, and on how they were trained – just like any other OSM contribution. Everyone does it a little differently. I liked Lisa Marie Owen’s recent diary entry on her global validation procedure, with good discussions in the comments; there are likely many other examples.

I wonder if there’s also an opportunity to create validator networks: online or offline places where validating users can meet with other domain experts, where norms can be clarified and negotiated, and where contributors can hang out and bond. IRC channels, Facebook groups, mailing lists, meetups, … whatever may be practical. Maybe there’s already an existing space in the OSM or HOT universe where we could send new validators?

Initial activity and retention of first-time HOT contributors

Posted by dekstop on 22 June 2015 in English. Last updated on 6 July 2015.

(Hallo! I’m Martin Dittus, a PhD student at UCL. You can read more about my research in an earlier post.)

The volunteers of the Humanitarian OpenStreetMap Team (HOT) and its affiliated projects have spent many thousands of labour hours on the creation of new maps for humanitarian purposes. Yet mapping all the undocumented and crisis-stricken regions of the world is a formidable task. The 2014 response to the Ebola epidemic illustrated this well: even after months of work by thousands of volunteers, the new maps of Central and West Africa are still nowhere near complete.

Many people within HOT now believe that this can best be addressed by growing the community by a few orders of magnitude. An MSF article about Missing Maps articulates this ambition:

To reach our goal, we need the Missing Maps Project to be the biggest instance of digital volunteerism the world has ever seen.

So let’s say we’d want to grow HOT to a million volunteer contributors. How can we train new contributors at that scale? What are our barriers to entry? How can we retain contributors once they’ve had first experiences? Etc… many open questions.

As a first step let’s learn from existing experience. How does engagement compare across the different mapping initiatives right now? Let’s start with a simple comparative study.

Comparing three large HOT initiatives

I’m particularly interested in the engagement profile of first-time contributors: people who may have OSM experience, but who have never before contributed to HOT. How much work do they provide in the first couple of days? How long do they stick around?

In this post I’ll compare the first-time contributor engagement profiles of three initiatives. Each has a different purpose, and a different mode of organisation:

  • Typhoon Haiyan (TH) in Nov 2013: A high-profile and urgent initiative. A first “CNN moment” which brought many newcomers to HOT. Accompanied by a larger number of one-off mapathons around the world.
  • Ebola Response (ER) throughout 2014: A high-profile, multi-month sustained effort. A large amount of media coverage. Coincided with an initial wave of monthly mapathons in several cities.
  • Missing Maps (MM) from Nov 2014 onwards: A larger initiative across a range of humanitarian causes. Proactive, low in urgency, with less media attention: the focus is on community-building. Monthly mapathons, heavy use of social media for promotion.

I’m using the OSM edit history as the basis for my analysis, focusing on an 18-month period from the 16th of June 2013 to the 15th of December 2014. During this time, 1,582 first-time contributors joined HOT to participate in one of these three initiatives, joining one of about 100 projects. (There were many thousands more contributing, but for now we’re just interested in first-timers.)

Here’s a timeline of when these contributors first joined, with a bubble for each new contributor:

Figure: Timeline of first-time contributors. Each new contributor is visualised with a bubble; bubble sizes represent the amount of labour hours the person contributed in the first 2 days. Contributors are ordered vertically by their OSM ID: older user accounts at the top, new accounts at the bottom.

For each of these contributors we’ll build an engagement profile. For the purpose of this analysis I’m using quantitative measures of engagement; these are easy to produce across a wide range of projects:

  • Short-term activity: labour hours, contribution rate in the first two days.
  • Short-term retention: the share of contributors who remain active in HOT on day 2.
  • Long-term retention: the share of contributors who remain active in HOT in month 2 and 3.
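
As a sketch of how such measures could be computed, here is one possible implementation, assuming a pandas DataFrame `edits` with one row per edit and columns `user_id` and `timestamp` (UTC). The session heuristic (inter-edit gaps capped at one hour when summing labour hours) and the exact day/month windows are my own assumptions for illustration, not necessarily the definitions used in this analysis.

```python
# Sketch of per-contributor engagement measures from an edit timeline.
# Assumes a DataFrame `edits` with columns: user_id, timestamp (datetime, UTC).
import pandas as pd

def engagement_profile(edits: pd.DataFrame) -> pd.DataFrame:
    edits = edits.sort_values(["user_id", "timestamp"]).copy()
    first = edits.groupby("user_id")["timestamp"].transform("min")
    edits["h"] = (edits["timestamp"] - first).dt.total_seconds() / 3600

    def per_user(g):
        first_48h = g[g["h"] < 48]
        # Labour hours: sum of gaps between consecutive edits, each gap
        # capped at 1h so long breaks don't count as work (assumption).
        gaps = first_48h["timestamp"].diff().dt.total_seconds().fillna(0) / 3600
        labour_hours = gaps.clip(upper=1.0).sum()
        rate = len(first_48h) / labour_hours if labour_hours > 0 else float("nan")
        return pd.Series({
            "labour_hours_48h": labour_hours,   # short-term activity
            "edits_per_hour_48h": rate,         # contribution rate
            # short-term retention: any edit on day 2 (24-48h after first edit)
            "active_day2": ((g["h"] >= 24) & (g["h"] < 48)).any(),
            # long-term retention: any edit in months 2-3 (30-90 days)
            "active_month2_3": ((g["h"] >= 24 * 30) & (g["h"] < 24 * 90)).any(),
        })

    return edits.groupby("user_id").apply(per_user)
```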

Findings: initial activity and retention of first-time contributors

When we model first-time contributor engagement in this way we can see some similarities across the three initiatives, but also some striking differences. I’ll discuss five key observations.

1. Baseline activity in the first 48h is surprisingly high! Many first-time contributors participate for multiple days in a row. The median contribution activity is ~70 mins in the first 48h. This may sound small for a typical volunteer organisation, but for an online project it’s massive! We further find that between the three initiatives, MM contributors map at the slowest pace. We’ll come back to that in a second.

Figure: Median contribution activity in the first 48 hours: labour hours (left) and contribution rate (right, in edits per hour).

2. Prior experience affects performance. More experienced users tended to contribute faster, work for more hours, and come back the next day. This effect can be observed globally, and for each of the project groups we observed. This suggests either a training effect for OSM users which is transferable to HOT, or a self-selection bias: contributors who enjoy mapping may simply be more engaged in general, be it in HOT or other OSM activities.

Figure: Distribution of initial activity by prior OSM experience: the amount of labour hours l48h (left) and the rate of contributions c48h (right). In each plot, contributors are segmented by their degree of prior OSM experience. Median values are marked with a red line.

3. MM contributors tend to be OSM newcomers. How much experience does a typical first-time HOT contributor have? It turns out that this can vary wildly based on the initiative. The TH and ER groups have a mix of both OSM experts and OSM newcomers, whereas the vast majority of first-time MM contributors have virtually no prior OSM experience.

Figure: Share of participants with a given amount of prior OSM experience, measured in the number of days on which they contributed to OSM.

4. These newbies are catching up quickly. Contributors to MM start slowly; however, they catch up with the others: many increase their pace of contributions in the first 48h. Compared to that, TH and ER contributors tend to maintain their initial pace.

Figure: Share of participants based on their change in contribution pace between the first and second day.

5. Project purposes or modes of organisation likely have an impact on contributor retention. How many contributors to each of the initiatives are retained as HOT contributors? This is maybe the most important aspect if we care about growing an active volunteer community. For each first-time contributor we determine if they return on the second day, and whether they remain active contributors to any HOT project during the second and third month after their initial contribution. Comparing HOT initiatives in this manner uncovers some remarkable differences in retention.

Contributors to TH engaged in much short-term activity in the first few days, however in the longer term none of the contributors remained active! In comparison, about 8% of ER contributors are retained as HOT contributors in the second month, and 1% in the third: they slowly fade away. In contrast to this, MM has the lowest short-term retention, yet the highest long-term retention: contributors do not tend to come back on the second day, however they are more likely to remain active a month or two later. A remarkable accomplishment.

Figure: Median retention for day 2, and months 2 and 3.

Implications

I would argue that the HOT community is highly engaged already. Most volunteers contribute for more than an hour within the first two days of their initial contribution, and a significant percentage of contributors is retained for longer periods.

The data suggests that the capacity-building strategies of the ER and MM initiatives work particularly well: in these two initiatives, a good share of contributors kept coming back. No doubt this is partly because both were longer-term initiatives, so first-time contributors may have felt a responsibility to keep contributing. However, I suspect there may be additional reasons. Maybe most importantly, monthly mapathons in a growing number of cities provide welcoming social spaces with expert guidance, peer learning, and all kinds of enjoyable experiences. In addition, MM appears to foster a more well-connected community, with the means of notifying interested contributors of new causes via Facebook, Twitter, email alerts, …

I believe that given a choice, newcomers are best placed in projects where they have a higher likelihood of being retained. In our case this would be the ER and particularly MM initiatives: projects that are specifically set up as long-term initiatives. Additionally there are indications that particularly MM was successful at retaining and training absolute newcomers with no prior OSM experience.

Another key observation is that as HOT grows and starts new initiatives, we’re gradually reaching outside the existing OSM community. Most first-time contributors now have no prior OSM experience; this was quite different in the beginning. This certainly affects how we should approach and support HOT newcomers.


By Martin Dittus (@dekstop) in 2015. This was produced as part of my academic research together with Licia Capra and Giovanni Quattrone. A paper which includes this work and more is now under review.

Hallo! My name is Martin Dittus, and I’m a PhD student at the ICRI Cities at University College London. I research community engagement in the Humanitarian OpenStreetMap Team (HOT), a volunteer initiative with thousands of contributors. At its core this is quantitative work, and my main outputs are statistics and data visualisations. I also spend a lot of time with the HOT community, am a contributor myself, and have spent much of the last decade with a range of similar community organisations.

I like that my job allows me to combine my experience in large-scale data analysis with my personal interest in community organisations. I spend a lot of time exploring data sets, producing things like this:

Figure: OpenStreetMap contributor density map.

A big part of my work is about developing means to reason about HOT as a social phenomenon. I make use of the “hard” evidence of data sources like the OSM edit history, but also the “soft” evidence of knowing the practices and motivations of the community. Together they allow me to develop conceptual models that help us reason about HOT. (I strongly believe you need both.)

In conversations with other community members and HOT organisers I realised that a lot of the data explorations I produce can be of interest to a wider audience. In early May 2015 I gave a talk at the HOT Summit under the title “Contributor Engagement in Humanitarian Mapping”. The feedback I received was overwhelmingly positive, and there was quite a lot of debate afterwards. Then Alyssa Wright approached me and strongly suggested that I find more public ways of sharing my findings. I shall aspire to do so! I can’t promise a regular schedule, but I’m keen to share my observations.

In particular, I noticed that many people have a lot of experience with how to make HOT work, but also that people’s perspectives tend to be local: they are focused on particular aspects or initiatives. This is in the nature of the practice, which is highly distributed across dozens of interest groups and concerns. As a result, few people have a good global overview of what HOT is and how it works. In addition to these local experiences I think there is also an opportunity to develop a broader understanding of HOT, and I think I can contribute to that.

This may take different forms:

  • Analytics and visualisations that highlight key contribution patterns.
  • Contextualising the data: what do the numbers mean?
  • Conceptual models, for example reasoning about coordination tactics.
  • Evidence to substantiate design choices: currently, HOT planning decisions are often intuitive rather than evidence-based.
  • Critical thinking about community and coordination activity.

You can follow me at @dekstop where I will announce any future posts. I will also highlight key projects on my personal website.

Location: East Marylebone, Fitzrovia, Camden Town, City of Westminster, Greater London, England, W1T 3PP, United Kingdom