After my talk at State of the Map in Brussels, Nick Allen asked: are newcomers to HOT more likely to be retained if we give them positive validation feedback? And conversely, do we discourage them if we invalidate their work? I had no answer at the time, in part because many validation interactions are not public. However, I agreed with his observation that these are likely important early encounters, and that we should make an effort to understand them better. In particular, we should be able to provide basic guidance to validators, based on empirical observations of past outcomes. What are the elements of impactful feedback?
I spoke to Tyler Radford about these concerns that same day, and within a few days we signed an agreement which gives me permission to look at the data, provided I do not share any personal information. The full write-up of the resulting research is now going through peer review, and I will share it when that's done. In the meantime, I thought I should publish some preliminary findings.
Manually labelling 1,300 messages...
I spent the next months diving into the data, reviewing 1,300 validation messages that had been sent to first-time mappers. I labelled the content of each message using models from motivational psychology and from research on feedback in educational settings. I'll skip the details for now, but feel free to ask questions in the comments.
I assessed the impact of different kinds of newcomer feedback:
- Positive performance feedback: messages including comments like "good job", "great work", "looks good", ...
- Negative performance feedback: "doesn’t look complete", "missing tags", "needs improvement", ...
- Corrective feedback: guidance about specific ways to improve future work, including links to documentation.
- Verbal rewards: messages containing positive performance feedback, gratitude ("thanks!"), or encouragement ("keep mapping").
Here's a chart of the frequency of each type of feedback across the messages I labelled:
To measure the effect of these feedback types, I collected the contributions for each newcomer over a 45-day period after their initial edit, and labelled the content of the first feedback message they received during this time. I then observed for how many days they remained active, or whether they dropped out (as measured by an additional 45-day period of inactivity). Finally, I used a Cox proportional hazards model to explain the retention rates we observed, based on a set of features and control variables. This is comparable to a regression analysis, but specifically intended to model participant "survival". In the context of this study, the term "hazard" is a synonym for the risk of abandoning HOT participation. A hazards model yields a hazard rate (more commonly called a hazard ratio) for each contributing factor, denoting the relative change in hazard when a particular feature is present. For example, a hazard rate of 2.0 means that, at any point in the observation period, the person is twice as likely to stop contributing as an otherwise comparable newcomer without that feature. Conversely, a low hazard rate of 0.5 means they are only half as likely to stop.
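As a rough illustration of the bookkeeping described above, here is a minimal Python sketch that turns one newcomer's edit dates into the kind of record a survival model needs: a duration, and a flag for whether they dropped out. The function name `survival_record` and the exact rules are my own assumptions for this sketch (in particular, that "days active" runs from the first edit to the last edit inside the 45-day window, and that dropping out means no further edit within 45 days of that last edit). It is not the code used in the study.

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 45  # observation window after the first edit (from the post)
INACTIVITY_DAYS = 45   # inactivity period that counts as dropping out (from the post)


def survival_record(edit_days):
    """Return (days_active, dropped_out) for one newcomer.

    edit_days: non-empty iterable of datetime.date values, one per edit.
    The rules below are an assumed reading of the method, for illustration.
    """
    days = sorted(set(edit_days))
    first = days[0]
    window_end = first + timedelta(days=OBSERVATION_DAYS)

    # Last day with any activity inside the observation window.
    last_active = max(d for d in days if d <= window_end)

    # Dropped out if no edit follows within the inactivity period.
    cutoff = last_active + timedelta(days=INACTIVITY_DAYS)
    dropped_out = not any(last_active < d <= cutoff for d in days)

    return (last_active - first).days, dropped_out


# Hypothetical newcomers (made-up dates, not real data).
one_off = [date(2016, 10, 1)]
lapsed = [date(2016, 10, 1), date(2016, 10, 8)]
returning = [date(2016, 10, 1), date(2016, 10, 8),
             date(2016, 11, 14), date(2016, 12, 1)]

print(survival_record(one_off))    # (0, True)
print(survival_record(lapsed))     # (7, True)
print(survival_record(returning))  # (44, False)
```

Newcomers who are still active at the end of the window (like the last example) are not counted as drop-outs. Survival models treat them as "censored" observations, which is one reason to use a hazards model rather than a plain regression here.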
Social affirmation matters: someone else cares
Maybe most importantly, I found that feedback can be an important source of social affirmation, which in turn can improve newcomer retention. This effect is clearest among newcomers who contributed comparatively little on their first day (mapping for less than the median of 75 minutes), possibly because they have low intrinsic motivation or self-efficacy. Among these, people who received verbal rewards in their first feedback message were significantly more likely to keep mapping: their hazard rate was reduced to 80%. In comparison, newcomers who already start with a high degree of engagement may not require such affective-supportive feedback to remain engaged.
This makes sense when you consider the wider context. The process of contributing to HOT online can be considered a depersonalised form of interaction: it is often focused on the task, rather than the learner. In the absence of other prominent social cues, small phrases of support may have a large effect. In the case of validation feedback, it's likely also important that this is not simply an automated message. Instead, someone else looked over your work and then took the effort to write some kind words.
To my surprise, negative performance feedback in itself is not necessarily discouraging to newcomers: while it may demotivate some individuals, in aggregate across all newcomers there was no significant effect on retention. This includes instances of invalidated tasks, and negative performance feedback such as "your buildings are all untagged". This may be because the feedback is private: people don't have to be concerned about the impact on their reputation, and can focus on improving their skills. In communities like Wikipedia where feedback tends to be public (in the form of comments or reversions), it was found that negative feedback can harm newcomer retention. It's also worth mentioning that even "negative" feedback in HOT still tends to be polite and constructive: HOT validators are generally a very polite bunch, based on the messages I've seen. They might simply point out that you forgot to square your buildings.
The timing of feedback matters: feedback that is sent a week after a contribution is significantly less likely to still have a motivational impact. In comparison, feedback that was sent within 28 hours (the median delay) reduced the hazard rate to 80%. Each additional day of delay increased the hazard rate.
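To give a feel for what a hazard rate of 80% means in terms of retention, here is a small Python sketch. It relies on a standard property of proportional hazards models: if a baseline group has survival probability S at some point in time, a group with hazard ratio h has survival probability S raised to the power h. The function name and the baseline value of 0.30 are hypothetical, chosen only for illustration. They are not figures from the study.

```python
def survival_with_hazard_ratio(baseline_survival, hazard_ratio):
    """Survival probability of a group with the given hazard ratio.

    Under proportional hazards, S_group(t) = S_baseline(t) ** hazard_ratio.
    """
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline_survival must be a probability")
    if hazard_ratio < 0.0:
        raise ValueError("hazard_ratio must be non-negative")
    return baseline_survival ** hazard_ratio


# Hypothetical baseline: 30% of newcomers are still active at some point.
baseline = 0.30

for ratio in (0.8, 1.0, 2.0):
    retained = survival_with_hazard_ratio(baseline, ratio)
    print(f"hazard ratio {ratio}: {retained:.0%} still active")
```

With this made-up baseline, a hazard ratio of 0.8 lifts retention from 30% to roughly 38%, while a ratio of 2.0 drops it to 9%. The size of the shift depends on the baseline, so the same hazard ratio can correspond to quite different changes in the share of retained newcomers.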
I now believe that this places validators at the core of the HOT community: for many contributors who can't attend a mapathon, and who haven't subscribed to the mailing list or joined IRC, validation feedback is their first experience of a social encounter. For a number of reasons, the current iteration of the tasking manager doesn't easily support such interactions (maybe a topic for a future post); but I'm looking forward to the next iteration, which is already in planning. As I've learned through discussions, the validator community already has some great ideas about improving it even further.
The fine print
First off, this is an observational study, which comes with some constraints: we can identify links between validation styles and outcomes, and control for confounding factors through careful model design, which gives us some confidence in the findings. However, we would have to run actual experiments to confirm each link.
The models behind these findings account for a number of confounding factors. For example, I consider each newcomer's initial contribution activity: were they already enthusiastic contributors to begin with? I also look at the particular project they start with: did they join during a disaster campaign, possibly in a wave of public interest? Such newcomers tend to not stick around for long.
And my usual caveat applies: I assessed the impact on contributor activity and retention, but not on contribution quality. This is in part because I still haven't found a good approach to assessing contribution quality at this scale: there is no ground truth available for comparisons, and contribution practices are diverse and often specific to the geographic or thematic context. Developing methods to assess data quality at this scale is a research project in its own right.
This is certainly not the final word on validation feedback, and I expect many others will add to this (maybe in the comments?). But it can hopefully serve as one contribution to our growing body of knowledge about how best to support our maturing community.