How to Use OSM Channel Data for Effective Communications

Posted by courtiney on 11 July 2023 in English. Last updated on 24 September 2023.

Last month, at SotM US in Richmond, Virginia, Marjan Van de Kauter, Keara Dennehy, and I presented “How to Use OSM Channel Data for Effective Communications.”

Background:

The project grew out of the work Marjan Van de Kauter and I did piloting an OSM community engagement program for TomTom. To make sure we were communicating about TomTom’s organised editing correctly, we began tracking and organizing communication channels. As the list grew, we realized we needed a better tool, so we worked with a TomTom developer to build a web scraper that could show us in which channels the community was active.

Later, we brought Keara on board as a business analyst who could build a more robust tool to manage all of the data. By this time, we had realized that this information was something that the community could use at the global, regional and local level.

Then, after I left TomTom but kept volunteering for the CWG and the OSM/F board on fundraising and communications, we saw additional applications for the data. So we decided to create a proof of concept for a communication channel data store and present our first efforts and findings at SotM US 2023 in Richmond.

The Context:

As background, Marjan and I shared some of the results from the Communications Survey we conducted in May. I wrote about it here. Some of the findings were skewed, but we identified some interesting trends, including:

  • 35% of respondents reported that they felt the forums have a hostile tone
  • Many respondents said they were able to keep up with the conversations, both locally (60%) and globally (49%). Nearly 70% said that they got at least one useful response if they posted a question
  • Respondents were more likely to read than post: 379 said they read daily or weekly and 152 said they posted daily or weekly
  • Older respondents were more likely to use the Listservs or Community Forum, whereas younger respondents were more likely to use Discord or Reddit

Although the channels are seen as sometimes hostile and often noisy, and adoption of the various platforms varies widely, people are able to get the information they need. It speaks to the shared purpose of the community.

The Channel Data Store:

The methodology for creating the channel data store was roughly as follows: Keara and the other members of the TomTom analysis team used the forum API to scrape the community forums and a web scraper for the mailing lists. The team also used the Azure language detection tool to add language information to the scraped data. User information was anonymized, and message content removed, before the data was stored in the team’s data lake. Visuals were created in Power BI, a closed-source tool used by the data team at TomTom. The proof of concept for the data store was based on data from January 2022 to May 2023 and contained the following:

Channels

  • 60 community forums
  • 217 mailing lists
  • 86,177 messages

Posters

  • 3,039 in community forums
  • 1,698 in mailing lists
  • 76 languages (automatically detected)

Of those 86,177 messages, 56,356 were from European sources. These results were not surprising, because editing volume is higher in Europe than the rest of the world, and the European communities tend to favor the mailing lists and community forums. We’d expect to see more volume from Africa, Asia, Oceania, and Latin America in Telegram and other channels. From this data, we extracted a few interesting trends:

  • More than half of the messages posted in the forums and listservs are in languages other than English
  • English is more often used for global topics, such as “Tagging” and “Foundation”
  • Individual channels trend toward a single language, not multiple languages
  • Adoption of the Community Forums is mixed
  • Channel activity is driven by a few frequent posters
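
As a rough illustration of the anonymization step described in the methodology above, here is a minimal Python sketch. It is not the TomTom team’s actual pipeline: the field names, the sample records, the channel names, and the choice of a keyed hash (HMAC-SHA256) to replace usernames are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the dataset.
SECRET_KEY = b"example-secret"


def pseudonymize(username: str) -> str:
    """Replace a username with a keyed hash so one user's posts still group together."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_record(raw: dict) -> dict:
    """Keep only metadata; the message body and real username are not copied over."""
    return {
        "channel": raw["channel"],
        "platform": raw["platform"],
        "posted_at": raw["posted_at"],
        "language": raw["language"],
        "poster_id": pseudonymize(raw["username"]),
    }


# Made-up sample messages standing in for scraped data.
raw_messages = [
    {"channel": "talk-be", "platform": "mailing_list", "posted_at": "2022-03-01",
     "language": "nl", "username": "alice", "body": "Hallo iedereen"},
    {"channel": "tagging", "platform": "forum", "posted_at": "2022-03-02",
     "language": "en", "username": "bob", "body": "How should I tag this?"},
    {"channel": "tagging", "platform": "forum", "posted_at": "2022-03-03",
     "language": "en", "username": "alice", "body": "I would use highway=path"},
]

records = [prepare_record(m) for m in raw_messages]
for record in records:
    print(record)
```

Because the hash is keyed and deterministic, the same poster gets the same ID across channels, which is what makes “frequent poster” analysis possible without storing names.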

Conclusion:

Our Proof of Concept raises a lot of interesting questions that we would like to pursue. Some of them include:

  • Are these frequent posters carrying a burden of disseminating the knowledge across OSM?
  • What is the best way to post about a topic that needs to be seen by the entire global community?
  • What are the effects of the increased use of new channels such as Telegram and Matrix?
  • How does the quality and availability of language localization affect access to posting and knowledge?
  • How does the quality and availability of language localization limit participation and knowledge sharing from some regions more than others?
  • How can we reduce channel noise for better all-community decision-making?
  • What could we learn if we could measure impressions, including liking and saving activity (which we can’t do in the listservs)?
  • How can we use this data to support fundraising and OSM messaging?
  • How can we use this data to support teamwork and inclusivity in OSM collaborations?

Next Steps:

We have prioritized two next steps for this project:

  • We are looking into developing an open-source version of the communication channel data store to share with the community, so any member can use it to analyze communications in OSM and make data-backed communications choices. We are also interested in adding data from other community channel types. If you’d like to get involved, please reach out to Marjan.

  • We are also looking for help analyzing user trends that can support best practices for communicating cross-culturally on distributed teams, including creating a data-backed OSM communications guide. If you are interested in getting involved with this work, please reach out to Courtney.

We’re also happy to hear any other questions or suggestions you may have about this project and potential applications of the data.

–Courtney

Marjan Van de Kauter

Keara Dennehy

Discussion

Comment from Fizzie41 on 23 July 2023 at 23:03

Question for you, please, Courtney.

When you said: “Posters 3,039 in community forums 1,698 in mailing lists Channel activity is driven by a few frequent posters.” are people who have posted either in multiple channels, or frequently in one channel, counted once only here or multiple times?

e.g. I post in the Forum, various mailing lists & on Discord, so if I’ve posted 150 times across all of them, 50 in each, does that leave 3038 / 1697 other posters, or 2989 (2939?) “forums” & 1648 lists?

Comment from courtiney on 24 July 2023 at 17:11

Hello,

I double-checked this with Keara Dennehy, our data expert, and this is what she said:

“These counts are referring to the number of unique users within that platform. Users will be counted twice across platforms if they use both (e.g. if you added the 3039 + 1698, you would no longer have a distinct list). Users are only counted once within the platforms, no matter how many times they post. So, to answer this commenter’s question at the bottom, this would leave 3038 and 1697.”
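
The counting rule described above can be illustrated with a small Python sketch. The platform labels and poster IDs are made up, and the sketch assumes the same person carries the same ID on both platforms, which the real data may not guarantee.

```python
from collections import defaultdict

# Hypothetical (platform, poster_id) pairs, one per message.
posts = [
    ("forum", "u1"), ("forum", "u1"), ("forum", "u2"),
    ("mailing_list", "u1"), ("mailing_list", "u3"),
]


def unique_posters_per_platform(posts):
    """Count each poster once per platform, however many times they post there."""
    seen = defaultdict(set)
    for platform, poster in posts:
        seen[platform].add(poster)
    return {platform: len(users) for platform, users in seen.items()}


per_platform = unique_posters_per_platform(posts)

# Distinct people overall: a poster active on both platforms counts once.
distinct_overall = len({poster for _, poster in posts})

print(per_platform)
print(sum(per_platform.values()), distinct_overall)
```

In this toy data, adding the per-platform counts gives 4, while only 3 distinct people posted, because u1 is active on both platforms. That is why 3,039 + 1,698 is not a distinct list.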

Comment from Fizzie41 on 24 July 2023 at 22:03

Thanks!

& when the wiki says: 19th January 2023 – 10,000,000 registered users, then it shows that “hardly anybody” uses any of the comms channels :-(

Especially when you consider that “most” people will post on both, so 3039 & 1697 really probably only = 2000 individuals :-(

Comment from courtiney on 25 July 2023 at 16:21

Hi, again,

My understanding is that the 10,000,000 registered users is over the entire life of the project, not a “current number of users,” so it’s not relevant to this particular data set as we only studied the communications between January 2022 and May 2023.

Keep in mind that a “user” in the context of this study is “someone who posts.” Our survey (and general data on how people communicate online) tells us that many, many more people read who don’t post. We can’t measure ‘engagement’ like that on the old mailing lists, but we could measure it on the Discourse forum.

It sounds about right that we have probably 2000 active users who post in the forums and old mailing lists, but keep in mind, we didn’t study Telegram and Discord, which have many hundreds of users and tens, if not hundreds, of thousands of messages, as well. These two channels are used more often in Africa, SEA, South Asia, and Oceania.

It also doesn’t measure US Slack, which has 5000 users, I believe.

There is so much more to study. This is a massive data set with all kinds of information in it, and I am really hoping to get a grant or other means of studying it with a bigger team and more powerful computational tools.

Comment from SomeoneElse on 9 August 2023 at 18:33

Just as a bit of context, according to a local database of changesets, between January 2022 and May 2023 342163 users made a changeset with at least one change in it.

Of those, 173514 were making their first changeset in iD, and 118034 their 2nd changeset in iD. 2 people were making their 100,000th changeset in iD in that period.

Over the life of the project (ish, because changesets weren’t around from day 1 and older non-redacted data was made into “pretend changesets”), 2011005 users made a changeset with at least one change in it.
