
Detecting missing ways in OSM from Strava heat maps

Posted by jcr83 on 5 December 2023 in English. Last updated on 7 January 2024.


Introduction

Many paths, tracks and even small streets are missing from OSM. Yet these paths are often used by Strava users, who upload their GPS traces to Strava. So I thought it would be interesting to use this data to improve OSM.

That’s why I wrote a Python program which, by analyzing Strava data, is able to detect missing paths in OSM, then generate files to create MapRoulette challenges so that each missing path can be added to OSM.

What is Strava?

According to Wikipedia, Strava is a website and mobile application used to record sports activities via GPS. Its members use devices such as a GPS watch or a smartphone to record their activities and send them to Strava. As of 2023, there are over 100 million members.

Strava heat maps

On its website, Strava publishes a heat map showing the aggregation of all its users’ tracks.

[Figure: example of the Strava heatmap]

The more a route is used, the brighter its track appears on the heatmap.

In fact, there are several Strava heatmaps, one for each activity (running, cycling, skiing…). We use the running map, which is the most accurate due to the low speed of runners, and which shows all paths, even those impassable for other activities.

Precautions to take when using Strava

You shouldn’t blindly trust the tracks on the Strava heat map:

  • Tracks may be obsolete after a major weather event that destroyed paths (for example, in 2020, Storm Alex destroyed many paths in the French Alps).
  • Tracks may come from trail runs that went off-trail or crossed private property.
  • Ski runs and ski lifts may appear on the running map if users did not record their activity as a winter sport.

Principle of missing ways detection

The software analyzes the Strava heat map to detect bright trails near which there is no path in the OpenStreetMap database.

The detection threshold can be set according to three criteria:

  • minimum brightness level (from 0 to 255).
  • minimum distance from an OSM path.
  • minimum track size.
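The three criteria can be illustrated with a minimal sketch. It is not the code of strava.py: the function name `detect_missing`, the grid representation (lists of rows), the use of pixels as the unit for distance and size, and the 8-neighbour grouping are all assumptions made for the example.

```python
from collections import deque

def detect_missing(brightness, osm_mask, min_brightness=100, min_distance=2, min_size=3):
    """Hypothetical sketch: return groups of bright pixels far from any OSM path.

    brightness: 2D list of heat map values (0 to 255).
    osm_mask:   2D list of booleans, True where an OSM path covers the pixel.
    """
    height, width = len(brightness), len(brightness[0])

    def near_osm(r, c):
        # True if an OSM pixel lies within min_distance pixels (inclusive),
        # measured along rows and columns (Chebyshev distance, an assumption).
        for rr in range(max(0, r - min_distance), min(height, r + min_distance + 1)):
            for cc in range(max(0, c - min_distance), min(width, c + min_distance + 1)):
                if osm_mask[rr][cc]:
                    return True
        return False

    # Criteria 1 and 2: bright enough, and far enough from any OSM path.
    candidate = [
        [brightness[r][c] >= min_brightness and not near_osm(r, c) for c in range(width)]
        for r in range(height)
    ]

    # Criterion 3: group touching candidate pixels and keep only large groups.
    seen = [[False] * width for _ in range(height)]
    groups = []
    for r in range(height):
        for c in range(width):
            if not candidate[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            group = []
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and candidate[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(group) >= min_size:
                groups.append(group)
    return groups
```

Raising any of the three thresholds reduces false positives at the cost of missing fainter, shorter, or closer tracks.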

The Strava heat map is supplied as 512 x 512 pixel tiles, and each tile is analyzed independently of the others. To avoid detecting the same path twice when it straddles two contiguous tiles, it is possible to analyze only one tile out of four. Full coverage then requires four analyses, with the MapRoulette challenge updated after each one.
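One possible way to pick one tile out of four is to select tiles by the parity of their x and y indices, so that no two tiles of the same pass touch each other. This is only a sketch of the idea: the function `tiles_for_pass` and the parity rule are assumptions, and strava.py may select tiles differently.

```python
def tiles_for_pass(x_min, x_max, y_min, y_max, pass_index):
    """Hypothetical sketch: tiles (x, y) analyzed during pass 0, 1, 2 or 3.

    Each pass keeps the tiles whose x and y parities match the pass, so the
    selected tiles are never adjacent, not even diagonally.
    """
    if pass_index not in (0, 1, 2, 3):
        raise ValueError("pass_index must be 0, 1, 2 or 3")
    x_parity, y_parity = pass_index % 2, pass_index // 2
    return [
        (x, y)
        for y in range(y_min, y_max + 1)
        for x in range(x_min, x_max + 1)
        if x % 2 == x_parity and y % 2 == y_parity
    ]
```

Together, the four passes cover every tile of the area exactly once.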

MapRoulette challenges in progress

Metropolitan France

French Overseas

Austria

Switzerland

Installation

The source code is available in this GitHub repository. The installation procedure is described in the README.md file.

How to add a new challenge for another area

If you want to add a new MapRoulette challenge for another area, the principle is as follows:

  • on the OSM-Boundaries site, download the boundaries of the area you’re interested in, choosing the Land only option. You’ll get a file in GeoJSON format.
  • run the strava.py program, passing the boundary file as a parameter. For example, for the Principality of Andorra:

python strava.py -v -a Andorra.geojson -g Andorra_Missing_Ways.geojson

  • create a MapRoulette challenge and select the I want to upload a GeoJSON file option to upload the Andorra_Missing_Ways.geojson file.
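Before uploading, it can be useful to check how many tasks the generated file contains. The sketch below assumes the file is a standard GeoJSON FeatureCollection, which is not confirmed here; the helper `summarize_geojson` is hypothetical and not part of strava.py.

```python
from collections import Counter

def summarize_geojson(data):
    """Hypothetical helper: count features of a parsed GeoJSON
    FeatureCollection by geometry type."""
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return Counter(feature["geometry"]["type"] for feature in data.get("features", []))

# Usage (file name taken from the example above):
# import json
# with open("Andorra_Missing_Ways.geojson", encoding="utf-8") as f:
#     print(summarize_geojson(json.load(f)))
```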

More information is available in the README.md file of the GitHub repository.

Discussion

Comment from Davide Lasagna on 6 December 2023 at 23:14

Very cool project! Are there any licensing concerns with using strava’s data for openstreetmap?

Comment from Andrea Musuruane on 7 December 2023 at 12:02

Very very useful tool! I tried it on my local area of knowledge and I discovered a lot of missing highways.

Comment from Luzandro on 21 December 2023 at 20:02

great tool, thanks a lot! I’ve tweaked it a bit to ignore leisure=pitch/sports_centre as well.

I don’t get what’s the benefit of the tasks database though? If you rebuild tasks in Maproulette it won’t create duplicates with the same id anyway?

Comment from jcr83 on 22 December 2023 at 20:33

Hi Luzandro, I know that there are many false positives near sports centers, but sometimes they are real issues, because the “leisure=track” is missing. About the tasks database: in early versions, only one missing way per Strava tile was detected, so if there was a false positive, other missing ways could not be detected. But I improved the algorithm, and now several missing ways can be detected in a single tile. So I have to check if the database is still useful.
