OpenStreetMap

Juno opens its GPS traces to aid in mapping New York City

Posted by Zverik on 19 March 2019 in English

It all started with a bad edit. At Juno we rely on routing over OpenStreetMap roads, so we notice every change that breaks map matching. One day we encountered a pretty big breakage caused by an edit that reversed several one-way roads. It turned out that all sources available to mappers, and even proprietary alternatives like Google Street View, were so outdated that they seemed to validate the edit. I told the story in my FOSDEM talk, but the main point is that it gave me an idea.

How fresh can a source be? We have plenty of sources, but they are all quite old. GPS traces accumulate very slowly, and people still trace over lines recorded ten years ago. Satellite imagery is updated once every few years. Street-level photos (that is, Mapillary) are always older than you’d like. If something, like a street’s direction, has changed in the past two months, you have no way to know it and reflect it on the map, short of surveying it yourself. That’s okay for a small village, but not for New York.

Overview of Juno GPS traces over New York City

We need NYC data to be as fresh as possible, so a few weeks ago we published all our GPS tracks for you to trace over. Not as raw data, of course, but as a raster tile layer that looks exactly like the standard GPS layer, only with more traces: tens of thousands of tracks from a single day, never more than two days old. The layer is updated daily, so remember to clear your cache if it looks stale.

All the data comes from our drivers, so if you see a line on the layer, you can be sure somebody drove there yesterday or the day before. A line over highway=construction? Time to mark the road as open. A line in the wrong direction over a one-way street? That’s an error in the map. A segment of highway=secondary with no lines over it? Check the news: it might have been closed recently. If you zoom in close enough (tiles are rendered up to z19), you can spot turning lines, which lets you validate turn restrictions.

I believe this is the freshest geodata source available to OSM mappers. No other source consistently shows, at this scale, what the streets looked like just a few days ago. The layer covers only one city, but it demonstrates what future OSM sources could look like.

Obviously, we take privacy seriously. All traces are trimmed at both ends by a random amount, and everything is clipped to a bounding box, so you won’t find a lone track going to a specific neighbourhood of Trenton and back again. The time range is shifted slightly, so there is no telling on which specific day a ride occurred. Finally, there is no metadata that could link any pixel in these tiles to our internal data or to real people in any other way.
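The privacy steps above (random trimming at both ends, clipping to a bounding box, and a small time shift) could be sketched in Python roughly as follows. This is not Juno’s code: the `Point` type, the `anonymize` function, the bounding-box coordinates, the trim limit of 20 points and the ±12-hour shift are all hypothetical values chosen for illustration.

```python
import random
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Point:
    lat: float
    lon: float
    time: datetime


# Hypothetical bounding box roughly around NYC: (min_lat, min_lon, max_lat, max_lon).
NYC_BBOX = (40.49, -74.26, 40.92, -73.69)


def in_bbox(p, bbox):
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon


def anonymize(track, bbox, rng, max_trim=20, max_shift=timedelta(hours=12)):
    """Trim, time-shift and clip one track; returns a list of sub-tracks."""
    # 1. Drop a random number of points (1..max_trim) from each end of the track.
    head = rng.randint(1, max_trim)
    tail = rng.randint(1, max_trim)
    trimmed = track[head:max(len(track) - tail, 0)]

    # 2. Pick one random offset per track, so relative timing is preserved
    #    but the absolute time (and possibly the day) is blurred.
    limit = max_shift.total_seconds()
    shift = timedelta(seconds=rng.uniform(-limit, limit))

    # 3. Clip to the bounding box: a point outside it ends the current piece.
    pieces, current = [], []
    for p in trimmed:
        if in_bbox(p, bbox):
            current.append(replace(p, time=p.time + shift))
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces
```

Note that clipping can split one ride into several pieces when it leaves the box and comes back, which is exactly what keeps a lone out-of-town trip from showing up.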

The technical part was completed in just a day thanks to the open source ecosystem of OpenStreetMap. A special thank you goes to Eric Fischer, who wrote GPX importing and visualizing scripts for OSM. If you are interested in how it works, it is quite simple: a script downloads all our GPX traces for a day, uses Eric’s scripts to make raster tiles, packages them all in a Docker container and uploads it to Amazon EC2.
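Eric’s scripts draw the traces as lines, but the core idea of turning GPS points into a raster tile can be illustrated with a much simpler, point-only sketch: project each point into Web Mercator pixel coordinates and count hits per pixel of one 256×256 tile. The function names and the point-counting approach are assumptions for illustration, not how Eric’s scripts actually work.

```python
import math
from collections import Counter

TILE_SIZE = 256  # pixels per tile side, the usual size for slippy-map PNG tiles


def to_global_pixel(lat, lon, zoom):
    """Project WGS84 lat/lon to global Web Mercator pixel coordinates at a zoom."""
    n = TILE_SIZE * 2 ** zoom  # width of the whole world in pixels at this zoom
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y


def rasterize(points, zoom, tile_x, tile_y):
    """Count how many (lat, lon) points fall into each pixel of one tile."""
    counts = Counter()
    for lat, lon in points:
        gx, gy = to_global_pixel(lat, lon, zoom)
        # Shift global pixel coordinates so the tile's top-left corner is (0, 0).
        px = int(gx) - tile_x * TILE_SIZE
        py = int(gy) - tile_y * TILE_SIZE
        if 0 <= px < TILE_SIZE and 0 <= py < TILE_SIZE:
            counts[(px, py)] += 1
    return counts
```

A real renderer would also draw the segments between consecutive points and map the counts to pixel brightness before writing a PNG.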

Close-up of traces around Grand Army Plaza in Brooklyn

The layer is already included in JOSM’s imagery suggestions when you edit inside New York City, and will be available in iD 2.15 once it is released. For now, you can add https://gps-tile.junolab.net/{zoom}/{x}/{y}.png as a custom layer (disable “Locator Overlay” to see the lines). The data is published under CC-BY-SA 4.0 with explicit permission for tracing in OpenStreetMap. We expect to expand into New Jersey soon, but there are no fixed dates.
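The `{zoom}/{x}/{y}` placeholders follow the standard slippy-map tile numbering, so you can work out which tile covers a given spot. A minimal Python sketch; the helper names are hypothetical, and the URL template is the one above:

```python
import math

TILE_URL = "https://gps-tile.junolab.net/{zoom}/{x}/{y}.png"


def tile_for(lat, lon, zoom):
    """Return the (x, y) slippy-map tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon == 180, latitudes beyond the Mercator limit) into range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat, lon, zoom):
    x, y = tile_for(lat, lon, zoom)
    return TILE_URL.format(zoom=zoom, x=x, y=y)
```

For example, `tile_url(40.6743, -73.9701, 19)` should point at a z19 tile near Grand Army Plaza (the coordinates are approximate).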

With this data, we aim not only to help mappers in the US map NYC better, but also to inspire employees of other companies to share their ride data. Publishing GPS tiles is safe, does not expose any secrets, and helps people. If you work at Lyft, Uber, Yandex, Grab or another ride-sharing or car/scooter-sharing service, please consider talking to your employer about publishing some open data. Contact me if you have questions or need help convincing them.

Comment from Helmchen42 on 19 March 2019 at 12:58

Hi, I agree this is a great source for New York City and it would be a blessing if we had it for other cities in America and Europe (or elsewhere) as well.

Two things that I think could be improved:

1. Only using traces from the past two days is great for highlighting recent changes, and works well for streets that drivers use often. However, some regions are pretty sparse, with streets that have not been used at all and others that were used only once. There, changes might still go unnoticed for weeks or even months; or, even worse, a single trace might be misinterpreted by a mapper as a new one-way street. So for regions like the area around Juniper Valley Park or west of the Clearview Park golf course, a longer timeframe could be useful. Ideally both would be on the same tiles: one to two days in high-density areas, and one week or even a bit longer for suburbs with low usage.

2. While the traces appear to be clean most of the time, some streets (understandably) look pretty rough, for example the south end of Broadway. Since these are tiles, and thus not raw data anyway, would it be feasible to smooth them to a certain degree?

2.1. I also wanted to suggest that, where there are enough traces in one place, segments of traces that stray too far from the street as defined by the majority of traces could be faded out, sharpening the depiction of a street or lane. As an example I would have named FDR Drive, but that very example is also a good counterargument, since nearby streets, namely South Street, could easily cause a lot of problems for that approach.

Comment from Zverik on 19 March 2019 at 13:16

  1. I agree. Right now I’m experimenting with aggregating tracks for a whole month to fill in out-of-market areas.
  2. For quality, I filter out segments with HDOP over 40, which reduces noise to a certain degree. I understand that the Financial District and a few other areas in Manhattan look rough, but they are quite rare, and I don’t want to sacrifice raw data for the rest of NYC to fix them.
  3. Again, I think that filtering by HDOP fixes most of the noise, and using an OSM database to clean up the image might do more harm than good: for example, it might fade out tracks near a recently opened road that is still tagged as under construction.
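HDOP filtering as described in reply 2 could look like the sketch below. The threshold of 40 comes from the reply; everything else is assumed: points are `(lat, lon, hdop)` tuples, a point with HDOP over the threshold breaks the track (so every segment touching it is dropped), points with unknown HDOP are kept, and leftover runs of fewer than two points are discarded because they cannot form a line.

```python
def split_by_hdop(points, max_hdop=40.0):
    """Split a track into runs of consecutive points with HDOP <= max_hdop.

    Each point is a (lat, lon, hdop) tuple; hdop may be None when unknown
    (treated as acceptable here, which is an assumption).
    """
    segments, current = [], []
    for lat, lon, hdop in points:
        if hdop is not None and hdop > max_hdop:
            # A noisy point ends the current run; both segments touching it are lost.
            if len(current) >= 2:
                segments.append(current)
            current = []
        else:
            current.append((lat, lon))
    if len(current) >= 2:
        segments.append(current)
    return segments
```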

Comment from Glassman on 19 March 2019 at 15:02

Ilya, thanks for contributing the data. Hopefully other operators will consider doing the same.

Comment from StefanB on 20 March 2019 at 05:40

You can add layers to editors with a simple link, namely:

Comment from Zverik on 21 March 2019 at 07:28

Yesterday I improved the tile builder: it now downloads a whole month’s worth of GPX tracks for areas outside the market boundary. You can see that coverage there has increased substantially. The only downside is that you can’t tell what changed there in the last few days.
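The two time windows (a couple of days inside the market, a month outside it) can be sketched as a per-point filter. The boundary polygon, the `keep_point` helper and the exact window lengths (2 and 30 days) are assumptions for illustration; the real builder may well decide per track rather than per point. The inside test is standard even-odd ray casting, treating lon/lat as planar coordinates, which is fine for a city-sized polygon.

```python
from datetime import datetime, timedelta


def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross an eastward ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def keep_point(lon, lat, time, now, market,
               recent=timedelta(days=2), backfill=timedelta(days=30)):
    """Keep recent points inside the (hypothetical) market polygon, older ones outside."""
    window = recent if point_in_polygon(lon, lat, market) else backfill
    return now - window <= time <= now
```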

Comment from pizzaiolo on 2 April 2019 at 00:22

Very cool! Any plans to do this for other cities as well?

Comment from Zverik on 2 April 2019 at 07:21

Well, that depends on Juno expanding its market. We’re hoping to cover the whole world one day, but for now it’s just New York. Maybe another company will follow our lead.

Comment from ConsEbt on 3 April 2019 at 13:03

Hi, this is a great project, and thanks for sharing it. Do the cars use GPS only, or do they have an OBD-II dongle as well? I heard that people who were using OSC with a car dongle got greatly improved tracking.

Comment from Zverik on 3 April 2019 at 13:09

Thanks ConsEbt. Our cars rely solely on a GPS signal. Providing each of our thousands of drivers with an OBDII dongle and teaching them to use it would be quite a hassle for almost no visible gain.
