Data issues in Japan

Posted by PlaneMad on 8 October 2015 in English (English). Last updated on 14 October 2015.

A Japanese translation of this post is available on

Over the last few weeks, the data team at Mapbox has been investigating the unusually large number of unconnected highways in Japan, a country that otherwise looks comprehensively mapped.

[Screenshot: Broken highways in Japan. Bigger circles indicate highways of higher classification.]
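One plausible way to surface unconnected highways like these is to look for way endpoints that no other way touches. This is a minimal sketch, not the Mapbox team's actual method: it assumes the highways have already been loaded into a plain dictionary mapping a way id to its ordered node ids and `highway` value, and the `HIGHWAY_RANK` ordering used to put higher-classification roads first is a hypothetical choice.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input shape: way id -> (ordered node ids, highway=* value).
Ways = Dict[int, Tuple[List[int], str]]
Endpoint = Tuple[int, int, str]  # (way id, node id, highway value)

# Hypothetical importance order; unknown values sort last.
HIGHWAY_RANK = {
    "motorway": 0, "trunk": 1, "primary": 2, "secondary": 3,
    "tertiary": 4, "unclassified": 5, "residential": 6,
}


def dangling_endpoints(ways: Ways) -> List[Endpoint]:
    """Return every way endpoint that no other way in `ways` references."""
    # Count how many distinct ways use each node (a loop counts once).
    usage = Counter()
    for nodes, _ in ways.values():
        usage.update(set(nodes))

    dangling = []
    for way_id, (nodes, highway) in ways.items():
        if len(nodes) < 2 or nodes[0] == nodes[-1]:
            continue  # skip degenerate ways and closed loops
        for end in (nodes[0], nodes[-1]):
            if usage[end] == 1:  # only this way touches the endpoint
                dangling.append((way_id, end, highway))
    return dangling


def by_importance(endpoints: List[Endpoint]) -> List[Endpoint]:
    """Put higher-classification roads first (stable, so ties keep order)."""
    return sorted(endpoints, key=lambda e: HIGHWAY_RANK.get(e[2], len(HIGHWAY_RANK)))
```

Dead ends are normal on residential streets, so a dangling motorway or trunk end is a much stronger hint of a broken connection than a dangling residential one; that is why the sketch orders results by classification, much like the bigger circles in the screenshot.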

Looking into the data threw up quite a few interesting findings:

  • Most of the data in Japan is the result of the Yahoo Japan import from 2011 and contains over 5 million road segments.
  • The road data has positional errors of 5-30 m when compared with GPS data, and includes incorrect topology that does not match satellite imagery.
  • Roads in metropolitan areas have been realigned to the correct position, but large parts of the country remain untouched since 2011.
  • Many motorable roads are tagged as paths, and many paths are tagged as roads.
  • Roads are split into short segments at every junction.
  • The classification of minor roads seems to be based on a YH:width tag whose width values are inconsistent with the imagery. The result is an arbitrary patchwork of tertiary, unclassified and residential segments throughout the map.
  • The Bing imagery coverage for Japan is comprehensive but does not match the OSM data or Strava GPS data. The imagery has both an offset and orthorectification errors, and both vary across Japan. New mappers using iD end up realigning the data to the incorrect Bing imagery, causing further inconsistencies.
  • Highly detailed maps from Japan's GSI are available for tracing into OSM. On closer inspection, the major roads are accurate, but the minor roads are not reliable.
  • There is high resolution orthorectified imagery for Japan from GSI which perfectly matches Strava GPS data and is the best imagery source available.
  • The coverage of orthorectified imagery from GSI is limited to the major urban areas.

Fixing the map

  • The complexity of the data issues in Japan makes fixing the data a challenging task. In the 4 years since the import, large parts of the data remain untouched.
  • Members of the osm-ja community have said that these large-scale data inconsistencies make it hard for grassroots mapping to take off.
  • Our current OSM tools are not ready for a data cleanup of this scale. It will require smarter tools and a cleanup strategy that empower the local mapping community to fix the map.

The data team was eager to take up this challenge and sought input from the Japanese community on how to approach the issue. You can follow the remapping trials and our findings in our /mapping repository. In a later post, I'd like to document a cleanup strategy using existing tools, along with lessons learned that could help make this a more comprehensive effort.

Comment from SK53 on 8 October 2015 at 19:42

It would have been great if this post had been written in the language of the local mappers. Perhaps you could reach out to someone in the OSM Japanese community who could help you translate this post. I think it is vitally important that future communication be carried out at least bilingually, and ideally with the primary documentation in Japanese.

As an aside, much of the Yahoo import contains a large number of redundant tags. I'm not absolutely sure, but I think the main import was carried out shortly after the Great Tōhoku earthquake. Needless to say, the attention of most mappers was dedicated to mapping the areas directly affected by the earthquake.

This issue is typical of a) imports; and b) areas which are intensively mapped after a disaster. A huge volume of data is added, which is beyond the capacity of the local community to maintain. Even resolving the current issues does not address the key one. The best strategy for improving OSM is always to grow the local community to the point where it has enough capacity to improve the mapping in rural areas.

Comment from RicoElectrico on 8 October 2015 at 21:17

From what I've been reading, Japanese OSM is… specific. Whoever writes these "State of OSM in $country" blog posts should definitely do solid research on Japan.

There is only a single post in the "users: Japan" subforum! It's about channels of contact, pointing at talk-ja among others. But even if that is their primary means of discussion (I couldn't find other forums), talk-ja has a puny volume for a country of over 100 million people.

I dunno, the Japanese and East Asians in general (even ignoring China with its Great Firewall) seem to isolate themselves on the Internet and roll their own portals for everything. How does that fit in the context of global OSM?

Comment from PlaneMad on 9 October 2015 at 05:32

@SK53 Good call, this will be some valuable documentation to have for reference.

A point to note here is that growing a community in a place with broken data is quite a challenge. The AND imports in Mumbai were such a gigantic mess that the city of 18 million had 0 mappers until I cleaned it up manually, single-handedly, over several months. It's very discouraging for a new mapper to start off cleaning up issues rather than building on the good work of others.

@RicoElectrico Unlike the western world of open source, where Mailman lists and IRC are the norm, people in Asian countries prefer social networks like Facebook and WhatsApp to connect and communicate. It's a matter of convenience for each community to decide the language and medium they use to talk to each other.

Comment from Nakaner on 10 October 2015 at 09:22

@PlaneMad: your descriptions of the state of Japanese road data in OSM really sound like a second TIGER import: misalignments, wrong topology, wrong tagging and no active community.

I have looked through the 2011 archives of the Imports mailing list and have NOT FOUND any discussion of this import. In 2011, the Import Guidelines said the following about pre-import discussion (note that the guidelines were reformatted in April 2012):

Discuss your import on the mailing list and/or with appropriate local communities. Many local communities have their own wiki pages and/or a Mailing lists.

It does not clearly say that you must contact the Imports mailing list. Nowadays (since April 2012) it reads like this:

Discuss your plan. Email the OSM community to notify them of your plans, including a link to your wiki page. You can do this with an email to (at a minimum), talk-(your country), and the OSM group specific to the the area directly impacted by the import.

The import also has some other issues:

  1. There is no English documentation. The whole documentation is in Japanese. The English wiki page just contains four sentences. This has to be fixed.
  2. According to the imports catalogue, Yahoo Japan offered "ex-used map data", i.e. old data they could no longer use commercially. I think this is the reason why this import should not have taken place. OSM is not a dumping ground for outdated and bad commercial geospatial data! Companies should use /dev/null if they do not need their data anymore.

From my point of view, the quickest and most sustainable solution would be to delete all the roads which have not been modified since the import (i.e. a partial revert of the import). If the import had been discussed on the Imports list and its data had been reviewed, it might have taken place in a different way and much more slowly. History tells us that countries with large and quick imports do not have a strong enough community to maintain their data. Just compare how many edits take place in urban US areas with urban areas in the UK or German-speaking countries.

It looks as though the Japanese community imported the data because the Great Earthquake took place on March 11, 2011. Nowadays, humanitarian needs (HOT) cannot be used as the sole argument for importing bad data into OSM. You cannot get approval on the Imports mailing list for a bad import, even if a humanitarian crisis took place in the area of the import and HOT became active there.

Data users like Mapbox can still use the deleted data in their maps and products because it is free (ODbL-licensed), but the community should not suffer from bad data the way the U.S. community does (or did).
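The partial revert proposed above hinges on finding roads that nobody has edited since the import. A minimal sketch of that selection step, assuming the ways have already been parsed into simple records carrying their version number and the account behind that version. The `Way` record and the import account name passed as `import_user` are hypothetical, and the function only selects candidates; it does not delete anything.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Way:
    """Hypothetical, minimal record for one OSM way and its current metadata."""
    id: int
    version: int
    user: str  # account that made the current version
    tags: Dict[str, str] = field(default_factory=dict)


def untouched_import_roads(ways: Iterable[Way], import_user: str) -> List[int]:
    """Ids of highway ways still at version 1 and created by `import_user`.

    Caveat: moving a way's nodes does not bump the way's own version, so a
    road realigned node by node can still look untouched here; a real revert
    would also have to check the versions of the member nodes.
    """
    return [
        w.id
        for w in ways
        if "highway" in w.tags and w.version == 1 and w.user == import_user
    ]
```

The node caveat matters for Japan in particular, since much of the realignment work described in the post consists of dragging existing nodes onto imagery, which leaves the way's version untouched.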

Comment from karida on 12 October 2015 at 12:07

At a time when Yahoo imagery was available for tracing in many countries, it unfortunately was not available for Japan. So mapping was done mainly from GPS tracks, which turned out to be very difficult in large Japanese metropolitan areas because of poor GPS signals.

With the import of the Yahoo data, the initial assumption was that it was a reliable data source, but once Bing imagery became available for tracing, inconsistencies appeared. So which source should you trust? Your GPS data with all the noise from high-rise buildings, a former commercial data set, or the aerial photos from a big company?

I think this situation was (and probably still is) very difficult for Japanese mappers to manage. GSI imagery is only partially available and is not the default in the iD editor. The offsets of Bing imagery vary too much. Most people will look at Bing imagery with or without their own GPS traces. If the two don't match, they will probably "correct" the data so that it aligns with Bing imagery, even if that's not correct.

Comment from lxbarth on 12 October 2015 at 14:26

@SK53 - yes, this needs to be translated; we're working on it.

Note that we've been in touch with the Japanese community from the get-go on this project, with Taichi Furuhashi being extremely helpful - there's a Facebook coordination page and continued communication in Japanese:

Comment from MAPconcierge on 14 October 2015 at 05:04

I've translated your post into Japanese. :-)

Comment from malenki on 16 October 2015 at 15:18

In 2013 I asked on the Japanese mailing list about a cleanup of the similar tidy waterway import: the response was underwhelming.
