OpenStreetMap

lonvia's diary

Nominatim and Postcodes

Posted by lonvia on 16 January 2018 in English

Nominatim (the search engine that powers the search box on the OpenStreetMap website) has recently made significant changes to the way it handles postcodes. This post tries to give a bit of background on what has changed and why.

When you search for a place on osm.org, Nominatim presents not only the name of the place in the result but a complete address. This address not only helps to distinguish between different places but is also used to narrow down your search. It is not a postal address as you would put on a postcard. It is rather a textual description of where the place is located: in which suburb, city, state, country etc. This information is fairly easy to compute from OSM data because all of these administrative units are mapped as areas. Nominatim just needs to check which areas a place is inside, order them appropriately, and there is the address.

Postcodes, however, are different. In most countries there is no such thing as postcode areas. Postcodes are simply assigned to a place (a house or POI) in whatever fashion is deemed most practical for the local postal service. Often the postcodes follow delivery routes. It might be possible to draw an area around houses with the same postcode but this would be an artificial distinction and there is no guarantee that the resulting areas don't overlap.

For that reason, there are very few boundaries in OSM that describe postcode areas. Postcodes are mostly found on house numbers and POIs in the addr:postcode tag. But even here coverage is rather sparse. So when computing the address of a place, Nominatim has to take a different approach to determine the most likely postcode for a place where no addr:postcode tag exists.

With the new version, Nominatim tries two different methods to infer the postcode of the place: an address lookup and an area-based lookup.

The address lookup comes first. Nominatim assembles all other parts of the address and then checks if any part of the address carries an addr:postcode tag that might apply. It does so going from the most specific part of the address, the street, up to the most generic one, the country. As soon as it finds an appropriate tag, it stops and uses that postcode. This means that when tagging postcodes you can start by assigning an approximate postcode to a larger area, like a complete village or suburb, and then later come back and add addr:postcode tags to the handful of houses that are the exception to the rule (or even complete the postcode coverage for the whole village and then delete the postcode tag on the village again).
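The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not Nominatim's actual code: the function name `postcode_from_address`, the list-of-dicts layout and the sample place names are assumptions made for the example, and the check whether a tag "might apply" is reduced to "the tag exists".

```python
from typing import Optional

def postcode_from_address(address_parts: list[dict]) -> Optional[str]:
    """Return the first addr:postcode found in the address.

    address_parts is assumed to be ordered from the most specific
    part (the street) to the most generic one (the country).
    """
    for part in address_parts:
        postcode = part.get("tags", {}).get("addr:postcode")
        if postcode:
            # Stop at the first, i.e. most specific, match.
            return postcode
    return None

# Hypothetical address: only the village carries a postcode.
address = [
    {"rank": "street", "tags": {"name": "Hauptstrasse"}},
    {"rank": "village", "tags": {"name": "Exampledorf", "addr:postcode": "01234"}},
    {"rank": "country", "tags": {"name": "Germany"}},
]

village_guess = postcode_from_address(address)

# A more specific tag further down the hierarchy takes precedence.
address[0]["tags"]["addr:postcode"] = "01235"
street_guess = postcode_from_address(address)
```

Here `village_guess` falls back to the village's postcode, while `street_guess` picks up the more specific tag on the street once it has been added.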

If there is no postcode to be found in the address, Nominatim tries the area method. Ideally, it would look for the closest object with an addr:postcode tag within a certain area and use that postcode as a guess. This is unfortunately a bit expensive, so Nominatim implements a simplified version. For each postcode, it looks for all the points in OSM that are tagged with the appropriate addr:postcode tag and computes one central point, the postcode centroid. When guessing the postcode of an object with the area method, the closest postcode centroid is used. This is not quite as accurate but considerably faster. The postcode centroids are also used when you search for a postcode. If OSM has no postcode area, then an artificial point is returned with the same location as the centroid.
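A minimal sketch of the centroid idea, under simplifying assumptions: the centroid is taken as the plain average of latitude and longitude, and "closest" is measured as a flat squared distance in degrees. Nominatim itself works on its database with proper geometries, so the function names `postcode_centroids` and `guess_postcode` and the sample coordinates below are purely illustrative.

```python
from collections import defaultdict

def postcode_centroids(points):
    """Compute one central point per postcode.

    points: iterable of (postcode, lat, lon) tuples, one for each
    object tagged with addr:postcode.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for postcode, lat, lon in points:
        acc = sums[postcode]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {pc: (s[0] / s[2], s[1] / s[2]) for pc, s in sums.items()}

def guess_postcode(lat, lon, centroids):
    """Return the postcode whose centroid is closest to (lat, lon)."""
    if not centroids:
        return None
    return min(
        centroids,
        key=lambda pc: (centroids[pc][0] - lat) ** 2 + (centroids[pc][1] - lon) ** 2,
    )

# Made-up sample data: two objects for one postcode, one for another.
points = [
    ("01067", 51.0, 13.0),
    ("01067", 51.2, 13.2),
    ("01069", 52.0, 14.0),
]
centroids = postcode_centroids(points)
guess = guess_postcode(51.05, 13.05, centroids)
```

Comparing against one point per postcode instead of every tagged object is what makes the lookup cheap, at the cost of some accuracy near the edges of a postcode's extent.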

Postcode centroids have been a feature of Nominatim for a long time. However, they have always been static and only computed once when the database was initially imported. Starting with the next release, postcodes become their own entity in Nominatim and can be regularly recomputed and updated. On nominatim.osm.org this is already done once per day now.

Finally, there is also a change in the way postcodes are handled in your search query. Formerly, if you added a postcode to your search, you had to use the one that Nominatim had guessed for the place or you would get no result at all. That was particularly annoying when Nominatim had guessed wrong and your search contained the right postcode. With the new version, Nominatim is able to detect postcodes in the query and ignore them if necessary. So if a place has a wrong postcode in Nominatim, it can nonetheless be found by its correct address. There is one catch though: Nominatim needs to understand that a part of your query is indeed a postcode. At the moment it takes this information from OSM itself. That means it can only detect (and ignore) postcodes that have previously been mapped somewhere in OSM. At some point, it will learn to detect postcodes by their format but that is a project for a future version of Nominatim.
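The detection step might look roughly like the following sketch. It assumes that the set of postcodes already mapped in OSM is available as a Python set and that a postcode is a single token; postcodes containing spaces are not handled. The helper name `split_postcodes` is made up for this example and is not part of Nominatim.

```python
def split_postcodes(query, known_postcodes):
    """Separate known postcodes from the remaining search terms.

    known_postcodes: set of upper-case postcodes that have been
    mapped somewhere in OSM (assumed to be given).
    """
    terms, postcodes = [], []
    for token in query.replace(",", " ").split():
        if token.upper() in known_postcodes:
            postcodes.append(token.upper())
        else:
            terms.append(token)
    return " ".join(terms), postcodes

known = {"01067", "01069"}

# The postcode is recognised and can be ignored if it gives no match.
rest, found = split_postcodes("Hauptstrasse 5, 01067 Dresden", known)

# A postcode that is not in OSM stays part of the ordinary search terms.
rest_unknown, found_unknown = split_postcodes("Hauptstrasse 5, 99999 Dresden", known)
```

The second call shows the catch mentioned above: a postcode that nobody has mapped yet cannot be told apart from any other search term.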

Sorting route relations

Posted by lonvia on 10 September 2017 in English

waymarkedtrails.org, the site that shows all things route related, has always taken care to put route relation members in a sensible order before displaying them. A couple of weeks ago this changed. The site now assumes that the members of each relation are already in the correct order. The reason for that is simple: sorting route relations is hard.

Before explaining why sorting is hard, let me explain why order even matters. As long as you simply want to display the route on a map, the order is not important. Simply color each way in the relation to your liking and the route is nicely visible for a human reader. However, waymarkedtrails.org does a bit more than that. It displays an elevation profile and allows you to download a GPX of the route, so you can put it onto your favourite mobile device. When the route is not sorted in the expected order, the GPX is unusable for many applications, for example Garmin Basecamp.

So why not sort the route automatically? At first sight, this seems to be a simple task. After all, a route should be a simple linear connection from A to B. Unfortunately, the real world is much messier.

Even the most simple case of a linear route from A to B already has two solutions to the problem. The route may go from A to B or from B to A. In most cases the direction doesn't really matter but there are exceptions. Take a downhill mountain bike route, for example, or a nature trail with information panels that should be visited in the correct order.
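To illustrate the ambiguity, here is a small sketch of chaining the ways of a strictly linear route. It is not the code waymarkedtrails.org used. Each way is reduced to a pair of end node ids, and the helpers `endpoints` and `chain_ways` are hypothetical names for this example.

```python
from collections import Counter

def endpoints(ways):
    """Nodes that appear only once are the two ends of a linear route."""
    counts = Counter(node for way in ways for node in way)
    return sorted(node for node, count in counts.items() if count == 1)

def chain_ways(ways, start):
    """Order and orient the ways so that they form a line from start."""
    remaining = list(ways)
    ordered = []
    current = start
    while remaining:
        for i, (a, b) in enumerate(remaining):
            if a == current:
                ordered.append((a, b))
                current = b
                break
            if b == current:
                # The way is mapped against the travel direction.
                ordered.append((b, a))
                current = a
                break
        else:
            raise ValueError("route is not a single connected line from start")
        remaining.pop(i)
    return ordered

# Unsorted members of a route running through the nodes 1-2-3-4.
ways = [(3, 4), (1, 2), (2, 3)]
ends = endpoints(ways)
a_to_b = chain_ways(ways, ends[0])
b_to_a = chain_ways(ways, ends[1])
```

Both `a_to_b` and `b_to_a` are valid results. Nothing in the geometry says which end is the intended start, which is exactly the information a sorted member list carries.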

Then there are loop routes which end at the point where they started. When sorting automatically there is no way to determine the starting point of the route. In both cases, we could add extra tagging to mark the start and end points. But why add the extra work when a sorted route gives the start and end points clearly at the beginning and end of the list?

Also, not all routes in OSM are strictly linear: there are routes with directional deviations. These can often be found in cycling routes, which might go through oneway streets. Sometimes these deviations are mapped with forward/backward roles but not always. Then there are routes that contain alternatives, side trips to lookout points and alternative access points.

waymarkedtrails could surely use some heuristics to get all these cases right most of the time but then there would be no way to fix the decision if it gets it wrong. I think that it is much better to leave the final decision about the order to the mapper.

Keeping your routes in the right order isn't too much work either. The JOSM relation editor is a very powerful tool when it comes to sorting relations. In the list of relation members there is a column that immediately shows you the connectivity of your relation members. It can even handle directional deviations and roundabouts. Keeping an eye on the connectivity column is always a good idea while clicking your route together. It gives you immediate feedback when you've missed part of the route or created a small dangling end because you forgot to split a road. If you have already mapped your routes without sorting them, there is even a button to sort the members for you.
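The basic idea behind such a connectivity check can be sketched as follows. This is a strong simplification and not how JOSM implements its connectivity column: ways are again reduced to pairs of end node ids, two neighbouring members count as connected when they share an end node, and roles, roundabouts and directional deviations are ignored. `find_gaps` is a hypothetical helper name.

```python
def find_gaps(ways):
    """Return the positions of members that do not connect to their predecessor."""
    gaps = []
    for i in range(1, len(ways)):
        if not set(ways[i - 1]) & set(ways[i]):
            gaps.append(i)
    return gaps

# Members in relation order; there is a hole between node 3 and node 5.
members = [(1, 2), (2, 3), (5, 6), (6, 7)]
gaps = find_gaps(members)
```

An empty result means every member touches the one before it. In the example, `gaps` points at the third member, where part of the route is missing.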

So overall, sorted routes are an advantage for everybody. They don't solve all problems with routes but they are a huge step forward in improving the quality.

Finally, some numbers about the current state.

Hiking routes:

  • 74337 linear routes already sorted (65 %)
  • 14608 linear routes, not sorted (13 %)
  • 24992 non-linear routes (22 %)

Cycling routes:

  • 28020 linear routes already sorted (55 %)
  • 5164 linear routes, not sorted (10 %)
  • 17639 non-linear routes (35 %)

Waymarked Trails goes OpenTopoMap

Posted by lonvia on 7 June 2017 in English

Waymarked Trails is the map for all things route: hiking, cycling, skating and horse riding. It has been a long-standing wish of many users to have hillshading and contours on the map. So last week the site received a new feature that allows you to choose a different base map. The first addition to the available base layers is OpenTopoMap, a beautiful map crafted in the spirit of German topographic maps. To try it out, go to the settings menu (the little cogwheel at the bottom of the page) and choose the new base map from the drop-down menu.

Many thanks to the friendly folks from OpenTopoMap for all their work creating the map and for allowing Waymarked Trails to offer it as a base layer.