Nominatim suggestions update

Posted by krahulreddy on 30 July 2020 in English (English)

The following work has been done as part of phase two of the GSoC project.

Server is Online

At the end of phase one, we had a server up and running with nominatim-ui and an elasticsearch server. It runs on the machine provided to me as part of this project and currently has the entire planet DB set up. Suggestions have been set up for a few smaller DBs, but not yet for the planet DB. Debugging and changes are ongoing, so the suggestions might not be available at all times. I will post another diary update once the suggestions are fully available.

The server is hosted at http://95.217.117.45/nominatim/ui. It will be available only during the course of this GSoC project (until 31 August 2020). The suggestions are served by a hug API, which can be accessed at http://95.217.117.45:8000/pref?q=. Internally, it queries elasticsearch on the server and returns the results.
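As a rough illustration, the suggestion endpoint could be queried from Python along the following lines. The helper names `build_suggest_url` and `fetch_suggestions` are hypothetical, and treating the response as JSON is an assumption based only on the sample result quoted in this post. Since the server is temporary, the fetch call should not be expected to work after the project ends.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint quoted in this post; it is only expected to be up during the project.
SUGGEST_ENDPOINT = "http://95.217.117.45:8000/pref"


def build_suggest_url(query, endpoint=SUGGEST_ENDPOINT):
    """Return the suggestion URL for a (possibly partial) search string."""
    # quote() percent-encodes spaces and non-ASCII characters for the q parameter.
    return f"{endpoint}?q={quote(query)}"


def fetch_suggestions(query, timeout=5):
    """Call the endpoint and decode the body, assuming (not confirmed) it is JSON."""
    with urlopen(build_suggest_url(query), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; no network access is needed for this line.
    print(build_suggest_url("bangalore north"))
```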

Elasticsearch configurations

Elasticsearch provides many configuration options that can be tweaked for optimal performance. For our setup, the requirements are:

  • Low storage usage.
  • Fast indexing.

The following are a few of the elasticsearch options that were explored during this phase.

These will be tested with the planet DB indexing, which will be done soon!

Address formation

The address is formed in every language present in the Nominatim DB. You can see all the languages by looking at the tags at http://95.217.117.45:8000/pref. For example, http://gsoc2020.nominatim.org:8000/pref/?q=bangalore%20north returns:

"addr": "Bangalore North, Bangalore Urban, Karnataka", "addr:kn": "ಬೆಂಗಳೂರು ಉತ್ತರ, ಬೆಂಗಳೂರು ನಗರ, ಕರ್ನಾಟಕ", "country_code": "in"

Indexing rate:

The first set of indexing tests formed addresses in only a single language. In that setup, indexing ran at around 4500 docs per second. As more languages are added, this rate is expected to come down.

The major bottleneck in indexing is the address formation. The elasticsearch bulk indexing itself has a rate of more than 25000 docs per second.

Language support

The final suggestions will be available in all supported languages. The current setup fetches all languages but displays only the default field, so you can expect the rest of the languages to appear soon.
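To make the idea concrete, here is a minimal sketch of how a client might choose which address string to display, using the field layout from the sample result in this post (`addr` as the default, `addr:<language code>` for translations). The function `pick_address` and the fallback order are assumptions for illustration, not the project's actual logic.

```python
def pick_address(result, preferred_languages, default_key="addr"):
    """Return the first available localized address, else the default field."""
    for lang in preferred_languages:
        # Localized fields are assumed to be named like "addr:kn".
        value = result.get(f"{default_key}:{lang}")
        if value:
            return value
    return result.get(default_key)


# Fields copied from the sample result quoted in this post.
sample = {
    "addr": "Bangalore North, Bangalore Urban, Karnataka",
    "addr:kn": "ಬೆಂಗಳೂರು ಉತ್ತರ, ಬೆಂಗಳೂರು ನಗರ, ಕರ್ನಾಟಕ",
    "country_code": "in",
}

if __name__ == "__main__":
    print(pick_address(sample, ["kn"]))  # Kannada address is present
    print(pick_address(sample, ["fr"]))  # no French field, falls back to "addr"
```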

Next steps

The next steps include:

  • For elasticsearch
    1. Add typo tolerance for suggestions.
    2. Tokenize words so that they can be matched in a different order.
    3. Try storage optimization techniques.
  • For the project code
    1. Finish the code.
    2. Documentation.
    3. Add this API for OpenStreetMap.
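One possible direction for the first two elasticsearch items is a `match` query with `fuzziness` set, sketched below as a plain Python dict. This is only an illustration and not the project's implementation: the helper `build_suggestion_query` is hypothetical, the field name `addr` is borrowed from the sample result above, and the field is assumed to be an analyzed text field. It does not cover prefix matching of a half-typed last word, which suggestions would also need.

```python
import json


def build_suggestion_query(text, field="addr", size=10):
    """Build an Elasticsearch search body for a typo-tolerant, order-independent match."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" picks the allowed edit distance from each term's length.
                    "fuzziness": "AUTO",
                    # Every token must match, but tokens may appear in any order.
                    "operator": "and",
                }
            }
        },
    }


if __name__ == "__main__":
    # Words deliberately reversed and misspelled to show the intent of the query.
    print(json.dumps(build_suggestion_query("north bangalroe"), indent=2))
```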

Comment from Sanderd17 on 4 August 2020 at 06:06

I just tried this a bit, and it’s an improvement, but it seems to be very street-focused.

Finding a street in a city is no problem. But typing a country or city name to just roughly go to an area doesn’t suggest the city (but it does suggest all streets that have that city in their name).

Also, it would be nice if you could find a good sorting based on a combination of proximity and importance. When I’m viewing my region (province of West-Vlaanderen in Belgium), and I type “Ro”, I would expect “Roeselare” to be the first result as it’s a decent city right in the center of my view. Later down the list, it would perhaps show “Roesbrugge”, which is a small village still in my view. Then perhaps “Romania”, a country (thus important), but way out of my view. And when all countries, cities and villages are listed, only then show the streets.

For a rough static importance, you could perhaps count the number of objects (nodes/addresses/other features) within the city limits, or along the street (within a certain distance of the road). This would account for area and population, and would even correlate with OSM users.

The dynamic importance (based on the current view) will probably be harder in elasticsearch. You could implement something based on the number of tiles away. Like divide the importance by log(2+x) where x=0 means it’s in the current view, x=1 means it’s a neighbour of the current view, …
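The tile-distance damping described above can be sketched as follows. The comment does not say which logarithm base is meant; base 2 is assumed here so that a result inside the current view (x=0) keeps its static importance unchanged. The function names and the importance values in the example are made up purely for illustration.

```python
import math


def dynamic_importance(static_importance, tiles_away):
    """Divide static importance by log2(2 + x), where x is the tile distance."""
    if tiles_away < 0:
        raise ValueError("tiles_away must be non-negative")
    # x = 0 (in view) gives log2(2) = 1, so the importance is unchanged.
    return static_importance / math.log2(2 + tiles_away)


def rank_candidates(candidates):
    """Sort (name, static_importance, tiles_away) tuples by damped importance."""
    scored = [(name, dynamic_importance(imp, dist)) for name, imp, dist in candidates]
    return [name for name, _ in sorted(scored, key=lambda item: item[1], reverse=True)]


# Hypothetical values: two places in view and an important country far away.
candidates = [
    ("Romania", 0.8, 6),
    ("Roesbrugge", 0.3, 0),
    ("Roeselare", 0.5, 0),
]

if __name__ == "__main__":
    print(rank_candidates(candidates))
```

With these made-up numbers, the far-away country ends up below both places in the current view, which is the ordering the comment asks for.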

But perhaps an even bigger difficulty: when I look for a street, I may be looking for an address. So once the street name is completed, it should probably start to suggest addresses in that street.
