Proofs of Concept

Posted by krahulreddy on 30 June 2020 in English.

The code for these POCs can be found here. It will keep changing as we test and tweak the available options. This phase is important because it establishes that the project will work as expected and that no components are missing or misbehaving.

As part of this, the following components were designed and tested:

Getting input:

  • The psycopg2 Python library is used to connect to PostgreSQL and fetch data.
  • DictCursors are used to fetch rows in a dictionary format. (This is necessary to ensure that the hstore columns storing the name and address fields are fetched correctly.)

    Important note: These cursors are not thread-safe, so this must be kept in mind if multithreading is used later on.
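One way to respect the thread-safety note is to give every task its own cursor. The sketch below is only an illustration, not the POC code: fetch_rows and fetch_many are hypothetical helper names, and the cursor factory is passed in by the caller so the pattern is visible without a running database.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_rows(conn, query, params=(), cursor_factory=None):
    """Run one query on a cursor that is created and closed in the calling thread."""
    # With psycopg2, pass cursor_factory=psycopg2.extras.DictCursor so that
    # each row can be converted to a plain dict keyed by column name.
    kwargs = {"cursor_factory": cursor_factory} if cursor_factory else {}
    cur = conn.cursor(**kwargs)
    try:
        cur.execute(query, params)
        return [dict(row) for row in cur.fetchall()]
    finally:
        cur.close()


def fetch_many(conn, queries, cursor_factory=None, workers=4):
    """Run several (query, params) pairs in a thread pool, one cursor per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fetch_rows, conn, query, params, cursor_factory)
            for query, params in queries
        ]
        # Results come back in the same order as the submitted queries.
        return [future.result() for future in futures]
```

With psycopg2 the connection object itself can be shared between threads, so only the cursor has to stay private to each task. Calling psycopg2.extras.register_hstore(conn) beforehand should make hstore columns arrive as Python dictionaries.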

Formatting:

  • Creating a Doc class with the necessary fields. (The fields were discussed in the last article.)
  • Forming addresses using the place_addressline table. These are ultimately indexed in Elasticsearch along with the other necessary fields.
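A minimal sketch of the formatting step could look like the following. The field names on Doc and the (name, rank) shape of the parent rows are assumptions made for illustration; the real fields are the ones listed in the previous article, and the real rows come from place_addressline.

```python
from dataclasses import asdict, dataclass


@dataclass
class Doc:
    # Hypothetical fields, chosen only for this example.
    place_id: int
    name: str
    address: str = ""

    def to_source(self):
        """Return the document as a plain dict, ready to be indexed."""
        return asdict(self)


def form_address(name, parents):
    """Join a place name with its parent names, most specific parent first.

    `parents` is an iterable of (parent_name, rank_address) pairs; a higher
    rank is assumed to mean a more specific place.
    """
    ordered = sorted(parents, key=lambda parent: parent[1], reverse=True)
    parts = [name]
    parts += [p_name for p_name, _ in ordered if p_name and p_name != name]
    return ", ".join(parts)


parents = [("Deutschland", 4), ("Berlin", 16), ("Mitte", 20)]
doc = Doc(place_id=1, name="Brandenburger Tor",
          address=form_address("Brandenburger Tor", parents))
print(doc.address)  # Brandenburger Tor, Mitte, Berlin, Deutschland
```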

Indexing:

  • Setting up an Elasticsearch server and running it.
  • Creating and deleting an index.
  • Inserting documents into the index, both one at a time in a loop and with bulk indexing.
  • Indexing with varying numbers of records.
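For reference, bulk indexing with the official Python client might be sketched as below. The index name, the use of place_id as the document ID and the chunk size are assumptions for the example, not values taken from the POC code.

```python
def bulk_actions(docs, index_name):
    """Yield one bulk action per document, in the shape helpers.bulk expects."""
    for doc in docs:
        yield {
            "_index": index_name,
            "_id": doc["place_id"],  # assumed to be unique per document
            "_source": doc,
        }


def index_in_bulk(client, docs, index_name, chunk_size=500):
    """Send the documents to Elasticsearch in chunks of `chunk_size`."""
    # Imported here so that bulk_actions can be tried without the client installed.
    from elasticsearch import helpers

    # helpers.bulk returns a (success_count, errors) tuple.
    return helpers.bulk(client, bulk_actions(docs, index_name), chunk_size=chunk_size)
```

Because bulk_actions is a generator, the documents are streamed to the client rather than held in memory all at once. The chunk size is one obvious knob to experiment with when the rate drops on larger extracts.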

Hug API:

  • Creating a hug API that adds an extra layer of security by avoiding exposure of the Elasticsearch endpoints.

Searching:

  • Setting up the front end on Nominatim (available here).
  • Fetching results from the hug API endpoint into the Nominatim front end.
  • Displaying the results as an option list.
  • Selection of results by the user.

Outcomes/Observations:

  1. All the parts work as expected.
  2. Indexing speed differed from system to system. On the server, we can index more than 1500 documents per second. This is workable for now, but further changes are planned, with the goal of reaching 2000 documents per second.
  3. Bulk indexing is very fast for smaller extracts, but the rate drops as more and more data is given. This needs a bit more work.
  4. In our indexing and front-end trials, we got good results using only the match_phrase_prefix query when querying Elasticsearch.
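To make the last observation concrete, a match_phrase_prefix request body can be built roughly like this. The field name addr and the result size are assumptions for the example; the actual field depends on how the documents were mapped.

```python
def prefix_query(text, field="addr", size=10):
    """Build a search body that matches `text` as a phrase with a prefix on its last term."""
    return {
        "size": size,
        "query": {
            "match_phrase_prefix": {
                field: {"query": text},
            }
        },
    }


body = prefix_query("brandenburger t")
```

The body can then be passed to the search call of the Elasticsearch client, or posted to the index's _search endpoint. The last term is treated as a prefix, which is what makes this query convenient for search-as-you-type input.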

Next steps:

With the POCs now complete, we move on to the actual planning and design of the project. Most of the parts are available; planning will help structure the project and connect all the dots.

Discussion

Comment from spiregrain on 1 July 2020 at 11:18

What does it do?
