Hi everyone, this is an update on my progress in enhancing Nominatim’s search results ranking. For an overview of the project, you can check out my previous diary entry here. I would like to thank my mentors, Sarah Hoffman (@lonvia) and Marc Tobias (@mtmail), for their guidance throughout the implementation of this project.
Goals of the First Phase
The goals for the first phase of this project were set in advance. The main ones are listed below.
- Enabling PostGIS to work with raster files
- Finding and implementing the most suitable method for importing GeoTIFF files
- Conducting performance tests on the import functionality
- Adding unit tests
- Documenting the new changes
Hardware I Am Using
Since Nominatim with a full planet import needs a lot of computing resources, I set up a dedicated server to work on the project. I would like to thank OpenCage for providing me with this server. Its specifications are: 8-core AMD Ryzen™ 7 3700X, 64GB RAM, 1TB NVMe disk (900GB usable, 850GB free), running Ubuntu 22.04 LTS.
OSM Views Data
As mentioned before, OpenStreetMap keeps logs of the number of successful user requests for each map tile. This information can be found here. The first thing I did was download one of the log files and study its content. This led me to read about the Web Mercator projection, to gain a lower-level understanding of how tiles work and to better understand the logs.
After that, I started using a GeoTIFF file that stores the same information as the logs. This file, currently 387MB in size, is the data source chosen for loading the map tiles’ access numbers into Nominatim’s database. GeoTIFF is a variation of the TIFF format that adds a set of tags containing geospatial data, which gives the raster data in the file its own internal georeferencing information. The image below illustrates the map access numbers stored in the GeoTIFF file used in this project. It was generated with QGIS.
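To make the tile scheme behind the logs more concrete, here is a minimal Python sketch of the standard Web Mercator ("slippy map") tile formula, which converts a latitude and longitude into tile x/y indices at a given zoom level. The function name `lat_lon_to_tile` is only an illustration; it is not part of Nominatim or of the tile log tooling.

```python
import math

# Web Mercator cannot represent the poles, so latitudes are cut off here.
MAX_LATITUDE = 85.0511287798


def lat_lon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing a WGS84 coordinate."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat))
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Keep coordinates on the far edges (lon = 180, southern limit) in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom 1 the world is a 2x2 grid, so `lat_lon_to_tile(0.0, 0.0, 1)` returns `(1, 1)`, the south-east tile. Each extra zoom level doubles the number of tiles along both axes.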
PostgreSQL and PostGIS are already used in Nominatim to store and query geographic objects. However, loading the GeoTIFF file also requires Nominatim to support raster data. I added this support with a new function that creates a database extension called “postgis_raster”. I then used raster2pgsql, the default PostGIS tool for loading raster data into the database, and integrated it so that Nominatim calls it programmatically.
The tool has various options that affect how the raster data is loaded into the database. One of the options I set creates a GiST index on the raster column, which makes querying specific raster data much faster. Another option worth mentioning is the tile size: the raster is cut into tiles of this size, and each tile is inserted as one table row. The recommended tile size when using raster2pgsql is in the range of 32x32 to 100x100. I ran the performance test twice for each of the two tile sizes at the ends of this range, measuring the time it takes to load the GeoTIFF file into the database, the space the raster data takes, and the number of rows in the newly created table. The table below shows the performance test results:
The results show that the 100x100 tile size is the better option, so I chose it as the tile size for importing the GeoTIFF file into Nominatim. The image below shows how the raster data looks inside its table after importing the GeoTIFF file into Nominatim’s database.
The image below shows another table that I created. It contains the access numbers extracted from the loaded raster data, together with their corresponding places on the map, which are taken from the “placex” table.
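The extraction described above could look roughly like the query built by the sketch below. This is not the actual code in my branch, only an illustration under some assumptions: the raster table name `raster_osmviews` and the helper `build_views_query` are hypothetical, “placex” is assumed to expose `place_id` and a point geometry column `centroid`, and the raster is assumed to be stored in the same SRID as those geometries (otherwise an `ST_Transform` would be needed). `ST_Value` and `ST_Intersects` are regular PostGIS raster functions.

```python
import re

# Only plain lowercase identifiers are accepted, since table names
# cannot be passed as bound query parameters.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def build_views_query(raster_table, places_table="placex"):
    """Build a query that reads the raster value under each place centroid."""
    for name in (raster_table, places_table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe table name: {name!r}")
    return (
        # ST_Value reads the pixel of band 1 that lies under the point.
        "SELECT p.place_id, ST_Value(r.rast, p.centroid) AS views "
        f"FROM {places_table} AS p "
        # ST_Intersects picks the raster tile (table row) covering the point.
        f"JOIN {raster_table} AS r ON ST_Intersects(r.rast, p.centroid)"
    )
```

The join on `ST_Intersects` is where the GiST index on the raster column helps, because only the tile that covers each place has to be looked up.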
Additionally, I added the ability to refresh the map access numbers. It reuses the function that imports the GeoTIFF file and additionally drops the raster table if it already exists, so that the new raster data replaces the old one.
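As a rough illustration of the import and refresh steps, the sketch below builds a raster2pgsql command and pipes its SQL output into psql. It is a simplified stand-in, not the function that is in Nominatim: the names `build_raster2pgsql_cmd` and `import_raster` and the `refresh` parameter are hypothetical, and using the `-d` flag (drop and recreate the table) is just one possible way to replace the old data. The flags `-I` (GiST index), `-t` (tile size) and `-d` are real raster2pgsql options. The `CREATE EXTENSION` statement has to be run once in the database before importing.

```python
import subprocess

# Raster support lives in its own extension since PostGIS 3.
CREATE_RASTER_EXTENSION = "CREATE EXTENSION IF NOT EXISTS postgis_raster"


def build_raster2pgsql_cmd(tiff_path, table, tile_size=(100, 100), refresh=False):
    """Build the raster2pgsql argument list for loading one GeoTIFF file."""
    width, height = tile_size
    cmd = ["raster2pgsql"]
    if refresh:
        cmd.append("-d")  # drop the table and recreate it
    cmd += [
        "-I",  # create a GiST index on the raster column
        "-t", f"{width}x{height}",  # cut the raster into tiles, one per row
        tiff_path,
        table,
    ]
    return cmd


def import_raster(tiff_path, table, dbname, refresh=False):
    """Pipe the SQL produced by raster2pgsql into psql."""
    cmd = build_raster2pgsql_cmd(tiff_path, table, refresh=refresh)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as converter:
        subprocess.run(
            ["psql", "--dbname", dbname, "--quiet", "-v", "ON_ERROR_STOP=1"],
            stdin=converter.stdout,
            check=True,
        )
    if converter.returncode != 0:
        raise RuntimeError("raster2pgsql failed")
```

With the default arguments, `build_raster2pgsql_cmd("osmviews.tiff", "raster_osmviews")` produces `raster2pgsql -I -t 100x100 osmviews.tiff raster_osmviews`; the file and table names here are placeholders.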
Unit Tests and Documentation
Finally, I wrote unit tests and documentation covering the functionality that has been added to Nominatim.
Now that the map access numbers can be loaded into Nominatim, the next main step is to enhance the search ranking algorithm by including the map access numbers in the computation of the places’ importance values. Feel free to ask any questions about my progress so far or the next steps of the project, and I will happily answer them.