Hi everyone, this is an update on the final phase of my work on enhancing the ranking of Nominatim’s search results.
Previous Diary Entries
For background on the project, you can read the project overview, followed by the entry on the project’s first phase, below.
Project’s Pull Request
To see the code of the project, you can check the pull request here.
Detailed Report of the Project
The detailed version of the report can be read here.
What Has Been Done
- Enabled PostGIS to work with raster files
- The Nominatim CLI tool can now import OSM views data from the GeoTIFF file
- The Nominatim CLI tool can refresh the OSM views data and recompute the importance scores
- Integrated OSM views data into the algorithm that computes the places’ importance scores which are used in ranking Nominatim’s search results
- Added some unit tests
- Updated the documentation and added a detailed report of the experiments conducted
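To illustrate the integration step in the list above, raw view counts need to be brought onto a bounded scale before they can feed into an importance score. The sketch below compares two common normalization techniques. It is only an illustration: the function names and sample numbers are hypothetical, and it does not claim to be the formula used in Nominatim, which is described in the detailed report.

```python
import math


def min_max_normalize(views: int, max_views: int) -> float:
    """Linearly scale a raw view count into [0, 1]."""
    if max_views <= 0:
        return 0.0
    clamped = min(max(views, 0), max_views)
    return clamped / max_views


def log_normalize(views: int, max_views: int) -> float:
    """Scale a raw view count into [0, 1] on a logarithmic scale.

    log1p keeps zero views at exactly 0 and compresses the long tail
    of very heavily viewed tiles.
    """
    if max_views <= 0:
        return 0.0
    clamped = min(max(views, 0), max_views)
    return math.log1p(clamped) / math.log1p(max_views)


if __name__ == "__main__":
    # Hypothetical view counts, chosen only to show the difference in shape.
    max_views = 1_000_000
    for views in (0, 10, 1_000, 1_000_000):
        print(views,
              round(min_max_normalize(views, max_views), 4),
              round(log_normalize(views, max_views), 4))
```

Linear scaling is easy to interpret but pushes almost every place towards zero when a few tiles dominate the counts, while the logarithmic variant spreads moderately viewed places more evenly at the cost of flattening differences among the most viewed ones.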
Possible Next Steps
There is a weak correlation between the OSM views data and the wiki importance data. A number of possible reasons are outlined in the detailed version of this report, and further investigation is needed to better understand this outcome. Apart from that, the OSM views import feature can be enhanced so that the user can specify the zoom level when importing the data, or even import data from multiple zoom levels one after the other, so that the OSM views data becomes more accurate.
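As a rough illustration of how such a correlation can be measured, the sketch below computes the Pearson correlation coefficient between two lists of scores. This is an assumption for illustration: this entry does not state which correlation measure was used in the experiments, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)

    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sample")

    return covariance / math.sqrt(variance_x * variance_y)
```

A value close to 0 indicates a weak linear relationship, while values close to 1 or -1 indicate a strong positive or negative one.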
What I Have Learned
Working on this project certainly sharpened my SQL skills. The data I was working with is huge, which led me to learn more about optimizing SQL queries so that they execute in less time. In addition, I have learned a lot about processing rasters and working with geographic objects inside databases, which was entirely new to me. Furthermore, I was exposed to several data normalization techniques and came to understand their pros and cons, so that I can choose which one to use when processing raw data depending on the context of a given project.
I would like to thank my mentors, Sarah Hoffmann (@lonvia) and Marc Tobias (@mtmail), for their guidance and support throughout the implementation of this project. I would also like to thank Paul Norman (@pnorman) for his comment and the discussion I had with him afterward, which shaped the implementation of this project, and OpenCage for providing me with the server to work with on this project. I had a great learning experience and I am thankful to Google Summer of Code and to the OpenStreetMap Foundation for this opportunity.
Hi everyone, this is an update on my progress in enhancing the ranking of Nominatim’s search results. For an overview of the project, you can check out my previous diary entry here. I would like to thank my mentors, Sarah Hoffmann (@lonvia) and Marc Tobias (@mtmail), for their guidance throughout the implementation of this project.
Goals of the First Phase
The main goals previously set for the first phase of this project are listed below.
- Enabling PostGIS to work with raster files
- Finding and implementing the most suitable method used to import GeoTIFF files
- Conducting performance tests on the import functionality
- Adding unit tests
- Documenting the new changes
Hardware I Am Using
Since Nominatim with a full planet import needs a lot of computing resources, I set up a dedicated server to work on the project. I would like to thank OpenCage for providing me with this server. Its specifications are an 8-core AMD Ryzen™ 7 3700X, 64GB RAM, and a 1TB NVMe disk (900GB usable, 850GB free), running Ubuntu 22.04 LTS.
OSM Views Data
As mentioned before, OpenStreetMap keeps logs of the number of successful requests made by users for each map tile. This information can be found here. The first thing I did was download one of the log files and study its content. This led me to read about the Web Mercator projection to gain a lower-level understanding of how tiles work and to better understand the logs. After that, I started using a GeoTIFF file that stores the same information as the logs. This GeoTIFF file, which is currently 387MB in size, is the data source chosen for loading the map tiles’ access numbers into Nominatim’s database. GeoTIFF is a variation of the TIFF format that adds a set of tags containing geospatial data, providing internal georeferencing information for the raster data in the file. The image below illustrates the map access numbers stored in the GeoTIFF file used in this project. It was generated with QGIS.
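To make the relationship between coordinates and tiles more concrete, here is a minimal sketch of the standard Web Mercator tile-index formula, which maps a longitude/latitude pair to the x/y index of the tile containing it at a given zoom level. The function name is hypothetical and the snippet is only meant to illustrate how tiles are addressed; it is not code from Nominatim.

```python
import math
from typing import Tuple

# Web Mercator is only defined up to roughly +/-85.0511 degrees latitude.
MAX_LATITUDE = 85.0511287798


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    lat = max(min(lat, MAX_LATITUDE), -MAX_LATITUDE)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level

    # x grows eastwards from longitude -180.
    x = int((lon + 180.0) / 360.0 * n)

    # y grows southwards from the northern latitude limit.
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)

    # Clamp so that points on the eastern or southern edge fall in the last tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


if __name__ == "__main__":
    # The point (0, 0) sits at the corner shared by the four zoom-1 tiles
    # and is assigned to the south-eastern one.
    print(lonlat_to_tile(0.0, 0.0, 1))
```

Each additional zoom level doubles the number of tiles along both axes, so the same place is covered by ever smaller tiles as the zoom level increases.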
PostgreSQL and PostGIS are already used in Nominatim to store and query geographic objects. However, in order to load the GeoTIFF file, Nominatim also needs support for working with raster data. I added this support with a new function that creates the database extension called “postgis_raster”. After that, I used raster2pgsql, which is PostGIS’s default tool for loading raster data into the database. I have integrated raster2pgsql so that it is called programmatically by Nominatim. The tool has various options that affect how the raster data is loaded into the database. One of the options that I have set is GiST indexing on the raster column, so that querying specific raster data becomes much faster. Another option worth mentioning is the tile size, which is the size of the pieces that the raster is cut into, with one piece inserted per table row. The recommended raster tile size when using raster2pgsql is in the range of 32x32 to 100x100. I ran performance tests twice for each of the two tile sizes at the ends of the recommended range to measure the time it takes to load the GeoTIFF file into the database, the space the raster data takes, and the number of rows in the newly created table. The table below shows the performance test results:
It is clear that the 100x100 tile size is the better option, so I have chosen it as the tile size for importing the GeoTIFF file into Nominatim. The image below shows how the raster data looks inside its table after importing the GeoTIFF file into Nominatim’s database.
The image below shows another table that I have created. It contains the access numbers extracted from the loaded raster data, along with their corresponding places on the map, which are found in the “placex” table.
Additionally, I have added the ability to refresh the map access numbers. It reuses the function that imports the GeoTIFF file, which now drops the raster table first if it already exists. That way, the new raster data replaces the old data.
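The import and refresh steps described in this section can be sketched roughly as follows. The snippet only builds the raster2pgsql command line with the options discussed above: `-I` for the GiST index, `-t` for the tile size, and `-d` to drop and recreate the table on refresh. The function name, the table name `osm_views` and the file name are hypothetical placeholders, and this is not the code from the pull request.

```python
import shlex
from pathlib import Path
from typing import List

# SQL that enables raster support in the database. Since PostGIS 3.0,
# raster support lives in its own extension.
ENABLE_RASTER_SQL = "CREATE EXTENSION IF NOT EXISTS postgis_raster"


def build_raster2pgsql_command(tiff_path: Path,
                               table: str = "osm_views",  # hypothetical name
                               tile_size: str = "100x100",
                               replace_existing: bool = True) -> List[str]:
    """Build the raster2pgsql invocation for importing a GeoTIFF file."""
    # -d drops the table and recreates it (refresh), -c only creates it.
    mode = "-d" if replace_existing else "-c"
    return [
        "raster2pgsql",
        mode,
        "-I",             # create a GiST index on the raster column
        "-t", tile_size,  # cut the raster into tiles, one per table row
        str(tiff_path),
        table,
    ]


if __name__ == "__main__":
    command = build_raster2pgsql_command(Path("osm_views.tiff"))
    # raster2pgsql writes SQL to stdout, which is typically piped into psql.
    print(ENABLE_RASTER_SQL)
    print(shlex.join(command))
```

Keeping the command construction separate from its execution makes the chosen options easy to unit test without a running database.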
Unit Tests and Documentation
Finally, I have written unit tests and documentation covering the functionality that has been added to Nominatim.
Now that the map access numbers can be loaded into Nominatim, the main next step is to enhance the search ranking algorithm by including the map access numbers in the computation of the places’ importance values. Feel free to ask any questions about my progress so far or the next steps of the project, and I will happily answer them.
Hi everyone, my name is Tareq Al-Ahdal. I am a computer science undergraduate student at Universiti Teknologi Malaysia. Recently, I was accepted into Google Summer of Code 2022 as an open source contributor with OpenStreetMap. This summer I will work on enhancing Nominatim, OpenStreetMap’s geocoding software, which lets us search for locations by name or address and vice versa.
Nominatim currently uses a computed importance value to rank the search results based on each location’s perceived importance. This importance value is derived from the popularity of the location’s Wikipedia article. However, not every location on earth has its own Wikipedia article. Locations without one do not get an importance value, so the ranking of the search results can be inaccurate in that case. OpenStreetMap has data on the number of times users accessed each location on the map. This data is a good indicator of how popular a place is. The aim of my work is to integrate this data into Nominatim’s computation of the importance value so that the search results become more accurate, which will help users find the places they are looking for in less time.
I will use this diary to keep you updated about my work. Please feel free to reach out if you have any questions regarding my work or anything else you have in mind.