Hello everyone! I would like to give an update on my project “Nominatim QA Analyser Tool”, which is progressing very well.
As a recap, this project aims to build a tool that analyses Nominatim’s database and extracts suspicious data from it. This data should then be presented to mappers through a graphical interface so that they can correct it.
The tool is still under development: it lacks tests, documentation, configuration, etc. However, if you are interested, you can access the GitHub repository here: https://github.com/AntoJvlt/Nominatim-Data-Analyser
We chose to use Osmoscope as the main visualization tool for the data we extract with the Nominatim QA Analyser.
I have set up an instance of Osmoscope on the development server which was provided to me for this GSoC project. This instance is publicly available here: https://gsoc2021-qa.nominatim.org/osmoscope You are free to look at it and start fixing some data errors around you!
/!\ Here is some important information about this public instance /!\
In this section, I will talk about some technical aspects of the Nominatim QA Analyser Tool and I will focus on the most important points.
In order to have a flexible architecture and reusable components, I went for a pipe structure. Therefore, one rule is represented as a pipeline where each pipe is a processing task which sends its result to the next pipe.
The most used pipes that we currently have are the following:
With this set of pipes, as an example, we can have a rule with pipes plugged in this order:
SQLProcessor -> GeoJSONFeatureConverter -> GeoJSONFormatter -> LayerFormatter.
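To make the pipe idea more concrete, here is a minimal Python sketch of such a pipeline. It is only an illustration: the `Pipe` base class, its `plug_pipe` and `process_and_next` methods, and the three toy pipes (`StaticRows`, `FeatureConverter`, `CollectionFormatter`) are assumptions made for this example and are not the actual classes of the tool.

```python
from typing import Any, List


class Pipe:
    """A processing task that sends its result to the pipes plugged after it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next_pipes: List["Pipe"] = []

    def plug_pipe(self, pipe: "Pipe") -> "Pipe":
        """Plug `pipe` after this one and return it so calls can be chained."""
        self.next_pipes.append(pipe)
        return pipe

    def process(self, data: Any) -> Any:
        """Transform the incoming data; each concrete pipe overrides this."""
        raise NotImplementedError

    def process_and_next(self, data: Any = None) -> List[Any]:
        """Run this pipe, then feed its result to every plugged pipe.

        Returns the results produced by the last pipes of the pipeline.
        """
        result = self.process(data)
        if not self.next_pipes:
            return [result]
        outputs: List[Any] = []
        for pipe in self.next_pipes:
            outputs.extend(pipe.process_and_next(result))
        return outputs


class StaticRows(Pipe):
    """Stand-in for a database query: returns hard-coded rows."""

    def process(self, data: Any) -> Any:
        return [{"osm_id": 1, "lon": 2.35, "lat": 48.85}]


class FeatureConverter(Pipe):
    """Converts each row into a GeoJSON point feature."""

    def process(self, data: Any) -> Any:
        return [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
                "properties": {"osm_id": row["osm_id"]},
            }
            for row in data
        ]


class CollectionFormatter(Pipe):
    """Wraps a list of features into a GeoJSON FeatureCollection."""

    def process(self, data: Any) -> Any:
        return {"type": "FeatureCollection", "features": data}


# Plug the pipes in order, then run the pipeline from the first pipe.
first = StaticRows("rows")
first.plug_pipe(FeatureConverter("features")).plug_pipe(CollectionFormatter("collection"))
result = first.process_and_next()
```

Because every pipe only knows about the pipes plugged after it, the same pipe can be reused in several rules and only the plugging order changes.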
To reduce the amount of code needed to add a new QA rule, and to make it easier, I introduced the YAML rule specification.
Each rule is defined inside a YAML file. This YAML file follows a tree structure where each node is a pipe and each node can have one or multiple children defined in the “out” property of the node. Here is an example for the QA rule “boundary=administrative without admin_level”:
When executing a rule, the QA analyser loads the corresponding YAML specification file and parses it. The parsing is done by the deconstructor module, which goes through the tree structure and sends an event when it reaches a new node and when it backtracks through the tree to an upper node.
The assembler module subscribes to the deconstructor and is responsible for assembling the nodes, instantiating the right pipes, and plugging them in the right order. This works smoothly because the deconstructor sends the nodes by following the tree structure, so they arrive in the right order.
The YAML specification is only possible because of the pipe structure that I had set up beforehand.
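The interaction between the two modules can be sketched as follows. This is a simplified illustration under several assumptions: the rule is already loaded into a Python dict, the children of a node are stored as a mapping under “out”, the event names (`new_node`, `backtrack`) are invented for the example, and the node types (`Source`, `Convert`, `FormatA`, `FormatB`) are placeholders. The real modules may be organised differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical rule tree, as it could look once the YAML file is loaded.
RULE: Dict[str, dict] = {
    "QUERY": {
        "type": "Source",
        "out": {
            "CONVERT": {
                "type": "Convert",
                "out": {
                    "FORMAT_A": {"type": "FormatA"},
                    "FORMAT_B": {"type": "FormatB"},
                },
            }
        },
    }
}

Listener = Callable[[str, str, dict], None]


class Deconstructor:
    """Walks the rule tree depth-first and notifies its subscribers."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def deconstruct(self, tree: Dict[str, dict]) -> None:
        for name, node in tree.items():
            self._visit(name, node)

    def _visit(self, name: str, node: dict) -> None:
        self._emit("new_node", name, node)
        for child_name, child in node.get("out", {}).items():
            self._visit(child_name, child)
        # All children are done: go back up to the parent node.
        self._emit("backtrack", name, node)

    def _emit(self, event: str, name: str, node: dict) -> None:
        for listener in self._listeners:
            listener(event, name, node)


@dataclass
class PipeNode:
    """Placeholder for an instantiated pipe."""
    name: str
    type: str
    next: List["PipeNode"] = field(default_factory=list)


class Assembler:
    """Builds the pipeline from the events sent by the deconstructor."""

    def __init__(self) -> None:
        self.root: Optional[PipeNode] = None
        self._stack: List[PipeNode] = []  # path from the root to the current node

    def on_event(self, event: str, name: str, node: dict) -> None:
        if event == "new_node":
            pipe = PipeNode(name, node["type"])
            if self._stack:
                self._stack[-1].next.append(pipe)  # plug after its parent
            else:
                self.root = pipe
            self._stack.append(pipe)
        elif event == "backtrack":
            self._stack.pop()


events: List[tuple] = []
assembler = Assembler()
deconstructor = Deconstructor()
deconstructor.subscribe(assembler.on_event)
deconstructor.subscribe(lambda event, name, node: events.append((event, name)))
deconstructor.deconstruct(RULE)
```

The assembler only needs a stack: the node on top of the stack is always the parent of the next node it receives, which is why the order of the events is enough to plug everything correctly.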
Some of the rules return a lot of results, so in order to display them properly through the Osmoscope instance without killing the browser, I had to add a vector tiles output to the tool. This was done by implementing the VectorTileConverter pipe.
I decided to use Tippecanoe from Mapbox because it is very efficient and makes it very easy to convert a GeoJSON file into vector tiles. For now, the VectorTileConverter pipe takes a GeoJSON file as input and calls Tippecanoe from the command line to convert the file to vector tiles automatically.
This is probably not the most efficient way to do this but it works well for now.
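Calling Tippecanoe from Python could look roughly like the sketch below. The function names and the file paths are made up for the example, and the chosen options (`--force`, `--output-to-directory`, `-zg`, `--drop-densest-as-needed`) are only one reasonable combination of Tippecanoe flags, not necessarily the ones the VectorTileConverter pipe uses.

```python
import subprocess
from pathlib import Path
from typing import List


def build_tippecanoe_command(geojson_file: Path, output_dir: Path) -> List[str]:
    """Build the Tippecanoe command line for one GeoJSON file."""
    return [
        "tippecanoe",
        "--force",                    # overwrite tiles from a previous run
        "--output-to-directory", str(output_dir),
        "-zg",                        # let Tippecanoe guess a suitable max zoom
        "--drop-densest-as-needed",   # thin out features in tiles that get too big
        str(geojson_file),
    ]


def convert_to_vector_tiles(geojson_file: Path, output_dir: Path) -> None:
    """Run Tippecanoe; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_tippecanoe_command(geojson_file, output_dir), check=True)


# Example command for hypothetical paths (nothing is executed here).
command = build_tippecanoe_command(Path("results.geojson"), Path("tiles"))
```

Building the command in a separate function keeps it easy to inspect or log without needing Tippecanoe installed.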
Here is a list of things that need to be done in the second part of this project:
Here is a list of things that might be done next depending on the direction we want to take for this project:
I would like to thank my mentors, Sarah Hoffmann (lonvia) and Marc Tobias (mtmail), who are helping me a lot with this project. Special thanks to Sarah Hoffmann, who helps me a lot with writing the Nominatim database queries for the rules and with understanding the OSM data better.
Comment from tordans on 26 July 2021 at 08:12
Hi AntoJvlt, thanks for sharing!
How can I resolve an issue on https://gsoc2021-qa.nominatim.org/osmoscope/#map=14.618837772215969/13.40799/52.49792&l=https://gsoc2021-qa.nominatim.org/QA-data/same_wikidata/osmoscope-layer/layer.json as false positive?
Example: https://www.openstreetmap.org/node/473867813 (State Berlin) and https://www.openstreetmap.org/node/240109189 (City Berlin) both reference https://www.wikidata.org/entity/Q64?uselang=de which represents State + City.
Being able to mark cases like this as false positive will make it easier to work with the QA Tool.