This document explains the process that we used to import more than half of the municipalities in Mexico.
- admin_level=4 means Province ( State )
- admin_level=6 means Municipality
Short stats :
In Numbers : Number of nodes/ways/relations deleted
- 500K / 2k / 500
Number of nodes/ways/relations added
- 1000K / ~4k / ~1050
Number of hours spent :
Number of municipalities added :
The post is broken down into the steps that we took to accomplish the import. From the start, I want to point out that this is a learning process, and we are learning all the time. If you find something that you consider suboptimal, or you have a suggestion for doing something in a different way, please do tell.
Step 0 : Data Transformation
The process is illustrated in the wiki page Mexico's Administrative Divisions Import Project
Step 1 : Think about the workflow of the process
We had help from the people who did the Puerto Rico import, and I want to thank them for all the support and help that they gave us for this import.
The difference was that our data was a little more complicated: we did not start with a clean map, but with one that already had admin_level=4, admin_level=6 and admin_level=7|8|9|10 boundaries in place.
One of the technical difficulties that we had in the beginning was this :
- Find a solution for ways that belong to more than one relation (admin_level 4 and 6, let's say). We cannot delete such ways, because the admin_level=4 relation depends on them, and importing without handling them creates duplicated ways along the same line. If this is not done by script, we need to manually add around 500 boundaries to the new ways and recreate some relations (see the task at Step 4 that needed 70 hours).
I would like to thank Rafael Avila and all the people from OSM_MX for the support that we had in doing this.
We opted for the method where we deleted everything that is admin_level=6 and manually relinked the admin_level=6 relations to the admin_level=4 boundaries or, in the places where the admin_level=4 boundary differed from the admin_level=6 one, moved the admin_level=4 boundary to match the admin_level=6 one. This is because the data that we are uploading is authoritative, being the official INEGI data.
One of the solutions was to use a script that Victor had shared with us: https://github.com/vramirez122000/osm-pr-boundaries
We spent around 3 days doing the splitting and merging of the boundaries in QGIS. I will not explain the process because, in the end, we developed an internal tool called Mexico Split that converts the INEGI boundary Shapefiles into an OSM file.
Technical details : OSM does not allow more than 2000 nodes in a single way, so we needed to split the ways at each intersection and recreate the relations.
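To make the 2000-node limit concrete, here is a minimal Python sketch of how a long way could be cut into shorter ones. This is not the code of Mexico Split; the function name `split_way` and the idea of representing a way as a plain list of node ids are assumptions made for illustration only.

```python
MAX_WAY_NODES = 2000  # maximum number of nodes per way mentioned above


def split_way(node_ids, max_nodes=MAX_WAY_NODES):
    """Split a list of node ids into consecutive pieces of at most max_nodes nodes.

    Adjacent pieces share one node, so the resulting ways stay connected.
    """
    if max_nodes < 2:
        raise ValueError("a way needs at least 2 nodes")
    pieces = []
    start = 0
    while start < len(node_ids) - 1:
        end = min(start + max_nodes, len(node_ids))
        pieces.append(node_ids[start:end])
        # Reuse the last node of this piece as the first node of the next one.
        start = end - 1
    return pieces
```

Each piece would then become its own way and be added as a member of the same boundary relation.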
Mexico Split is a small desktop tool used for converting Mexico municipality borders given as polygons into unique poly-lines. The borders of two adjacent municipalities, given as polygons, will inherently have some overlapping ways.
This tool is designed to eliminate these overlaps by detaching the overlapping ways from their polygons and replacing them with a single way common to the two polygons involved.
Besides this main purpose, the tool also splits any resulting way longer than 2000 segments into shorter ways, groups the ways into relations according to the borders they define, and adds some predefined tags to these ways and relations.
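The core idea of finding the overlapping parts can be sketched in a few lines of Python. This is only an illustration, not how Mexico Split is implemented: the names `edge_owners` and `shared_edges` are hypothetical, and the sketch assumes that adjacent polygons use exactly the same vertex coordinates along their common border.

```python
from collections import defaultdict


def edge_owners(polygons):
    """Map each undirected segment to the set of polygons whose ring uses it.

    `polygons` maps a polygon name to a list of (x, y) vertices.
    """
    owners = defaultdict(set)
    for name, ring in polygons.items():
        # Close the ring if the last vertex does not repeat the first one.
        closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
        for a, b in zip(closed, closed[1:]):
            # Sort the endpoints so that A->B and B->A are the same segment.
            owners[tuple(sorted((a, b)))].add(name)
    return owners


def shared_edges(polygons):
    """Return only the segments that belong to more than one polygon."""
    return {edge: names
            for edge, names in edge_owners(polygons).items()
            if len(names) > 1}
```

For two unit squares placed side by side, `shared_edges` returns the single segment between them, owned by both squares. A real tool would then join consecutive segments with the same owners into one way and reference that way from both relations.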
Tags added to relations:
Tags added to both ways and relations:
Compare to original data
JOSM boundaries Map Paint Style
We also made a Map Paint Style for JOSM, designed specifically to ease the process of importing the admin boundaries.
Technical Details : The style highlights the last node of every way, making it simple to see the length of every way.
The square nodes also have a certain degree of transparency, so we can see if there is another node underneath.
To be able to work in a systematic way,
You can try the style here https://github.com/baditaflorin/boundaries-import-JOSM-MAPCSS-STYLE
Quickly see duplicated nodes
See the difference between the admin_level=4 and admin_level=6
Track status of changes
Step 2 : Delete old data
I spent almost 8 hours on a dead end: it was impossible to bulk delete the old boundaries, because admin_level=5, admin_level=8, admin_level=9 and other small boundaries were invisibly attached to the admin_level=6 ones. A modified overpass turbo query finally let us delete the old data.
We had to unglue over 2000 nodes that were shared with the boundaries. These were places where the boundary coincided with a river, a highway, a level_crossing, etc.
More info about boundaries that share nodes or ways can be found in this open question posted on the talk mailing list https://email@example.com/msg54274.html
We first started doing this manually, but when we found out that there were over 2000, we asked for help and used a script that does the unglue. Overpass-API is limited in how much it can do at once, at least for the whole of Mexico, so we ended up creating a wiki page with all of the provinces and ungluing province by province. This could not have been done without the help of Rafael Avila Coya. Hackpad link describing the process
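As an illustration of the kind of query involved, the sketch below builds an Overpass QL query that selects the admin_level=6 relations inside one state, together with their member ways and nodes. It is not the query we actually used; the function name `build_boundary_query`, the timeout value and the choice to look up the state by its `name` tag are assumptions.

```python
def build_boundary_query(state_name, admin_level=6, timeout=900):
    """Return an Overpass QL query for boundary relations inside one state."""
    return (
        f'[out:xml][timeout:{timeout}];\n'
        # Select the state (admin_level=4) as an area, by name.
        f'area["boundary"="administrative"]["admin_level"="4"]'
        f'["name"="{state_name}"]->.state;\n'
        # Boundary relations of the requested level inside that area.
        f'relation["boundary"="administrative"]'
        f'["admin_level"="{admin_level}"](area.state);\n'
        # Recurse down to also fetch the member ways and their nodes.
        '(._;>;);\n'
        'out meta;'
    )


print(build_boundary_query("Oaxaca"))
```

The query only selects the relations of one level and their members, so ways that are also members of other relations still have to be checked before deleting anything.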
Technical Details : we used a modified version of this script http://overpass-turbo.eu/s/bXY
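For readers who want to see what "glued" means in terms of data, here is a small Python sketch that lists the nodes used by both a boundary way and some other way. It is not the script linked above; the function name `find_glued_nodes` and the in-memory layout of `ways` are assumptions for illustration.

```python
def find_glued_nodes(ways):
    """Return the node ids used by both a boundary way and a non-boundary way.

    `ways` maps a way id to a dict with "tags" (dict) and "nodes" (list of ids).
    """
    boundary_nodes = set()
    other_nodes = set()
    for way in ways.values():
        if way["tags"].get("boundary") == "administrative":
            boundary_nodes.update(way["nodes"])
        else:
            other_nodes.update(way["nodes"])
    # A node is glued when it appears on both sides.
    return boundary_nodes & other_nodes
```

Each node in the result would need to be unglued, so that the river or highway keeps its own node when the old boundary is deleted. A way that is itself tagged both as a boundary and as a river is a different case that this sketch does not cover.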
We also deleted around 500 old relations, with a total of over 500.000 nodes.
For every 50.000 nodes that I upload or change, I have to wait almost 30 minutes for the upload to pass. These are the little things that you do not expect and that you learn on the go. I did not include in the initial timeline the time needed for downloading, deleting and uploading such a large amount of data.
Technical details : The process of uploading the deleted nodes, ways and relations alone took around 5 hours
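The 5 hours follow directly from the rate above: 500.000 nodes at roughly 30 minutes per 50.000 nodes. A tiny Python helper makes it easy to plan this in advance. The function name `estimate_upload_hours` is hypothetical, and the rate is only the rough figure observed during this import, not a property of the OSM API.

```python
def estimate_upload_hours(node_count, nodes_per_batch=50_000, minutes_per_batch=30):
    """Estimate the upload time in hours from the observed rate."""
    return node_count / nodes_per_batch * minutes_per_batch / 60


print(estimate_upload_hours(500_000))    # 5.0
print(estimate_upload_hours(1_050_000))  # 10.5
```

Since "almost 30 minutes" is an upper figure, the estimate is on the pessimistic side: the upload of about 1 million new nodes took around 7-8 hours in practice.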
Step 3 : Add new data
We uploaded all the municipalities for 21 provinces; in total there were over 1050 municipalities, comprising over 1.050.000 nodes.
The process of uploading 1 million nodes takes around 7-8 hours. I had to do some reverts manually: while the nodes were still uploading, a user noticed nodes without any tags on the map and deleted a bunch of them, so the upload failed at the very end, because the ways could not be uploaded without the nodes they reference. One solution, which would burden the OSM server and the history file a little more, would be to add a tag to each of the 1 million nodes and delete the tag after the upload is complete.
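We did not try the temporary tag idea, but it could look roughly like the Python sketch below, which adds a tag to every untagged node of an OSM XML file before upload. The function name `tag_untagged_nodes` and the tag text are hypothetical, and the sketch assumes a file of new objects that has not been uploaded yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical temporary tag, to be removed once the ways are uploaded.
TEMP_KEY = "note"
TEMP_VALUE = "boundary import in progress, please do not delete"


def tag_untagged_nodes(osm_xml):
    """Add the temporary tag to every node without tags.

    Returns the modified XML as a string and the number of nodes tagged.
    """
    root = ET.fromstring(osm_xml)
    tagged = 0
    for node in root.iter("node"):
        if node.find("tag") is None:
            ET.SubElement(node, "tag", k=TEMP_KEY, v=TEMP_VALUE)
            tagged += 1
    return ET.tostring(root, encoding="unicode"), tagged
```

Removing the tag afterwards means a second version of every node, which is the extra load on the server and the history mentioned above.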
Step 4 : Clean / verify the new added data
We had to manually repair over 500 municipality relations that needed to be connected to the province.
We did not manage to write a script that also downloads all of the nodes, ways and relations that are attached to a relation. As a result, sometimes you try to delete an old admin_level=4 boundary and are not allowed to, because other things are attached to it. The process of doing this should be better documented.
Technical details : This was the longest part. The workflow is illustrated in the YouTube videos that we will upload, around 1 hour of footage.
There were several islands where the INEGI dataset was off by as much as 1.5 kilometers.
And some more errors, this one found by my colleague Gabriela
Future thoughts :
There is not yet a simple way of merging 2 different datasets automatically, for example the SHP for admin_level=4 and the SHP for admin_level=6. Taking this one step further, to import admin_level=4, 6 and 8 in the same run, we would need a script that merges all of the datasets so that a single upload is complete.
It will be far harder to do this manually for the AGEB data, where we are not talking about 2400 municipalities, but about 16000 AGEB blocks. For the municipalities alone, for the whole of Mexico, you will have to reglue 970 relations.
For the AGEB dataset, you will have to re-glue around 7000 relations.