I wrote about Cygnus, our effort to create an OSM-specific conflation engine, a few months ago. We developed it specifically to aid the import of INEGI road data that we are preparing together with the OSM community in Mexico. But when used with care, I think it can be very useful in other import scenarios as well, which is why we decided to make it into a web tool anyone can use.
Before I go into details, I should offer a few words of caution.
Firstly, this tool is an early beta. Right now, it only conflates roads, nothing else. We have tested the tool internally, but only on a limited number of cases, all using open INEGI data from Mexico. With this public beta, I hope to gather feedback that will help us improve it.
Secondly, using this tool requires careful planning and preparation. It is definitely not for casual users. It is useful mostly in import scenarios, so the usual extreme caution and preparation related to data imports apply as well. Anyone considering any import needs to take this warning on the imports wiki page very seriously:
If any of what you read below feels confusing or difficult, you should probably not be using Cygnus or attempting a data import in the first place. I do not mean to be discouraging, but importing data into OSM is hard, and you can easily destroy other people's work. So if you become aware of an external dataset that you think would be interesting to incorporate into OSM, study the import guidelines, talk to your local community and discuss the best way forward. Never go at this alone.
What does Cygnus do?
Conflating in GIS is the act of merging two data layers to create one layer containing the features and attributes of both original layers. (A more official-sounding definition can be found here.) Cygnus is a tool that conflates external data with OSM. You feed it an external dataset, and Cygnus compares it against current OSM data. It gives you a result file in JOSM XML format containing all the changes. You can load this change file into JOSM and merge it with an OSM data layer. The result could then, with extreme caution (see above!), be uploaded to OSM.
Cygnus does its conflation in a non-destructive way. No existing OSM ways are ever deleted or degraded. Existing geometries do get changed where new connections need to be made. Where new and existing ways overlap, Cygnus will honor the original way geometry and attributes.
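To make the non-destructive policy concrete, here is a minimal Python sketch. It is not Cygnus's code, only an illustration of the rules described above: existing OSM ways are copied through unchanged, an external way that overlaps an existing one is dropped (the OSM geometry wins) and only contributes tags the OSM way does not already have, and everything else is added as new. The Way class, the conflate() function, the 5 m tolerance and the use of projected, metre-based coordinates are all assumptions; the overlap test (every vertex of the new way lies within the tolerance of the existing way) is a deliberately crude stand-in for real geometric matching, and the geometry changes needed to connect ways are left out.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float]  # (x, y) in a projected, metre-based CRS (assumption)


@dataclass
class Way:
    way_id: int
    coords: list[Point]
    tags: dict[str, str] = field(default_factory=dict)


def point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_way(p: Point, way: Way) -> float:
    if len(way.coords) == 1:
        return math.hypot(p[0] - way.coords[0][0], p[1] - way.coords[0][1])
    return min(point_to_segment(p, a, b) for a, b in zip(way.coords, way.coords[1:]))


def overlaps(candidate: Way, existing: Way, tolerance_m: float) -> bool:
    # Crude test: every vertex of the candidate lies close to the existing way.
    return all(distance_to_way(p, existing) <= tolerance_m for p in candidate.coords)


def conflate(osm_ways: list[Way], external_ways: list[Way], tolerance_m: float = 5.0) -> list[Way]:
    # Work on copies so the caller's OSM data is never modified in place.
    result = [Way(w.way_id, list(w.coords), dict(w.tags)) for w in osm_ways]
    osm_count = len(result)
    for ext in external_ways:
        match = next((w for w in result[:osm_count] if overlaps(ext, w, tolerance_m)), None)
        if match is None:
            result.append(ext)  # no existing counterpart: add as a new way
        else:
            # Overlap: keep the OSM geometry, only fill in tags OSM lacks.
            for key, value in ext.tags.items():
                match.tags.setdefault(key, value)
    return result
```

Note that setdefault never overwrites an existing value, which is one simple way to make sure existing attributes are honored rather than degraded.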
Let's look at an example so I can go into much more detail.
An INEGI example: Los Morales, Mexico
For this example, I picked the local road data for Los Morales, a hamlet north of the city of Monterrey, Mexico. As Geofabrik's Map Compare shows, this hamlet is all but non-existent on OSM:
Mexico has a huge open data initiative, and I have previously written about the data from the national census bureau, INEGI. The data can be found on the INEGI website as part of the dataset 'Información Vectorial de Localidades Amanzanadas y Números Exteriores' (roughly, 'Vector Information for Block-Structured Localities and Street Numbers') for the administrative region of Salinas Victoria. (More background on the various INEGI road datasets is being compiled on the OpenStreetMap wiki by my colleague Andres.)
Before we can even start to think about conflation, we need to ensure a proper attribute translation. I purposely picked a fairly uncomplicated example so we can remain focused on the process as a whole. The attributes for this dataset are fairly straightforward:
I want to discuss the translation of INEGI data specifically in another blog post in the future. We are working with INEGI to get as much data as we can to ensure a proper mapping from INEGI road types to OSM way types. What I am doing here is a much simplified example for the sake of this demonstration. The result will not actually be uploaded to OSM.
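At its core, the translation step is a function that maps each source feature's attributes to OSM tags. The Python sketch below shows the idea. The field names TIPOVIAL and NOMVIAL, the road-type values, the "no name" marker and the mapping itself are hypothetical placeholders chosen for illustration; they are neither the actual INEGI schema nor a vetted mapping.

```python
# Hypothetical field names and values: placeholders, NOT the real INEGI schema.
# Illustrative mapping only; not a recommendation for an actual import.
ROAD_TYPE_TO_HIGHWAY = {
    "CALLE": "residential",
    "AVENIDA": "tertiary",
    "CARRETERA": "secondary",
    "PRIVADA": "service",
}
NO_NAME_VALUES = {"", "SIN NOMBRE"}  # values treated as "no name" (assumption)


def translate_attributes(attrs):
    """Turn one source feature's attributes into OSM tags, or None to skip it."""
    road_type = (attrs.get("TIPOVIAL") or "").strip().upper()
    highway = ROAD_TYPE_TO_HIGHWAY.get(road_type)
    if highway is None:
        return None  # unknown road type: leave it out rather than guess
    tags = {"highway": highway}
    name = (attrs.get("NOMVIAL") or "").strip()
    if name.upper() not in NO_NAME_VALUES:
        tags["name"] = name  # keep the original capitalization
    return tags
```

Skipping unknown road types instead of falling back to a generic tag means unmapped values show up as missing roads you can investigate, rather than as silently mis-tagged ones.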
Upload to Cygnus
Now that we have an input file, we can offer it to Cygnus for processing. When you load the Cygnus service page, you see this simple interface:
There are just two pages: the home page, where you add new jobs, and the Job Queue page, where you can follow your progress and download the result. To add a job to the Cygnus queue, I upload the file I have prepared:
Note that your upload needs to be small-ish: the spatial extent must be smaller than 50 × 50 km, and the file must be 20 MB or smaller.
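Checking these limits before uploading saves a round trip through the queue. The Python sketch below is a hypothetical helper, not part of Cygnus: it approximates the width and height of a longitude/latitude bounding box with an equirectangular approximation (good enough at this scale) and compares both, together with the file size, against the stated limits. Reading "20MB" as 20 MiB is an assumption, and you have to supply the bounding box of your data yourself.

```python
import math
import os

EARTH_RADIUS_KM = 6371.0088        # mean Earth radius
MAX_EXTENT_KM = 50.0               # extent must be smaller than 50 x 50 km
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def bbox_extent_km(min_lon, min_lat, max_lon, max_lat):
    """Approximate (width, height) of a lon/lat bbox in km (equirectangular)."""
    mid_lat = math.radians((min_lat + max_lat) / 2)
    width = math.radians(max_lon - min_lon) * math.cos(mid_lat) * EARTH_RADIUS_KM
    height = math.radians(max_lat - min_lat) * EARTH_RADIUS_KM
    return width, height


def check_limits(size_bytes, bbox):
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_SIZE_BYTES}")
    width, height = bbox_extent_km(*bbox)
    if width >= MAX_EXTENT_KM or height >= MAX_EXTENT_KM:
        problems.append(f"extent is {width:.1f} x {height:.1f} km, must be under 50 x 50 km")
    return problems


def check_upload(path, bbox):
    # bbox is (min_lon, min_lat, max_lon, max_lat) of the data in the file.
    return check_limits(os.path.getsize(path), bbox)
```

For instance, a box spanning a full degree of latitude is already about 111 km tall and would be rejected regardless of file size.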
If your input file was uploaded successfully, Cygnus will go to work. Your job will be added to the back of the queue. When it's your turn, Cygnus will read your PBF file and download the OSM data for the same extent using the Overpass API. It will then compare your upload with the existing OSM data and produce the output file. I will go into more detail about the Cygnus process in a separate post. For now, the most important thing to remember is that Cygnus only considers roads, which covers most ways with a highway tag in OSM. Any other data will be ignored.
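For reference, fetching the roads for a bounding box from the Overpass API could look like the Python sketch below. It is not the query Cygnus actually runs, which the post does not show; it simply asks for every way with a highway tag inside the box (south, west, north, east), plus the nodes those ways reference. The public endpoint URL and the timeout values are assumptions.

```python
import urllib.parse
import urllib.request

# A public Overpass instance; Cygnus may well use a different one (assumption).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def roads_query(south, west, north, east, timeout=180):
    """Overpass QL: all highway-tagged ways in the bbox, plus their nodes."""
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:xml][timeout:{timeout}];"
        f'(way["highway"]({bbox});>;);'  # ">;" recurses down to the ways' nodes
        "out body;"
    )


def fetch_roads(south, west, north, east):
    """POST the query and return the raw OSM XML response as bytes."""
    data = urllib.parse.urlencode({"data": roads_query(south, west, north, east)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=300) as resp:
        return resp.read()
```

Since Cygnus considers "most" highway ways rather than all of them, a real pipeline would likely filter some highway values out after downloading; which ones is not specified here.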
When Cygnus has finished processing your file, the result will be available for download on the Job Queue page.
You can download and/or remove your file here. Everyone's jobs are visible here, so please be careful not to touch other users' stuff.
Inspect and process in JOSM
The downloaded file is plain JOSM XML:
What you see here are the differences between OSM and your uploaded file: new ways, ways with changed geometries, and ways with added tags. Next, we need to inspect the changes carefully against existing OSM data. Cygnus is set to conflate very conservatively by default, so the results will almost certainly need manual tweaking.
This is by far the most important, and most time-consuming, step!
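Before the visual inspection, a quick count of what the result file contains can help you plan the work. JOSM XML is ordinary OSM XML plus JOSM's conventions: objects that do not exist in OSM yet have negative ids, and edited objects carry an action="modify" (or action="delete") attribute. The hypothetical Python helper below, which is not part of Cygnus, tallies ways accordingly. Since Cygnus never deletes existing ways, a non-zero deleted count would be a red flag worth investigating.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_josm_xml(xml_text):
    """Count the ways in a JOSM XML document by their pending upload action."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for way in root.iter("way"):
        if int(way.get("id")) < 0:
            counts["new"] += 1  # negative ids: objects not yet in OSM
        elif way.get("action") == "modify":
            counts["modified"] += 1
        elif way.get("action") == "delete":
            counts["deleted"] += 1
        else:
            counts["unchanged"] += 1  # e.g. context ways included as-is
    return counts
```

Read a saved file with open(path, encoding="utf-8").read() and pass the text in; Counter returns 0 for categories that never occur.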
So I load the OSM data in JOSM for the same extent. First, I use the layer panel to get a quick overview of what has been added or changed:
Because there was already a highway=secondary here, it is probably a good idea to pay close attention to the data around it. While Cygnus makes a best effort to connect ways where needed, it acts conservatively so that it does not snap together ways that do not belong together.
Here are a few ways that got properly connected to the existing highway=secondary:
But here the distance was too great, so Cygnus did not snap the ways together:
In this case, you would need to manually connect the ways if that is appropriate.
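The conservative snapping described above boils down to a distance threshold: the end of a new way is connected to an existing node only if one lies close enough, and otherwise it is left dangling for a human to decide. The Python sketch below illustrates that idea. The function name, the threshold you pass in, and the use of projected, metre-based coordinates are assumptions, and Cygnus's actual rules (for example, snapping onto the middle of an existing segment, which changes that way's geometry) are likely more involved.

```python
import math


def snap_endpoint(endpoint, existing_nodes, max_dist_m):
    """Return the id of the nearest existing node within max_dist_m, else None.

    endpoint is an (x, y) tuple; existing_nodes maps node id -> (x, y).
    Both are assumed to be in the same projected, metre-based CRS.
    """
    best_id, best_dist = None, max_dist_m
    for node_id, (x, y) in existing_nodes.items():
        d = math.hypot(endpoint[0] - x, endpoint[1] - y)
        if d <= best_dist:
            best_id, best_dist = node_id, d
    return best_id  # None means: leave the gap for manual review
```

A small threshold trades missed connections, which are easy to spot and fix in JOSM, for fewer wrong connections, which are much harder to notice later.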
You can also inspect what Cygnus proposes by selecting any way in the Cygnus layer and looking for the telenav:graphenhancer tag. Its value is, for example, new for added ways and changed:geometry for ways whose geometry has changed.
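When there are many changes, clicking through ways one by one gets tedious. A small script can group the proposed changes by their telenav:graphenhancer value so you can work through them systematically. The Python sketch below is hypothetical; new and changed:geometry are the only values the post names, so treat any other value that turns up as something to look into.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TAG_KEY = "telenav:graphenhancer"


def ways_by_cygnus_status(xml_text):
    """Map each telenav:graphenhancer value to the ids of the ways carrying it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        status = tags.get(TAG_KEY)
        if status is not None:  # ways without the tag were not touched by Cygnus
            groups[status].append(way.get("id"))
    return dict(groups)
```

The resulting ids can be pasted into JOSM's search to select and zoom to each group in turn.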
(The quality of the result does not depend only on what Cygnus does; 'Garbage In, Garbage Out' applies too. So before you even submit your file to Cygnus, make sure you have triple- and quadruple-checked your attribute translation tables and other pre-processing steps.)
When you are finally satisfied with your manually post-processed conflation result, you can go ahead and merge it with the OSM data in JOSM:
When that is done, you probably want to remove the telenav:graphenhancer tags.
After that, the data should be close to ready to be uploaded to OSM. In this case, I am not going to do this because I did not follow import procedures at all, and I wrote a quick, simplified translation file for the attributes. So Los Morales is still just as sadly absent from OSM as it was before. But I hope this will give you an idea of what Cygnus is capable of.
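If you do prepare a real upload, and assuming you do not want Cygnus's telenav:graphenhancer annotations to end up in OSM, they can be stripped from a saved JOSM XML file with a short script like the hypothetical Python sketch below. In JOSM itself, searching for telenav:graphenhancer=* and deleting that key from the whole selection works just as well.

```python
import xml.etree.ElementTree as ET

TAG_KEY = "telenav:graphenhancer"


def strip_cygnus_tags(xml_text, key=TAG_KEY):
    """Remove every <tag k=key .../> from nodes, ways and relations.

    Returns the new XML text and the number of tags removed. Note that
    ElementTree drops the XML declaration and original formatting.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    for element in root:  # direct children: node, way, relation, bounds
        for tag in element.findall("tag"):  # findall returns a list, safe to mutate
            if tag.get("k") == key:
                element.remove(tag)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Comparing the returned count with the number of tagged ways found during inspection is a cheap sanity check before uploading.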
If you have existing import plans that involve a road network and you would like to take Cygnus for a spin, please do. I am happy to help: email me, or ping me on Twitter or Skype (mvexel). I am looking forward to hearing from you!