OpenStreetMap

Conflation engine Cygnus now in public beta

Posted by mvexel on 24 November 2015 in English (English)

I wrote about Cygnus, our effort to create an OSM-specific conflation engine, a few months ago. We developed it specifically to support the import of INEGI road data that we are preparing together with the community in Mexico. But when used with care, I think it can be very useful in other import scenarios as well. This is why we decided to make it into a web tool anyone can use.

Before I go into details, I should offer a few words of caution.

[Image: caution]

Firstly, this tool is an early beta. Right now, it only conflates roads, nothing else. We have tested the tool internally, but only on a limited number of cases, all using open INEGI data from Mexico. With this public beta, I hope to gather more feedback to help us make improvements to it.

Secondly, using this tool requires careful planning and preparation. It is definitely not for casual users. It is useful mostly in import scenarios, so the usual extreme caution and preparation related to data imports apply as well. Anyone considering any import needs to take this warning on the imports wiki page very seriously:

[Image: import warning from the imports wiki page]

If any of what you read below feels confusing or difficult, you should probably not be using Cygnus or attempting a data import in the first place. I do not mean to be discouraging, but importing data into OSM is hard, and you can easily destroy other people's work. So if you become aware of an external dataset that you think would be interesting to incorporate into OSM, study the import guidelines, talk to your local community and discuss the best way forward. Never go at this alone.

What does Cygnus do?

Conflating in GIS is the act of merging two data layers to create one layer containing the features and attributes of both original layers. (A more official-sounding definition can be found here.) Cygnus is a tool that conflates external data with OSM. You feed it an external dataset, and Cygnus compares it against current OSM data. It gives you a result file in JOSM XML format with all the changes. You can load this change file into JOSM and merge it with an OSM data layer. The result could - with extreme caution! see above! - be uploaded to OSM.

Cygnus does its conflation in a non-destructive way. No existing OSM ways are ever deleted or degraded. Existing geometries do get changed where new connections need to be made. Where new and existing ways overlap, Cygnus will honor the original way geometry and attributes.
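To make the non-destructive guarantee concrete, here is a small, hypothetical Python check that is not part of Cygnus. You could run it over tag data taken from before and after conflation. It assumes ways are given as dictionaries that map way IDs to tag dictionaries. It only looks at ways and tags, not geometry, because Cygnus does change existing geometries where new connections are made.

```python
from typing import Dict

# Hypothetical representation: way ID -> {tag key: tag value}
WayTags = Dict[int, Dict[str, str]]


def is_non_destructive(before: WayTags, after: WayTags) -> bool:
    """Return True if no existing way was deleted and no existing tag was
    removed or changed. New ways and new tags are allowed."""
    for way_id, old_tags in before.items():
        new_tags = after.get(way_id)
        if new_tags is None:
            return False  # an existing way disappeared
        for key, value in old_tags.items():
            if new_tags.get(key) != value:
                return False  # an existing tag was dropped or overwritten
    return True
```

Adding a new way (negative ID) or a new tag such as surface passes the check. Deleting way 1 or changing its highway value does not.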

Let's look at an example so I can go into much more detail.

An INEGI example: Los Morales, Mexico

For this example, I picked the local road data for Los Morales, a hamlet north of the city of Monterrey, Mexico. Looking at Geofabrik's Map Compare, this hamlet is all but non-existent on OSM:

[Image: Geofabrik Map Compare view of Los Morales]

Data Source

Mexico has a huge open data initiative, and I wrote about the data from the national Census bureau, INEGI, previously. The data can be found on the INEGI web site as part of the dataset 'Información Vectorial de Localidades Amanzanadas y Números Exteriores' for the administrative region of Salinas Victoria. (More background for the various INEGI road datasets is being compiled on the OpenStreetMap wiki by my colleague Andres.)

Translation

Before we can even start to think about conflation, we need to ensure a proper attribute translation. I purposely picked a fairly uncomplicated example so we can remain focused on the process as a whole. The attributes for this dataset are fairly straightforward:

[Image: attribute table of the INEGI dataset]

I created an OSM file from the data using ogr2osm with a custom, simplified translation file. Because Cygnus requires PBF input, I then converted the OSM file to PBF using osmosis.
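For readers who have not used ogr2osm: a translation file is a Python module. The classic ogr2osm versions call a filterTags(attrs) function in it for each feature, passing in the source attributes and expecting a dictionary of OSM tags back. The sketch below is hypothetical. The field names TIPOVIAL and NOMVIAL and the road-type mapping are placeholders, not the real INEGI schema or the translation used in this post. Check both against your dataset and the ogr2osm version you run.

```python
"""Hypothetical, simplified ogr2osm translation file for road data.

TIPOVIAL / NOMVIAL and the mapping below are illustrative assumptions,
not the actual INEGI attribute schema.
"""

# Hypothetical mapping from a source road-type value to an OSM highway value.
ROAD_TYPES = {
    "CALLE": "residential",
    "AVENIDA": "tertiary",
    "CALLEJON": "service",
    "BRECHA": "track",
}


def filterTags(attrs):
    """Translate one feature's source attributes into OSM tags."""
    tags = {}
    road_type = (attrs.get("TIPOVIAL") or "").strip().upper()
    # Unknown types become highway=road so they stand out during review.
    tags["highway"] = ROAD_TYPES.get(road_type, "road")
    name = (attrs.get("NOMVIAL") or "").strip()
    if name:
        tags["name"] = name
    return tags
```

The OSM file ogr2osm writes can then be converted with osmosis, for example `osmosis --read-xml file=roads.osm --write-pbf file=roads.pbf` (the file names are placeholders).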

I want to discuss the translation of INEGI data specifically in another blog post in the future. We are working with INEGI to get as much data as we can to ensure a proper mapping from INEGI road types to OSM way types. What I am doing here is a much simplified example for the sake of this demonstration. The result will not actually be uploaded to OSM.

Upload to Cygnus

Now that we have an input file, we can offer it to Cygnus for processing. When you load the Cygnus service page, you see this simple interface:

[Screenshot: Cygnus home page]

There are just two pages: the home page where you add new jobs, and the Job Queue page where you can see your progress and download the result. To add a job to the Cygnus queue, I upload the file I have prepared, and add it to the job queue:

[Screenshot: uploading a file to the Cygnus job queue]

Note that your upload needs to be small-ish: the spatial extent needs to be smaller than 50x50km and the file needs to be 20MB or smaller in size.
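You can check a file against these limits yourself before uploading. The sketch below is a rough approximation and not how Cygnus itself measures extent. It takes the bounding box of the node coordinates, given as (lat, lon) pairs that you would read from your file yourself. It converts that box to kilometres with an equirectangular approximation. It also treats 20MB as 20 * 1024 * 1024 bytes, which is an assumption.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
MAX_EXTENT_KM = 50.0
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB


def extent_km(coords: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Approximate width and height in km of the bbox around (lat, lon) pairs."""
    lats, lons = zip(*coords)
    mid_lat = math.radians((min(lats) + max(lats)) / 2)
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180
    # Degrees of longitude shrink with the cosine of the latitude.
    width = (max(lons) - min(lons)) * km_per_degree * math.cos(mid_lat)
    height = (max(lats) - min(lats)) * km_per_degree
    return width, height


def within_limits(file_size_bytes: int, coords: Sequence[Tuple[float, float]]) -> bool:
    """True if both the extent and the file size stay inside the stated limits."""
    width, height = extent_km(coords)
    return (width < MAX_EXTENT_KM and height < MAX_EXTENT_KM
            and file_size_bytes <= MAX_FILE_BYTES)
```

In practice you would pass `os.path.getsize(path)` as the file size.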

Cygnus process

If your input file was uploaded successfully, Cygnus will go to work. Your job will be added to the back of the queue. When it's your turn, Cygnus will read your PBF file and download the OSM data for the same extent using the Overpass API. It will then compare your upload with the existing OSM data and produce the output file. I will go into more detail about the Cygnus process in a separate post. For now, the most important thing to remember is that Cygnus only considers roads. This includes most highway ways in OSM. Any other data will be ignored.
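Cygnus's exact query is not documented here, but fetching roads for the same extent from the Overpass API looks roughly like this. The list of highway values counted as roads is an assumption, based only on the statement that Cygnus considers most highway ways. The query asks for matching ways in the bounding box, given as (south, west, north, east), which is the order Overpass QL expects. It then recurses down to their nodes and requests metadata, so the result can be loaded in JOSM.

```python
from typing import Tuple

# Assumed set of road-like highway values; Cygnus's exact filter is not documented.
ROAD_VALUES = ("motorway", "trunk", "primary", "secondary", "tertiary",
               "unclassified", "residential", "service", "track", "road")


def road_query(bbox: Tuple[float, float, float, float], timeout: int = 180) -> str:
    """Build an Overpass QL query for road ways plus their nodes in a bbox.

    bbox is (south, west, north, east).
    """
    south, west, north, east = bbox
    values = "|".join(ROAD_VALUES)
    return (
        f"[out:xml][timeout:{timeout}];\n"
        f'way["highway"~"^({values})$"]({south},{west},{north},{east});\n'
        "(._;>;);\n"  # add the nodes referenced by those ways
        "out meta;"   # include version info needed for editing
    )
```

The resulting string can be sent as the data parameter of a POST request to a public Overpass endpoint such as https://overpass-api.de/api/interpreter.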

When Cygnus is done processing your file, it will be available to download in the job queue.

[Screenshot: Cygnus job queue]

You can download and/or remove your file here. Everyone's jobs are visible here, so please be careful not to touch other users' stuff.

Inspect and process in JOSM

The downloaded file is plain JOSM XML:

[Screenshot: Cygnus result loaded in JOSM]

What you see here are the differences between OSM and your uploaded file. This includes new ways added, ways with changed geometries and ways with new tags added. Next we need to inspect the changes carefully against existing OSM data. Cygnus is set to conflate very conservatively by default. The results will almost certainly need manual tweaking.

This is by far the most important, and time consuming, step!

So I load the OSM data in JOSM for the same extent. First, I use the layer panel to get a quick overview of what has been added or changed:

[Screenshot: switching layers in the JOSM layer panel]

Because there was already a highway=secondary, it is probably a good idea to pay close attention to the data there. While Cygnus makes a best effort to connect ways where needed, it acts conservatively, so it will not snap together ways that do not belong together.

Here are a few ways that got properly connected to the existing highway=secondary:

[Screenshot: ways correctly connected to the existing highway=secondary]

But here the distance was too large, so Cygnus did not snap:

[Screenshot: ways too far apart to be snapped]

In this case, you would need to manually connect the ways if that is appropriate.
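Snapping decisions like this one come down to a distance threshold. The sketch below shows the general idea only. The 10 m threshold and the flat-earth distance calculation are assumptions for illustration, not Cygnus's actual parameters. It measures the shortest distance from the end of a new way to an existing way and snaps only when that distance is within the threshold.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]
METERS_PER_DEGREE = 111_195.0  # mean metres per degree of latitude


def _to_xy(p: LatLon, ref_lat: float) -> Tuple[float, float]:
    """Project (lat, lon) to local metres (equirectangular approximation)."""
    lat, lon = p
    return (lon * METERS_PER_DEGREE * math.cos(math.radians(ref_lat)),
            lat * METERS_PER_DEGREE)


def distance_to_way(point: LatLon, way: List[LatLon]) -> float:
    """Shortest distance in metres from a point to a polyline."""
    ref_lat = point[0]
    px, py = _to_xy(point, ref_lat)
    best = math.inf
    for a, b in zip(way, way[1:]):
        ax, ay = _to_xy(a, ref_lat)
        bx, by = _to_xy(b, ref_lat)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Position of the perpendicular foot along the segment, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best


def should_snap(endpoint: LatLon, way: List[LatLon], max_dist_m: float = 10.0) -> bool:
    """Hypothetical rule: snap only if the endpoint is within max_dist_m of the way."""
    return distance_to_way(endpoint, way) <= max_dist_m
```

A conservative tool keeps the threshold small. The cost is more cases like the one above, where you have to connect ways by hand.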

You can also inspect what Cygnus proposes by selecting any way of the Cygnus layer and looking for the telenav:graphenhancer tag. This will have the value new for added ways, and changed:geometry for ways that have geometry changes, for example.
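If you want an overview beyond clicking through ways one by one, you can summarize the same tag straight from the downloaded file. This sketch uses Python's standard xml.etree.ElementTree to group way IDs by their telenav:graphenhancer value. Only the values new and changed:geometry are mentioned above, so the function simply collects whatever values it finds.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

TAG_KEY = "telenav:graphenhancer"


def ways_by_change(xml_text: str) -> Dict[str, List[str]]:
    """Group way IDs in an OSM/JOSM XML document by their TAG_KEY value."""
    root = ET.fromstring(xml_text)
    groups: Dict[str, List[str]] = defaultdict(list)
    for way in root.iter("way"):
        for tag in way.findall("tag"):
            if tag.get("k") == TAG_KEY:
                groups[tag.get("v")].append(way.get("id"))
    return dict(groups)
```

For a quick count per change type, use `{value: len(ids) for value, ids in ways_by_change(text).items()}`.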

(The quality of the result will not only depend on what Cygnus does. 'Garbage In, Garbage Out' also applies. So before you even offer your file to Cygnus for conflating, make sure you have triple / quadruple-checked your attribute translation tables and other pre-processing steps.)

When you are finally satisfied with your manually post-processed conflation result, you can go ahead and merge it with the OSM data in JOSM:

[Screenshot: merging the Cygnus layer with the OSM data layer in JOSM]

When that is done, you probably want to remove the telenav:graphenhancer tags:

[Screenshots: removing the telenav:graphenhancer tags in JOSM]
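You can also script the tag removal, for example on a saved copy of your reviewed layer. This is a sketch using the standard library, not a Cygnus feature. It drops every telenav:graphenhancer tag from nodes, ways and relations and leaves all other attributes untouched, including JOSM's action attributes. You use this tag for inspection, so only run something like this once your review is done.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def strip_tag(xml_text: str, key: str = "telenav:graphenhancer") -> Tuple[str, int]:
    """Remove every <tag k=key .../> and return (new XML text, tags removed)."""
    root = ET.fromstring(xml_text)
    removed = 0
    for element in root:  # top-level node, way and relation elements
        for tag in element.findall("tag"):
            if tag.get("k") == key:
                element.remove(tag)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

To process a file, read it with `open(path, encoding="utf-8").read()` and write the returned text to a new file, keeping the original as a backup.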

After that, the data should be close to ready to be uploaded to OSM. In this case, I am not going to do this because I did not follow import procedures at all, and I wrote a quick, simplified translation file for the attributes. So Los Morales is still just as sadly absent from OSM as it was before. But I hope this will give you an idea of what Cygnus is capable of.

If you have existing import plans that involve a road network, and you would like to take Cygnus for a spin, please do. I am happy to help. Email me, or ping me on Twitter or Skype (mvexel). I am looking forward to hearing from you!

Comment from gileri on 24 November 2015 at 21:57

I don't really see the point of using a placeholder in OSM diaries, but I guess there is one as you posted.

Another thing: by using a placeholder, your update won't be published in places like RSS feeds and might go unseen by a lot of people.


Comment from Super-Map on 25 November 2015 at 10:20

Nice work and rendering on this map: "f4map". Unfortunately, however, this map isn't fully "free"; it's "copyrighted".


Comment from Sanderd17 on 25 November 2015 at 18:39

Are those cranes randomly generated on construction area?


Comment from gileri on 25 November 2015 at 20:05

@Sanderd17 Yes it seems so


Comment from mvexel on 5 December 2015 at 03:58

Comments above this line were posted when this entry was still a placeholder.


Comment from MikeN on 6 December 2015 at 12:28

This is a great tool! There have been talks of creating such a thing for 5 years; I never had the time though. Looking forward to testing it in early 2016 with a dataset I'll be working with!


Comment from Glassman on 13 December 2015 at 00:38

Martijn, this tool has some great potential. I can't wait to try it out on some of my county data. Many of the surrounding counties publish new data monthly, which means it is better than Census TIGER data.

I did a similar process using PostGIS to match by highway names. The problem is so many ways are unnamed. It looks like Cygnus solves this problem.


Comment from mikelmaron on 13 December 2015 at 06:42

Very curious to read more about the process and code Cygnus applies behind the scenes, and how it can be extended to other kinds of features.


Comment from Jorge Gustavo Rocha on 13 December 2015 at 16:12

Nice work, Martijn. I am also curious to see how it is implemented. Is it open source?


Comment from mvexel on 17 December 2015 at 21:09

Hi Jorge - no, it's not open source (yet...). I am working to get to a point where we can do that. The engine relies on an internal, proprietary SDK. For now (to also answer Mikel's question) I plan to do a series of more in-depth articles that explain in more detail what happens behind the scenes.

If you have a specific use case you would want help with, I would be happy to discuss it!


Comment from mvexel on 17 December 2015 at 21:10

MikeN, Clifford - I look forward to looking at your data together to see if Cygnus can help.


Comment from MikeN on 9 January 2016 at 21:10

Ok, my first try: "Roads_glommedEdit.pbf" . Some notes -

  1. Requiring PBF for 'new data' is problematic, since the usual processing and JOSM pre-validation results in Osmosis refusing to convert to PBF on the grounds that a version is required. For now I lied and made up a version, etc. for test purposes.

  2. The job result was that it claimed that there were no ways in the file. Yet I can convert back to XML and see the ways. Perhaps this was a result of lying about version, author, etc.

  3. My data is for a very rural county, and I hope I won't hit the 50 km limit - the PBF file was only 3.4 Megabytes.

I've got more near ends to join in JOSM before I consider it authoritative, but I wanted to give it a spin!


Comment from MikeN on 9 January 2016 at 21:14

Oh yes, I see that all my node references were invalid. So I'll hold off until I hear how to create a PBF from a 'new file' with no OSM version history, etc.


Comment from mvexel on 12 January 2016 at 17:45

Mike - I can help out with proper file conversion if you want. I realize PBF format can be tricky as an input file format. On the other hand it does help raise the barrier - this tool should be used with caution. Using PBF as input file format forces the user to think about the data in an OSM context.

Let me know how I can help.


Comment from MikeN on 15 January 2016 at 12:44

I tested this now with 2 rural counties, and here's my impressions:

This concept will be a major time saver to be able to integrate OSM data from reference data as time goes on.

I populated my reference data (from latest county GIS) with a surface attribute, as well as an updated review against current imagery to identify roads that have deteriorated into tracks. The comparison did identify geometry changes and new roads. I think this is currently the strongest use case: merging new data into the existing OSM network.

It would be interesting to have more control over tag merging - such as taking surface tags from the 'new' ways if there is no current surface tag.

