A Brief Dalliance with Imports

Posted by asciiphil on 24 May 2011 in English

A lot of people are, if not opposed to imports, at least strongly skeptical of them. Last week, there were a few opinions on the subject, including one that offered, "Never trust robots," as a policy statement.

Naturally, this was the same week I found Baltimore City's Open Data Catalog, which is full of public domain data, some of which could be very useful to OpenStreetMap. I decided I wanted to try importing the landuse data, since that would liven up the city's OSM data and wouldn't, I thought, need too much work to integrate into existing data. I figured I'd convert the shapefile to an OSM file, tag every landuse with some "unprocessed" tag, then go through the whole city and make sure any existing landuses were incorporated into the process, so no one's data would be lost or ignored. I planned on testing out this process in a few areas of the city to make sure it was feasible, emailing the talk-us list and the other people who had contributed to Baltimore mapping, and then proceeding if there weren't any objections.

I didn't get that far. The shapefile was such a mess relative to the topological quality of data I would expect from OpenStreetMap that I decided it wasn't worth the effort it would take to clean it up. There were tons of places with pointlessly overlapping landuses, others with overlaps that might or might not have been pointless, nodes that ought to be shared in OSM but which weren't quite close enough in the shapefile to have been merged during the preprocessing, and quite a lot of tiny slivers of areas an inch or less wide.

This more or less matches my experience with the National Hydrography Dataset. It's decent data for a lot of uses, but on its own, it doesn't match the quality of what can be put into OpenStreetMap. In the NHD, streams can be misaligned by ten meters or more, and a straight import wouldn't address the manner in which waterways interact with roads--either going under bridges or through pipes. When I'm mapping waterways, I use the NHD as an overlay for my state's aerial imagery to give me a rough idea about the directions and names of waterways, but I trace their alignment from the imagery, not the NHD.

Similarly, I'm working on making a rendered tileset that Baltimore mappers can use to see the city's landuse and then make their own judgements about how to bring that into OpenStreetMap. (In my use of that rendering in conjunction with aerial imagery, I can also tell that the city's data isn't always exact about property lines, although it's good enough to use as a pointer, at least. Just like the NHD.)

In summary, my experience has been that OpenStreetMap demands precision in ways that traditional GIS doesn't, particularly in the topology of the data, which is why raw GIS data is often a very poor fit for OSM without significant manual work to adapt it. That's a very large argument against most imports.
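The topology problems described above, tiny slivers and nodes that nearly but not quite coincide, are the kind of thing that can be screened for before deciding whether cleanup is worth the effort. What follows is only a rough sketch, not the process I actually used: it assumes polygons are plain lists of (x, y) coordinates in a projected coordinate system measured in meters, and the function names and thresholds (`is_sliver`, `near_miss_nodes`, 5 cm and 10 cm) are made up for illustration.

```python
import math

def polygon_area(ring):
    """Absolute area of a ring of (x, y) points via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(ring):
    """Sum of edge lengths, including the closing edge back to the start."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))

def is_sliver(ring, max_width=0.05):
    """Flag polygons whose approximate average width is under max_width.

    For a long, thin shape, 2 * area / perimeter is close to its width.
    The 0.05 m default is an assumed threshold, not a value from the data.
    """
    perimeter = polygon_perimeter(ring)
    if perimeter == 0:
        return True  # degenerate polygon with no extent at all
    return 2.0 * polygon_area(ring) / perimeter < max_width

def near_miss_nodes(points, tolerance=0.1):
    """Return index pairs of points that are distinct but closer than tolerance.

    Exact duplicates are skipped because those would already be merged.
    This is a brute-force O(n^2) scan, fine for a small test area only.
    """
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if 0 < math.dist(points[i], points[j]) < tolerance:
                pairs.append((i, j))
    return pairs
```

Running something like this over a test area gives a count of slivers and near-miss nodes, which is a quick way to judge how much manual work a dataset would need. Overlapping landuses are harder to check and would call for a real geometry library.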

Of course, I'm not entirely dissuaded. The city also has a shapefile with block-by-block addressing. I still have visions of importing that data into OSM, because I think it would be tremendously useful. I don't expect it to be easy, though, and it might not be feasible at all.

Location: Little Italy, Harbor East, Baltimore, Maryland, 21203, United States of America

Comment from JoshD on 24 May 2011 at 13:08

There's certainly plenty of available data that is not of sufficient quality to be directly merged with OSM, but rather used as a guide like you've said. There is some high quality data though, and so it's worth looking at every dataset to make this determination, and to share those results with others. I can't seem to find any mention of Baltimore's data on the wiki page. Would you consider adding a section on their catalog, and add your comments about any particular datasets and their quality? Personally I always go straight to the wiki to see if anyone else has found data for a given county/city, and whether they've done anything with it. It would be a great help to others to document your work and save duplication of efforts (especially ones that lead to a dead end!). I've found that people don't document their work sufficiently, and I think the wiki is the single best place to put that information.

Comment from asciiphil on 24 May 2011 at 13:39

I plan on updating the wiki, once I've got my tileset available for general use. (I'm currently working with my hosting company to see what parts of the rendering chain they're willing to install and which ones I need to set up and maintain myself.)

Comment from Andy Allan on 24 May 2011 at 16:18

Glad to see someone treading carefully - and more importantly realising when it's not worth it! Sometimes I think when people have such high expectations of an import they aren't willing / able to back down when they realise it's not of the quality we aspire to.

Interesting to hear your thoughts on NHD, and I think it would really help to promote your methods of using the data, through documentation and blog posts like this.

Comment from compdude on 25 May 2011 at 00:13

The TIGER import is definitely a great example of an import that does not meet the precision that OSM is capable of. But it was definitely worth it--what would the map look like without it?

Comment from Harry Wood on 25 May 2011 at 15:55

@JoshD Good suggestion. I've added a mention of it at Baltimore, Maryland#Data imports, as well as rejigging that whole wiki page to de-emphasise past event information and invite new event organisers. That needs to happen on many wiki pages, of course. Wish somebody who lives in the U.S. would do this stuff.

Comment from ElliottPlack on 26 November 2012 at 04:36

Phil. Great intro to some of the issues with that data. I work for Baltimore County GIS, and have spoken at length with my colleagues downtown about importing some data to OSM. They confirmed with legal that the data is in the public domain, but that they're planning to update OSM on their own. I haven't seen any such project yet, and have uploaded some data here and there to test. I'll let you know if I hear anything else.
