asciipip's Diary

On Tracing from Poor Imagery

Posted by asciipip on 27 June 2012 in English.

I came across this section of an Interstate today:

The Effects of Non-Orthorectified Imagery

Notice how the Interstate looks a little like it’s been draped across the terrain’s ridges and valleys? That’s probably because the road was originally traced from non- (or not fully) orthorectified imagery. (From perusing the history of the way’s geometry, it looks like the culprit was someone in the US Census Bureau; the waviness was there in the TIGER import.)

I matched the geometry up to Maryland’s six-inch imagery (which is, in my experience, excellently aligned and rectified) with this result:

Freeway Traced from Orthorectified Imagery

This post is just a reminder that you should be evaluating the qualities of the aerial imagery you use and not just blindly tracing from it.

Python Tile Expiration Class

Posted by asciipip on 10 June 2012 in English.

I have periodically needed a memory-efficient way to track tile expirations, so I wrote something to do it, and I'm making it available here in case it's useful to others.

The code is tileexpire; to use it, you just need to import the module, instantiate an OSMTileExpire object, call its expire() method for each expired tile, and then use its expiredAt() method for each zoom level you want to process.
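A minimal usage sketch (the argument forms for expire() and expiredAt() are my assumptions here; check the module for the real signatures):

from tileexpire import OSMTileExpire

expired = OSMTileExpire()

# Feed in each expired tile, e.g. parsed from an osm2pgsql expire file.
# I'm assuming expire() takes (z, x, y).
with open('dirty_tiles.txt') as f:
    for line in f:
        z, x, y = (int(n) for n in line.strip().split('/'))
        expired.expire(z, x, y)

# Walk the expired tiles at each zoom level of interest; I'm assuming
# expiredAt() yields (x, y) pairs for the given zoom.
for zoom in range(10, 19):
    for x, y in expired.expiredAt(zoom):
        print('%d/%d/%d' % (zoom, x, y))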

It should be reasonably memory-efficient because it uses a quad tree that collapses its branches as they fill up, so it can handle a lot of tiles as long as they're reasonably dense (in my experience, a safe assumption for OSM tile expiration data).

It’s under a CC0 waiver, so you’re free to use it for anything you want, but if you make improvements, I’d love to hear about them.

Mapnik and loaded icons

Posted by asciipip on 10 April 2012 in English.

I’m posting mostly because the repetition will help me remember it, because I keep getting bitten by this.

Mapnik appears to cache images loaded by ShieldSymbolizers (and, presumably, by LinePatternSymbolizers, PolygonSymbolizers, PointSymbolizers, and MarkersSymbolizers). This means that if you have a long-running rendering process (like Tirex or mod_tile's renderd) that uses an image in its rendering, and you then change the image file, Mapnik will continue rendering with the old image. You'll have to reload the stylesheet in order to start using the new image.
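For future me, here's roughly what the workaround looks like with Mapnik's Python bindings; this is a sketch, not any particular render daemon's code:

import mapnik

def fresh_map(stylesheet, width=256, height=256):
    # Re-parsing the stylesheet makes Mapnik re-read the symbolizer images.
    m = mapnik.Map(width, height)
    mapnik.load_map(m, stylesheet)
    return m

m = fresh_map('osm.xml')
# ... render some tiles, then change symbols/I-blank.png on disk ...
m = fresh_map('osm.xml')  # without this, renders keep using the old image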

Maryland and DC are ODbL-clean!

Posted by asciipip on 24 March 2012 in English.

Over the past several months, I've been working on replacing or accepting all of the data in Maryland marked as problematic by the OSM Inspector. As of today, I'm done (and I touched up DC, too, though others did most of the work there). I have looked at every problematic bit of data and either marked it clean (for mechanical edits like expanding name abbreviations), made it clean (by removing all data contributed by non-agreeing contributors), or deleted and replaced it (using only OSM-compatible data, of course). The only exceptions are standalone nodes created by a non-agreeing contributor and never touched by anyone else, with tags I couldn't verify remotely. Those will probably be deleted when the license change is effected, but they won't affect the surrounding data at all.

Next up is seeing what I can do to help clean up Virginia. The area next to DC is decidedly unhappy.

How I Map: TIGER Cleanup

Posted by asciipip on 16 February 2012 in English.

This is the first in what will probably be a very occasional series about how I do things in OpenStreetMap. In this post, I'll discuss how I improve the quality of TIGER-imported roads.

In general, I have at least a two-stage process for working on an area. In the first stage, I armchair-map: I use the USGS orthoimagery available for my state (which is 6-inch resolution and excellently rectified and georeferenced) to trace the roads and other major features in the area. I'll generally double-check that tracing with NAIP to make sure I'm not uploading old features. In the second stage, I use Walking Papers to get a printout of the area (usually in separate, page-sized chunks) and drive through it, verifying the road names.

What I'd like to talk about, though, is a couple of things that are specific to how I work with TIGER-sourced ways and how they fit into my general workflow.

tiger:reviewed

There are many different approaches to this tag. I use it as a marker of which stage of my editing a road is in. If its value is "no", either I haven't worked with the full length of the way or I did so before I settled on my current workflow. When I align the way to aerial imagery (using JOSM's excellent Improve Way Accuracy mode), I change the value to "position". Finally, when I've verified the name via ground survey, I remove the tag.

For roads that don't typically have names, like _link roads or service roads, I just delete the tiger:reviewed tag after aligning them to aerial imagery, unless TIGER appears to have given the road a name anyway. I see that case most often with named personal driveways.

I've written a couple of things to assist my particular use of the tiger:reviewed tag. For JOSM editing, I wrote a style that highlights "tiger:reviewed=position" in a light green, similar to the way "tiger:reviewed=no" ways are highlighted in yellow. I've made a screenshot of the highlighting and the CSS is in TIGER-aligned.css.

I also have a Mapnik stylesheet that makes color-coded overlays for my personal rendering so I can see an area's status at a glance. I use a yellow overlay for tiger:reviewed=no, a green overlay for tiger:reviewed=position, and a red overlay for roads with no tiger:reviewed tag and no name tag. The stylesheet is TIGER.xml and it needs the include files dbsettings.inc, extents-us.inc, and mapnik-utils.inc. It's really specific to my rendering, but here's a sample rendering with the overlay composited onto it that shows some of each case.

TIGER 2011

As I work through an area tracing things from aerial imagery, I often find roads that are new enough that they weren't in the TIGER 2005 data. After I've traced such a road, I turn on Ian Dees' TIGER 2011 road tiles to see if I can get the name from there. If I can, I tag the way with "source:name=TIGER 2011" and "tiger:reviewed=position", since I consider any TIGER-sourced data as needing ground verification.

Road Classification

As I'm working on an area, I also change road classifications as needed. In my experience, TIGER data seems to underclassify roads a lot (which is probably better as a default than classifying roads too highly). In my area, contributors before me have generally done a good job (by which I mean I don't usually disagree strongly enough to alter someone else's contribution) assigning classifications from secondary and up. I do a lot of conversions from residential to unclassified or tertiary, though.

Changing a road to unclassified is generally an easy decision. If it's a minor road that's not in a residential area, I make it unclassified. I also use unclassified for roads that are connectors in more rural areas but that I don't feel warrant tertiary classification.

Changing a road to tertiary classification is more of a judgement call. I'll usually do it if: the road has a divider painted down the center, indicating that it's expected to get a fair amount of traffic; the road does not have traffic calming features like speed humps and islands, since those generally indicate that a community wants to limit the amount of through traffic; and there isn't a better road that people should be using instead. (I have to thank mdroads for sharing with me his classification rules of thumb, which I adopted after I saw how well they fit the way I thought about the roads I knew best.) In urban and suburban areas, I also try to keep the ends of tertiary-classified roads on other roads of the same or higher classification, so as to emphasize the road network.

Conclusion

I hope I've been informative with this post. I welcome any comments, questions, or other feedback.

Downloading the USGS's Data

Posted by asciipip on 8 July 2011 in English. Last updated on 13 June 2012.

The US Geological Survey has a lot of data, some of which is available via WMS, and some of which is not. They'll give you any of it for free if you ship them a hard drive to put it on, but that can be both inconvenient (waiting weeks for the drive's round trip) and imprecise (they will only send entire datasets, even if you're only interested in a small portion of one). You can also download data from their seamless server, but downloading is a pain: you have to select an area (which must contain less than 1.5 GB of data in all), then their servers will divide that data into sub-250MB chunks and put the chunks on a webserver under randomly-generated filenames, deleting them after an hour.

Fortunately, the USGS has recently implemented a set of web services that let you write programs to download chunks of data from them.

I've written a simple Python program, get_usgs_data, to download all data for a dataset within a given region.

The program requires wget and the Suds and BeautifulSoup Python modules. To use it, run get_usgs_data --bbox=left,bottom,right,top product, where left, right, top, and bottom are bounding coordinates in WGS84 and product is a valid USGS product ID. You can query for product IDs with the online index service. Look for one of the "return_Attributes" calls; the string you need is in the PRODUCTKEY attribute.

Addendum:

More information on getting the right product key.

The return_Attribute_List SOAP call lists the informational fields available for each dataset. It takes no parameters, so just click through to its service page and click the "Invoke" button. I usually use a combination of AREA_NAME, RESOLUTION, TYPE, and of course PRODUCTKEY. Basically, you want enough information to tell which product key corresponds to the dataset you want.

Next, go to one of the return_Attributes calls. The plain return_Attributes call will search all available datasets. The ones ending with _Elevation, _LandCover, and _Ortho search specific types of datasets. (The _EO appears to be similar, but I don't know what it stands for.) On the service page for the call you want, enter the attributes you want to see in a comma-delimited list (e.g. "AREA_NAME,RESOLUTION,TYPE,PRODUCTKEY") and the rough bounds for the area you're interested in. (I'm often lazy and just use -180,0,0,90.) You also have to enter the EPSG code for the coordinate system you're using. Lat/lon coordinates are 4326.
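If you'd rather script the lookup than click through the service pages, something like this should work with Suds; the WSDL URL is a placeholder and the parameter order is my guess from the web form, so adjust both to match the actual service:

from suds.client import Client

# Hypothetical WSDL location; substitute the inventory service's real one.
client = Client('http://example.usgs.gov/inventory_service.asmx?WSDL')

result = client.service.return_Attributes(
    'AREA_NAME,RESOLUTION,TYPE,PRODUCTKEY',  # attributes to return
    -180, 0, 0, 90,                          # rough bounding box
    4326)                                    # EPSG code for lat/lon
print(result)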

After that, just look through the returned XML and figure out which product key is the one you want.

I'm happy with some of the tweaks I made to my tile generation setup. I mentioned it on IRC and one person asked for a writeup (not that it's all that complicated), so here it is.

I really like TileStache for tile generation (it's also a nice tool for turning WMS and ArcGIS REST interfaces into tiles). It does on-demand tile rendering with a selection of caching strategies, including caching to S3.

The nice thing about S3 is that it's cheaper, both storage- and bandwidth-wise, than my webhost. The problem I ran into is that TileStache's caching, for flexibility reasons, doesn't know anything about the particulars of each cache mechanism. It just knows that if a tile is cached, it reads it out of the cache and serves it to the client; otherwise it renders the tile and caches it. That ends up double-billing me for each cached tile: once from Amazon when the tile is read out of S3 and again from my webhost when the tile is sent to the client.

All I did was add a check for the tile on S3 before calling TileStache and, if it's already cached, send the client a redirect to fetch it from S3 directly. The script is on pastebin.
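The idea, sketched in Python (this is not the pastebin script itself; the bucket name and the TileStache entry point usage are illustrative):

import http.client

import TileStache

BUCKET_HOST = 'tiles.example.s3.amazonaws.com'  # hypothetical bucket

def handle_tile(config, path_info):
    # Cheap existence check: HEAD the public S3 object for this tile.
    conn = http.client.HTTPConnection(BUCKET_HOST)
    conn.request('HEAD', '/' + path_info)
    if conn.getresponse().status == 200:
        # Already cached: redirect so my webhost never serves the bytes.
        location = 'http://%s/%s' % (BUCKET_HOST, path_info)
        return ('302 Found', [('Location', location)], b'')
    # Not cached yet: render via TileStache, which also writes it to S3.
    mimetype, content = TileStache.requestHandler(config, path_info)
    return ('200 OK', [('Content-Type', mimetype)], content)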

I also hacked my copy of TileStache to use reduced redundancy storage for cached tiles. If they get lost, they can just be rerendered, and it's cheaper this way.

Incidentally, I really like my webhost. If you're looking for cheap, pay-for-what-you-use hosting for a small to medium site, you could do worse than NearlyFreeSpeech.NET.

Baltimore City Landuse Map

Posted by asciipip on 29 May 2011 in English.

After my bad experience with attempting to import Baltimore City landuse data (see previous diary entry), I took another tack and created a map rendering that can be traced from within an editor. I've now got that rendering available at http://aperiodic.net/osm/baltimore.html and documented at http://wiki.openstreetmap.org/wiki/Baltimore,_Maryland/Landuse.

Any suggestions on rendering improvements or other clarity-enhancing changes would be appreciated.

A Brief Dalliance with Imports

Posted by asciipip on 24 May 2011 in English.

A lot of people are, if not opposed to imports, at least strongly skeptical of them. Last week, there were a few opinions on the subject, including one that offered, "Never trust robots," as a policy statement.

Naturally, this was the same week I found Baltimore City's Open Data Catalog, which is full of public domain data, some of which could be very useful to OpenStreetMap. I decided I wanted to try importing the landuse data, since that would liven up the city's OSM data and wouldn't, I thought, need too much work to integrate it into existing data. I figured I'd convert the shapefile to an OSM file, tag every landuse with some "unprocessed" tag, then go through the whole city and make sure any existing landuses were incorporated into the process, so no one's data would be lost or ignored. I planned on testing out this process in a few areas of the city to make sure it was feasible, emailing the talk-us list and the other people who had contributed to Baltimore mapping, and then proceeding if there weren't any objections.

I didn't get that far. The shapefile was such a mess relative to the topological quality of data I would expect from OpenStreetMap that I decided it wasn't worth the effort it would take to clean it up. There were tons of places with pointlessly overlapping landuses, others with overlaps that might or might not have been pointless, nodes that ought to be shared in OSM but which weren't quite close enough in the shapefile to have been merged during the preprocessing, and quite a lot of tiny slivers of areas an inch or less wide.

This more or less matches my experience with the National Hydrography Dataset. It's decent data for a lot of uses, but on its own, it doesn't match the quality of what can be put into OpenStreetMap. In the NHD, streams can be misaligned by ten meters or more, and a straight import wouldn't address the manner in which waterways interact with roads, either going under bridges or through pipes. When I'm mapping waterways, I use the NHD as an overlay for my state's aerial imagery to give me a rough idea about the directions and names of waterways, but I trace their alignment from the imagery, not the NHD.

Similarly, I'm working on making a rendered tileset that Baltimore mappers can use to see the city's landuse and then make their own judgements about how to bring that into OpenStreetMap. (In my use of that rendering in conjunction with aerial imagery, I can also tell that the city's data isn't always exact about property lines, although it's good enough to use as a pointer, at least. Just like the NHD.)

In summary, my experience has been that OpenStreetMap demands precision in ways that traditional GIS doesn't, particularly in the topology of the data, which is why raw GIS data is often a very poor fit for OSM without significant manual work to adapt it. That's a very large argument against most imports.

Of course, I'm not entirely dissuaded. The city also has a shapefile with block-by-block addressing. I still have visions of importing that data into OSM, because I think it would be tremendously useful. I don't expect it to be easy, though, and it might not be feasible at all.

Location: Inner Harbor, Baltimore, Maryland, United States

Back in 2006, several state, federal, and private groups formed the Statewide Ortho Imagery Partnership, which funded the acquisition of high-resolution (6 inches per pixel) orthoimagery. Partly because the US Geological Survey was one of the funding partners, the imagery was put into the public domain. The images were recorded between fall 2007 and spring 2008, so most of the imagery is leaf-off. On top of that, the georectification seems excellent. In every place where I have multiple GPS traces to consult, the imagery matches up extremely well with the traces, and there seems to be very little distortion in the images. I haven't found any documentation of error margins for the imagery, but I'd be surprised if the error were greater than a meter.

My mapping has benefited enormously from the availability of this imagery. The rectification means that I can largely rely on the images for precise feature alignment and focus my surveying on things not visible in the imagery like addresses and names of things. The completeness of coverage has allowed me to map things that are either difficult to survey in person (like streams that run through remote areas or private property) or impossible (like much of the high voltage power line network in the state).

As time passes, however, the imagery becomes outdated. I've added stuff to OSM that's only in NAIP imagery (which is lower-resolution and not as well rectified) or that I've had to align solely from GPS traces or correlation to older features. I know there are power lines that have been added since the imagery was flown (I've sporadically watched an extension to the substation near Reisterstown being constructed). That's why I was happy to have found a press release from Axis Geospatial, the company that provided the 2007/2008 imagery, saying that they reimaged the Eastern Shore in 2010 and will be redoing the rest of the state in 2011. Based on what I can tell about the availability schedule from the last dataset, it will probably be 2012 before all the updated data is available in some form, and it may take longer to get it onto the USGS's WMS, which is what I use for my mapping. Still, I'm quite excited about the prospect of updated data.

Minutiae about Tile Rendering Queues

Posted by asciipip on 20 April 2011 in English.

I do my own map rendering, primarily to preload my phone's mapping program with offline tiles, but also to have a rendering that highlights things of interest to me for my region. My rendering is based on TopOSM, but with a lot of personal hacks and changes (the more general of which I've contributed back).

Until recently, my update process was very manual: run scripts to render all tiles for my rendering areas (which include overlapping bboxes of different shapes), run another script to combine the tiles for the three TopOSM layers (color-relief, contours, features) into single tiles for my phone, run a third script to use the expire files from my automated minutely mapnik setup to delete any tiles with expired data, run the first script again to rerender any deleted files, run the second script again to recombine any changed files, etc., etc. I wanted something that I could set up and leave running. As far as I could tell, the other render daemons in use (Tirex and mod_tile's renderd) just have a one-to-one correspondence between mapnik stylesheets and rendered tilesets. TopOSM is a little more complex (but oh, so much prettier). So I assembled my own.

The core of my change was externalizing my render queue. Rather than having a Python script create a Queue object, fill it with tiles to be rendered, and spin off render threads to consume the queue, I set up RabbitMQ to manage the work queues. That lets me feed the queues from one process and consume them from completely separate processes. The bit that I'm proud of has been my queue setup, so that's what I'm going to talk about.

When osm2pgsql runs, I have it generate tile expiration files. Those files are simply lines of the form "z/x/y", where "z" is a zoom level specified on the command line and "x" and "y" describe tiles with expired data (more or less; the process of associating data to tiles is a little fuzzy and osm2pgsql tries to err on the side of expiring more than it needs). Those get fed directly into a queue named "expire".
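Feeding the queue is straightforward with pika; here's a sketch (the queue name matches my setup, everything else is illustrative):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='expire', durable=True)

with open('expire.list') as f:
    for line in f:
        channel.basic_publish(
            exchange='',
            routing_key='expire',
            body=line.strip(),  # "z/x/y"
            properties=pika.BasicProperties(delivery_mode=2))  # persistent

connection.close()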

I have a single process consuming that queue. It takes each line, checks to see if I already have tiles rendered at any zoom level that cover that tile, then checks to see if the metatiles that contain those tiles have been submitted for rendering yet. The latter is done via a lookup in a PostgreSQL database. If there are tiles that need to be rendered, it marks the metatiles as submitted in the database and sends a message of the form "{layer1,layer2,layer3}/z/x/y" to one of the render queues. There is a separate render queue for each zoom level. For example, it recently fired off the message "{color-relief,features}/13/2442/3024" to queue render_13.

The rendering queue consumption is a little different. I prefer to get work from longer queues, since they have more work to be done, but I also prefer higher zoom levels, since the lower the zoom level, the more likely a tile is to be put back in the queue after rendering. The process I came up with is this: for each queue, multiply the current queue length by the total number of tiles at that queue's zoom level (4^z), divide by the number of tiles in a metatile for that zoom level, and then normalize each number as a percentage of the total. Next, take the number of metatiles I've rendered for each zoom level and normalize those to percentages, too. Finally, pick the zoom level which has the greatest difference between queue percentage and rendered percentage and get the next work unit from that queue. The rendering process is multithreaded; each thread keeps its own count of rendered metatiles.
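In code, that selection heuristic looks roughly like this (a sketch; the names are illustrative, not my actual code):

def pick_zoom(queue_lengths, rendered_counts, tiles_per_metatile):
    # Weight each queue: queue length times total tiles at that zoom (4^z),
    # divided by the number of tiles per metatile at that zoom.
    work = {z: qlen * (4 ** z) / tiles_per_metatile(z)
            for z, qlen in queue_lengths.items()}
    total_work = sum(work.values()) or 1
    total_done = sum(rendered_counts.values()) or 1

    def deficit(z):
        # Share of outstanding work minus share of work already rendered.
        return work[z] / total_work - rendered_counts.get(z, 0) / total_done

    # Pull the next work unit from the zoom level that's furthest behind.
    return max(work, key=deficit)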

After rendering is finished, the records of submission in the database for those metatiles are removed and a message of the form "z/x/y" is sent to the "combine" queue (after checking to make sure it isn't already pending). A third process consumes the combine queue and combines the different layers into single tiles.

I have separate functions that I use for adding new tiles to be rendered. They are given a bbox and a zoom level range, check for tiles that are missing from that volume, and submit the appropriate metatiles for rendering. Once the tiles have been rendered for the first time, the expiry process will notice if their data is updated and will then submit them for rerendering. I used those functions to bootstrap my new rendering process, since I'd previously been keeping track of which tiles needed rerendering by simply deleting them.

All of the queues are specified as durable, and all of the messages are sent as persistent, so even if the RabbitMQ node shuts down or my computer reboots I won't lose track of the work that needs to be done.

I've been running this setup for several days now and it seems able to keep up with the data expiration rate. One optimization I'm considering would be to track which individual tiles have been expired and only extract those from the metatiles. That would probably speed up some of the higher zoom levels; a zoom 16 metatile is a 12x12 block of regular tiles, and it takes a noticeable amount of time to extract and write all 144 constituent tiles. Another idea I had was to keep track of how long it takes to render metatiles at each zoom level and factor that into the decision about what zoom level to do next, preferring zoom levels that go faster. I plan to leave this setup running for a week or so to get a baseline before I try any alterations.

One Year of OpenStreetMap

Posted by asciipip on 16 March 2011 in English.

Today is the one year anniversary of when I joined OpenStreetMap, so I figured I'd look back at some of the large-scale work I've done in Maryland.

I started out not even knowing about OpenStreetMap; I just wanted a program for my phone (a first generation Palm Pre) that would track my travels and draw pretty lines on a map. I found MapTool, which used OpenStreetMap tiles. At some point I figured out that I could edit the map it was showing and started to do so, beginning with the areas I knew well around my workplace and home.

My first large edits were to the county boundaries in Maryland. I noticed that they were in the database but weren't being rendered, which I realized was because they had no admin_level tag. Because I'm a completionist, rather than fix just the boundary that bugged me initially, I converted all the county boundaries into multipolygons with shared ways and made sure each way had an admin_level.

At some point, I found that the USGS had six-inch resolution imagery for all of Maryland, even the really remote counties. That gave me the ability to "see" into places I couldn't walk into and places that were too far to visit and revisit in multiple surveying trips. In particular, the resolution was good enough to trace individual power lines through a messy substation that was very much access=no to me. I decided to try to map all the power lines in Maryland. After several months of work with the aerial imagery, I got the map to the point where every power line in Maryland (as of the 2007/2008 imagery) was mapped, and any data from the TIGER import that did not represent an existing power line was removed. (Or just reduced to a minor_line; a lot of sub-100kV lines were tagged as power=line via TIGER.)

Here are the before and after renderings:

Maryland Power Lines - 2010-03-17 Maryland Power Lines - 2011-03-16

Obviously, people have been busy in other states, but every line in Maryland was my handiwork.

The next major task I completed was fixing the administrative boundaries in Maryland. The TIGER data had two problems: it was good for low zoom views of the boundaries but could be off by hundreds of meters in places, which became apparent at high zoom levels; and it didn't include the Potomac River or the Chesapeake Bay. I went through and realigned every county boundary in the state, as well as the state border. In many cases, the counties are separated by waterways; for those, I simply traced the waterways from the USGS aerial imagery and then tagged the ways as both waterways and administrative boundaries. For the rest of the boundaries, I traced from old USGS topo maps.

Here are before and after renderings for the whole state and a couple of detail sections to illustrate the inaccuracies in the TIGER data:

Maryland Boundaries - 2010-03-17 Maryland Boundaries - 2011-03-16

TIGER-Sourced Maryland Border at Warnocks Improved Maryland Border at Warnocks

TIGER-Sourced Maryland Border at Paw Paw Improved Maryland Border at Paw Paw

Note that one of the things I did as I worked on the county boundaries was to take large rivers that had been imported as coastline and retag them as riverbank. The old data I used to render the "before" images doesn't have those riverbanks, obviously; but since there are no historical coastline downloads (that I know of, at least), I used the current coastline data for both, so the old renderings don't show those regions as coastline either.

My most recent project has been the railroads in Maryland. Like the other things, it started with a local annoyance that grew into a statewide project. As I was tracing the rails for the above-ground portion of the Baltimore Metro, I decided to take care of the railroad that they run next to. Some portions of that railroad were tagged as name="CSX Transportation", while others were tagged name="Western Maryland RR". It seemed to me that those were the names of the companies that owned the railroad, not the names of the railroads themselves. I started researching things and found that the entire section of track was owned by CSX Transportation (which was formed by a merger that included a company into which the Western Maryland Railroad had previously been merged) and that they called it the Hanover Subdivision. I decided to apply the same level of research to all the rest of the railroads in Maryland.

I'm not quite done with that. I've aligned and tagged everything in Maryland with two exceptions. First, the southern portion of Harford County is missing its USGS imagery, and the Bing imagery isn't quite good enough to follow railroads unless you're lucky, so I haven't cleaned up the rails in Aberdeen Proving Grounds or the apparently abandoned cluster of rails near Edgewood. The other exception is the Patapsco and Back Rivers Railroad, which operates in the old Bethlehem Steel plant at Sparrows Point. It's actually kind of fascinating that they have their own, separate railroad company, but a) there's a lot of rail there, so cleaning up TIGER is a lot of detailed work, and b) I started on it once, got about a third of the way done, and JOSM both crashed and corrupted its autosave file. I'll get back to it once the pain of losing all that work has sufficiently faded.

Otherwise, I've cleaned up all the active railroads in Maryland and either deleted or retagged TIGER ways for now-disused railroads.

(I'll note that my criteria for disused versus abandoned versus deleting the ways is this: if the tracks are still mostly there, tag railway=disused; if the tracks are mostly gone (with maybe occasional short segments remaining) but the path of the railroad is still discernible (via topography or property lines, usually), tag railway=abandoned; otherwise, if there's no evidence that there was ever a railroad there (the land has been level and a housing subdivision built, for instance), just delete the way.)

Here's the before and after for Maryland railways, color-coded according to carrier (more or less; click through to the second one for the key):

Maryland Railways - 2010-03-17 Maryland Railways - 2011-03-16

I like the fact that the topography difference between central and western Maryland and the Eastern Shore is very obvious in the degree of straightness in the railroads' paths.

Selective Tile Expiration

Posted by asciipip on 28 November 2010 in English.

In my experimentation with tile rendering, I often want to rerender only a subset of the tiles that I've previously rendered, usually because I've changed parameters just for secondary roads or something similar.

To facilitate that, I wrote a script: expire-query. It takes an SQL query (or queries) as a parameter, runs the query, and lists the tiles covered by the results of the query in the same format that osm2pgsql does (so any tools that operate on those files will work with this script, too).

To run it, you'll need Ruby, Ruby DBI, and the PostgreSQL Ruby DBI driver (which is included in the Ruby DBI distribution).

The --help option should cover any usage questions.
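The script itself is Ruby, but the heart of any such tool is the standard slippy-map tile formula; here it is in Python for illustration:

import math

def tile_for(lon, lat, zoom):
    # Standard OSM slippy-map tile indices for a WGS84 point.
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_r = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_r) + 1.0 / math.cos(lat_r))
             / math.pi) / 2.0 * n)
    return x, y

# Tiles covered by a geometry's bounding box at zoom 15 run from
# tile_for(west, north, 15) to tile_for(east, south, 15), inclusive.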

Rendering Route Shields from Route Relations in Mapnik

Posted by asciipip on 13 October 2010 in English. Last updated on 14 October 2010.

[For some reason I can't get the command and config excerpts below to look like I want, which is basically what I'd get if I put them in a <pre> block. I've done the best I can.]

I've been getting back to map rendering, and one of the things that bothers me about pretty much everyone's rendering is that they either don't render US route shields or they go through hackery (in my opinion) to turn road ref tags into a shield plus a number.

I've wanted for a while to use the data in route relations for shield rendering. At least in the US, route relations use separate tags for the network and the route number, which makes picking the right shield easier. They also handle the case of a single road belonging to multiple routes better than the road's ref tag does.

The drawbacks are in the way that osm2pgsql puts relations into the database. When a way is a road and a member of a route relation, osm2pgsql creates one row with the road's tags and a separate row with the same path as the road but the tags from the relation. As far as I can tell, there's no simple link between them. This means that you can't use the road's highway key to determine visibility of the shield, which means, among other things, that you have the possibility of a shield being rendered at a lower zoom level than its road.

If the data was imported in slim mode, the planet_osm_rels table does contain a link between a road and its relations. I tried using it directly, but that proved to be too slow for everyday use. I ended up creating another table and running queries off of that. The rest of this entry is about what I did.

My specific requirements were: if a road is a member of a correctly-tagged route relation, use the appropriate shield with the relation's ref tag. If a road is a member of an incorrectly-tagged route relation, show the relation's ref tag on a generic shield. If the road is not a member of any route relations but has its own ref tag, show that on a generic shield. Highway names are not relevant to this rendering; they can be rendered from the planet_osm_line table directly. A nice additional feature would be if contiguous ways with the same route membership were aggregated before Mapnik saw them so the shields were evenly spaced along the way. (The normal rendering tends to bunch up shields (and names) along short ways like bridges and tunnels.)

With that in mind, I made a table with the absolute minimum amount of information necessary:

CREATE TABLE planet_osm_route (
  highway TEXT,
  road_ref TEXT,
  network TEXT,
  route_ref TEXT,
  way GEOMETRY);

Next was populating it with data from the osm2pgsql import. The following statement takes all the roads that are in route relations, cross joins each road with each relation it's in, groups them together by route and original road ref, then groups them again just by route. The reason for the double grouping is that I throw all the original ref tags into the road_ref field and I'd like to minimize the duplication there. I use a chain of PostGIS functions to group all the road LINESTRINGs into a MULTILINESTRING, merge contiguous lines together within the MULTILINESTRING, and then split the resulting geometry up into separate LINESTRINGs again. This is somewhat resource intensive, but I really like the results.

INSERT INTO planet_osm_route
  SELECT highway, string_agg(road_ref, ';') AS road_ref, network, route_ref,
         (ST_Dump(ST_LineMerge(ST_CollectionExtract(ST_Collect(way),2)))).geom AS way
    FROM (SELECT road.highway, road.ref AS road_ref,
             route.network, route.ref AS route_ref, route.osm_id,
             ST_Collect(road.way) AS way
            FROM planet_osm_line AS road
             JOIN (SELECT id, unnest(parts) AS part
             FROM planet_osm_rels) AS rels
             ON road.osm_id = rels.part
             JOIN (SELECT DISTINCT ref, osm_id, network FROM planet_osm_line) AS route
             ON -rels.id = route.osm_id
            WHERE road.highway IS NOT NULL
             AND route.ref IS NOT NULL AND route.network IS NOT NULL
            GROUP BY route.osm_id, road.highway, road.ref, route.network,
             route.ref) AS route_by_ref
    GROUP BY osm_id, highway, network, route_ref
    ORDER BY way;

That took about four hours for a US extract on my system, despite a bunch of other processes already running and maxing out the disks.

The next step is to add in the roads that aren't in route relations. I did the same dance with the ways to get them as combined as possible.

INSERT INTO planet_osm_route
  SELECT road.highway, road.ref AS road_ref,
         route.network, route.ref AS route_ref,
         (ST_Dump(ST_LineMerge(ST_Collect(road.way)))).geom AS way
    FROM planet_osm_line AS road
      LEFT JOIN (SELECT id, unnest(parts) AS part
             FROM planet_osm_rels) AS rels
        ON road.osm_id = rels.part
      LEFT JOIN planet_osm_line route
        ON -rels.id = route.osm_id
    WHERE road.highway IS NOT NULL
          AND road.ref IS NOT NULL
          AND (route.ref IS NULL OR route.network IS NULL)
    GROUP BY road.highway, road.ref, route.network, route.ref
    ORDER BY way;

That took about half an hour on my system.

Finally, I added the necessary index on the table. (I did this last at the recommendation of pretty much every PostgreSQL document that mentions bulk imports.)

CREATE INDEX planet_osm_route_index ON planet_osm_route USING GIST (way GIST_GEOMETRY_OPS);

That took about an hour.

Those are really the hard bits. Setting up the rendering was easier.

For the shields, I grabbed blank Interstate, US, and generic state shields from Wikipedia (which hosts them in the public domain) and rendered the SVGs to PNGs at a height of 24 pixels. At that height, the numbers for Interstates should be 10 points tall and the numbers for US highways should be 12 points tall.

The Mapnik stylesheets should be pretty obvious from here, but I'll give an overview of them anyway. For clarity's sake, I'll cut out the rules for selective rendering based on highway type, and some of the more repetitive bits.

<Style name="interstateshields">
    <Rule>
        <Filter>[route_len] = 1 or [route_len] = 2</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/I-blank.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] = 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/I-blank_wide.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] &gt; 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-8.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
</Style>

<Style name="usshields">
    <Rule>
        <Filter>[route_len] = 1 or [route_len] = 2</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="12"
             fill="black" placement="line"
             file="symbols/US_blank.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] = 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="12"
             fill="black" placement="line"
             file="symbols/US_blank_wide.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] &gt; 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-8.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
</Style>

<Style name="stateshields">
    <Rule>
        <Filter>[route_len] = 1 or [route_len] = 2</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="12"
             fill="black" placement="line"
             file="symbols/State_blank.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] = 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="black" placement="line"
             file="symbols/State_blank.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[route_len] &gt; 3</Filter>
        <ShieldSymbolizer name="route_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-8.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
</Style>

<Style name="nonetworkroads">
    <Rule>
        <Filter>[road_len] = 1 or [road_len] = 2</Filter>
        <ShieldSymbolizer name="road_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-2.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[road_len] = 3</Filter>
        <ShieldSymbolizer name="road_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-3.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>
    <Rule>
        <Filter>[road_len] = 4</Filter>
        <ShieldSymbolizer name="road_ref" face_name="DejaVu Sans Bold" size="10"
             fill="white" placement="line"
             file="symbols/shield-black-4.png"
             type="png" min_distance="45" spacing="750" />
    </Rule>

    <!-- And so on... -->
</Style>

<Layer name="interstateshields" status="on">
    <StyleName>interstateshields</StyleName>
    <Datasource>
        &dbsettings;
        &extents;
        <Parameter name="table">
            (select highway, route_ref, length(route_ref) AS route_len, way
            from planet_osm_route
            where highway IS NOT NULL and network = 'US:I'
            ) as interstateshields
        </Parameter>
    </Datasource>
</Layer>

<Layer name="usshields" status="on">
    <StyleName>usshields</StyleName>
    <Datasource>
        &dbsettings;
        &extents;
        <Parameter name="table">
            (select highway, route_ref, length(route_ref) AS route_len, way
            from planet_osm_route
            where highway IS NOT NULL and network IN ('US:US', 'US')
            ) as usshields
        </Parameter>
    </Datasource>
</Layer>

<Layer name="stateshields" status="on">
    <StyleName>stateshields</StyleName>
    <Datasource>
        &dbsettings;
        &extents;
        <!-- Add other states as needed. -->
        <Parameter name="table">
            (select highway, route_ref, length(route_ref) AS route_len, way
            from planet_osm_route
            where highway IS NOT NULL and network IN ('US:MD', 'US:PA', 'US:DE', 'US:VA', 'US:WV')
            ) as stateshields
        </Parameter>
    </Datasource>
</Layer>

<Layer name="nonetworkroads" status="on">
    <StyleName>nonetworkroads</StyleName>
    <Datasource>
        &dbsettings;
        &extents;
        <Parameter name="table">
            (select highway, road_ref, length(road_ref) AS road_len, way
            from planet_osm_route
            where highway IS NOT NULL and network IS NULL
            ) as nonetworkroads
        </Parameter>
    </Datasource>
</Layer>

The rendered result has been everything I've wanted in shield rendering, and all I need to do is spend about six hours dumping and reloading the table every time I do a batch of updates to my database.

I hope other people will find some utility in the work I've done.

Edit: By request, a screenshot that should illustrate several of the different classes of routes I'm mapping.

Partial TopOSM-based Rendering

This rendering has I-70 (an Interstate with a route relation), US 40 (a US highway with a route relation), US 29 (a US highway with a route relation whose ref tag is incorrect), MD 108 (a state road with a route relation), and MD 103 and MD 104 (state roads without route relations but with ref tags on the roads). MD 100 (a state road without a route relation) is also in the rendering, but Mapnik puts its closest shield just off the image to the lower right.

Project Completion!

Posted by asciipip on 5 October 2010 in English.

Several months back, I was pleased to find the USGS's high resolution orthophotography; it's public domain, so it can be used in OSM, and almost* the entire state of Maryland is available in 6" (15 cm) resolution. As I was touching up some power lines near where I work, I realized that the resolution was high enough that I could follow individual lines, which let me sort out some things that I couldn't see well enough in the Yahoo! imagery and couldn't get close enough to see in person.

I decided to set a goal of mapping all of the power lines in the state (more precisely, to the edge of the state's aerial photography, which extends a small amount into neighboring states), including cables, wires, and voltages. (The voltages came from some state documents available online.)

As of this evening, I'm done. Every major power line (>100kV) in the 2007 aerial photography is mapped, and every power line from TIGER is either verified to exist, retagged as a minor_line, or deleted (if nothing actually exists where TIGER had the lines).

I think my next project will be the railroads in Maryland. TIGER's not bad with them, but I've been adding all the tracks when there's more than one and giving them appropriate names. (TIGER mostly uses the railroad operators, and often past operators, in the name fields; I've been putting those into the operator tag and putting the lines' actual names into the name tags.)


*The lower half of Harford County is missing. I can only assume this is an oversight, since the entire rest of the state, including the sparsely-populated Garrett and Allegany Counties, is present, plus the layer for the northern half of the county is named "MD_HarfordCounty_0.5ft_Color_Feb_2008_01", but there's no "..._02" layer.

SRTM and NED elevation data

Posted by asciipip on 15 June 2010 in English.

In the comments on my previous diary entry, people pointed out that the CGIAR data has a no-commercial-use restriction on it, which would be incompatible with OSM's database license if I wanted to derive anything from it. I was only planning on using it for display purposes, but I decided to look into other sources of elevation data anyway. I found the National Elevation Dataset from the US Geological Survey.

What's nice is that there are actually three NED datasets: NED1, NED3, and NED9. NED1 has a resolution of one arc-second, the same as SRTM1. NED3, however, is 1/3 of an arc-second, and NED9 is 1/9 of an arc-second, which ends up being about 3 meters. NED9 isn't available everywhere, so I decided to use NED3 and NED9 together.

A very illustrative test rendering is the Marriottsville Quarry in Howard County. For reference, here's what it looks like with the SRTM data:

Marriottsville Quarry with SRTM1 hillshading

It's not bad, but also not very precise. In particular, the dam in the upper left-hand corner extends well beyond its actual location.

Let's compare it to the NED3/NED9 rendering. (One of the reasons this makes a good test region is that it's right at the edge of one area of NED9 coverage.)

Marriottsville Quarry with NED9 hillshading

It's obviously more precise, but it seems almost too precise. I really don't like the look of the sharp edges, and those sharp edges are everywhere. (There's also some unusual stuff that Liberty Lake is hiding; bodies of water are not always flat in NED9 and introduce some really weird hillshading artifacts.) Furthermore, there are obvious edges where the NED9 data ends.

I tried rendering just the NED3 data (which is still about nine times more precise than SRTM1), but it seemed less accurate: it didn't show the quarry pit at all, though it showed a little of the hill next to it. (I've been by there, and the pit really does exist.) Given that, I really wanted to use the NED9 data where I could.

I then tried using an interesting denoising algorithm I read about online, but it used up all 4 GB of my RAM on a 0.5 square degree chunk of NED3+NED9 data, so I gave up on it. I tried using ImageMagick to apply a blur filter, but the NED data is made up of 32-bit floating-point numbers, and ImageMagick only seems to like TIFFs if each pixel is eight bits wide (or is a 24-bit triplet of 8-bit integers). I ended up just using gdalwarp to reduce the resolution on the NED9 data down to 1/3 of an arc-second. That effectively blurred the elevations, and I'm reasonably pleased with the result, even if the edges of the NED9 data are still visible:
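For reference, here's the same downsampling through GDAL's Python bindings; I used the gdalwarp command line, so treat this newer gdal.Warp equivalent as an approximation:

from osgeo import gdal

res = 1.0 / 10800.0  # 1/3 arc-second in degrees
gdal.Warp('ned9_third_arcsec.tif', 'ned9.tif',
          xRes=res, yRes=res,
          resampleAlg='bilinear')  # resampling downward smooths the elevations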

Marriottsville Quarry with smoothed NED9 hillshading

Note that the contour lines in all renderings are SRTM1-based. It's a lot harder for me to swap out contour lines than color relief and hillshading data.

Further Adventures in Hillshading

Posted by asciipip on 18 May 2010 in English.

At the suggestion of a comment on my last diary entry, I decided to experiment with putting the hillshading information in the alpha channel of an overlay for the rest of the map, more or less as described in http://wiki.openstreetmap.org/wiki/Hillshading_using_the_Alpha_Channel_of_an_Image .
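The gist of the technique, sketched with PIL (assuming a greyscale hillshade where white means flat):

from PIL import Image, ImageOps

shade = Image.open('hillshade.png').convert('L')
overlay = Image.new('RGBA', shade.size, (0, 0, 0, 0))
overlay.putalpha(ImageOps.invert(shade))  # dark slopes become opaque black
overlay.save('hillshade_overlay.png')     # composite this over the base map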

I also tweaked some of the other settings. I toned down the relief colors. There's still a visual distinction between various elevations, but the background colors don't interfere as much with the foreground colors. I also reorganized some of the rendering layers. I'm rendering things in this order: color relief, landarea/landuse, contour lines, water bodies. I think this ordering gives a nice indication of both elevation and landuse without either interfering with the other.

Finally, I found the scaling=bilinear8 option for Mapnik's RasterSymbolizer, which gives much smoother hillshading; the shading now looks good down to zoom 16 or so, and passable all the way down to zoom 18.

Here's one of the rendering areas from the last diary entry under the new rendering rules, both with and without the hillshading:

Patapsco River, New Rendering

Patapsco River, No Hillshading

I also like this rendering of Hoyle Crest, the highest point in Maryland:

Hoyle Crest

I still have some problems I need to work out. In some areas, the hillshading pretty much obscures the map below it, as in this area along the Virginia/North Carolina border:

Shaded Mountains

I'm considering making an additional overlay that just has place labels (and possibly road shields) and putting that overlay on top of the hillshading.

I also need to figure out a better way of getting coastline data into my rendering. The standard approach is to make the map background the color of water and then draw the polygons from the coastline shapefiles as the color of land. I have to use the color relief image as my background, so the standard approach won't work for me. For the moment, I've hacked around the problem by coloring 0-meter elevations as water, but the NASA data's coastline clipping doesn't always match OSM's. There's also the problem that OSM uses coastlines for things like the Great Lakes, which aren't at sea level at all, so my current rendering just shows them as large, flat areas of land.

Finally, there's the matter of holes in the NASA elevation data. There are a lot of small to medium gaps, which with my current process, end up at sea level (and thus get colored as water). There are also some really large holes, like this one near the North Carolina/South Carolina border:

North Carolina/South Carolina Hole

When I looked around for information about filling voids in elevation data, I saw a lot of mentions of the CGIAR-CSI dataset, which used a variety of techniques for filling in voids in the SRTM dataset. I'm downloading CGIAR-CSI data for the US now, and I'm going to experiment with using it instead of the raw SRTM data. (More properly, I'm going to try using it in addition to the SRTM data; the CGIAR-CSI data has a resolution of three arc-seconds, while the SRTM1 data has a resolution of 1 arc-second for the US, so I'd like to use the higher-resolution data where possible and only fall back to the lower resolution data as needed.)

Hillshading Plus Relief Coloring

Posted by asciipip on 10 May 2010 in English.

This weekend, I took a break from Cascadenik rules to play with rendering elevation data, mostly working off http://wiki.openstreetmap.org/wiki/Contours and http://wiki.openstreetmap.org/wiki/HikingBikingMaps . I'm keeping notes on my process, so when I've got something good I'll document everything in more detail, but I'm reasonably pleased with my results so far.

I'd previously set up contour lines and figured out how to have either a greyscale hillshade or a color relief underlay for my rendering, so I decided to work on combining the hillshading and color relief images. My rendering target is the US, so I started with the SRTM1 data and interpolated it from 30x30 meter pixels down to 15x15 meter pixels; then I used the PerryGeo utilities to generate separate hillshaded and colored relief images. Following that, I used the GIMP to combine them.
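If you'd rather script the combination than use the GIMP, a multiply blend gets the same general effect; a sketch with PIL (I'm not claiming this matches my GIMP steps exactly):

from PIL import Image, ImageChops

relief = Image.open('relief.tif').convert('RGB')
shade = Image.open('hillshade.tif').convert('RGB')  # grey: white = flat

# Multiply blend: shadows darken the relief colors, flat areas pass through.
combined = ImageChops.multiply(relief, shade)
combined.save('combined.tif')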

Unfortunately (for the simplicity of my rendering process) I'd like to render areas that span more than one SRTM tile. I say "unfortunately" because I'm still having problems with that aspect of things: I can't use gdal_merge.py to combine the images before processing them with the GIMP, because it makes BigTIFFs, which the GIMP won't read (my libtiff doesn't support them). That means I have to generate the shaded and colored images separately for each SRTM tile, combine them, and then use gdal_merge.py. I got most of the process down (I learned a little GIMP Script-FU to automate the merging process), but in my final merged image, I have thick black lines at the borders of the original SRTM tiles. It looks like the hillshading program is putting a black border around the edge of its generated image. Weirdly, the lines show up when I use the generate_tiles.py script, but not the generate_image.py script (the first image below spans the boundary between the N39W77 and N39W78 tiles). I need more research here.

I also need to work on my relief color schemes. The one I have now looks quite pretty, but is almost the same shade as trunk roads in places. I'd like to get something more muted, so it works well as a background, but still vivid enough that the elevation changes are visible.

Anyway, here are a couple of renderings I've done. I think they look pretty good.

Rendering Test with Hillshading

Rendering Closeup of Patapsco River

The second one is at zoom level 15, which is the last level where the hillshading looks passable; at higher levels, I fall back to just color relief underlays.