Minutiae about Tile Rendering Queues

Posted by asciiphil on 20 April 2011 in English (English)

I do my own map rendering, primarily to preload my phone's mapping program with offline tiles, but also to have a rendering that highlights things of interest to me for my region. My rendering is based on TopOSM, but with a lot of personal hacks and changes (the more general of which I've contributed back).

Until recently, my update process was very manual. I ran scripts to render all tiles for my rendering areas (which include overlapping bboxes of different shapes), then another script to combine the tiles for the three TopOSM layers (color-relief, contours, features) into single tiles for my phone, then a third script that used the expire files from my automated minutely Mapnik setup to delete any tiles with expired data, then the first script again to rerender the deleted tiles, then the second script again to recombine the changed tiles, and so on. I wanted something I could set up and leave running. As far as I could tell, the other render daemons in use (Tirex and mod_tile's renderd) just have a one-to-one correspondence between Mapnik stylesheets and rendered tilesets. TopOSM is a little more complex (but oh, so much prettier), so I assembled my own.

The core of my change was externalizing the render queue. Rather than having a Python script create a Queue object, fill it with tiles to be rendered, and spin off render threads to consume it, I set up RabbitMQ to manage the work queues. That lets me feed the queues from one process and consume them from completely separate processes. The part I'm proudest of is the queue setup, so that's what I'm going to talk about.

When osm2pgsql runs, I have it generate tile expiration files. Those files are simply lines of the form "z/x/y", where "z" is a zoom level specified on the command line and "x" and "y" identify tiles with expired data (more or less; the process of associating data with tiles is a little fuzzy, and osm2pgsql tries to err on the side of expiring more than it needs to). Those lines get fed directly into a queue named "expire".
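A minimal Python sketch of turning an expire file into (z, x, y) tuples before they are published to the "expire" queue. The function name is hypothetical, and the de-duplication and range check are illustrative additions of my own; the post only says the lines are fed directly into the queue.

```python
from typing import Iterable, Iterator, Tuple

Tile = Tuple[int, int, int]  # (zoom, x, y)


def parse_expire_lines(lines: Iterable[str]) -> Iterator[Tile]:
    """Yield unique (z, x, y) tuples from 'z/x/y' lines, skipping blank lines.

    De-duplication and the range check are illustrative extras, not
    something osm2pgsql or the original setup is known to require.
    """
    seen = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        z, x, y = (int(part) for part in line.split("/"))
        n = 1 << z  # number of tiles along each axis at zoom z
        if not (0 <= x < n and 0 <= y < n):
            raise ValueError(f"tile out of range: {line!r}")
        tile = (z, x, y)
        if tile not in seen:
            seen.add(tile)
            yield tile
```

For example, the lines "13/2442/3024", "13/2442/3024" and "13/2443/3024" produce just two tuples.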

I have a single process consuming that queue. It takes each line, checks to see if I already have tiles rendered at any zoom level that cover that tile, then checks to see if the metatiles that contain those tiles have been submitted for rendering yet. The latter is done via a lookup in a PostgreSQL database. If there are tiles that need to be rendered, it marks the metatiles as submitted in the database and sends a message of the form "{layer1,layer2,layer3}/z/x/y" to one of the render queues. There is a separate render queue for each zoom level. For example, it recently fired off the message "{color-relief,features}/13/2442/3024" to queue render_13.
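The expiry consumer's fan-out can be sketched in Python roughly as follows. This is a hypothetical illustration, not the actual code: an in-memory set stands in for the PostgreSQL submission table, has_tiles() is a placeholder for however already-rendered tiles are detected, the METATILE_SIZE values (other than 12 at zoom 16) are made up, and addressing a metatile by its top-left tile in the message is an assumption.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

LAYERS = ("color-relief", "contours", "features")

# Hypothetical metatile edge lengths (tiles per side) per zoom; the post
# only says that a zoom 16 metatile is 12x12.
METATILE_SIZE: Dict[int, int] = {z: 8 for z in range(0, 16)}
METATILE_SIZE[16] = 12


def metatiles_covering(z: int, x: int, y: int, target_zoom: int) -> Iterator[Tuple[int, int]]:
    """Yield (mx, my) metatile indices at target_zoom that overlap tile z/x/y."""
    n = METATILE_SIZE[target_zoom]
    if target_zoom >= z:
        # Tile z/x/y splits into a scale x scale block at the deeper zoom.
        scale = 1 << (target_zoom - z)
        x0, x1 = x * scale, (x + 1) * scale - 1
        y0, y1 = y * scale, (y + 1) * scale - 1
    else:
        # At a shallower zoom, tile z/x/y lies inside a single parent tile.
        shift = z - target_zoom
        x0 = x1 = x >> shift
        y0 = y1 = y >> shift
    for mx in range(x0 // n, x1 // n + 1):
        for my in range(y0 // n, y1 // n + 1):
            yield mx, my


def handle_expired_tile(
    z: int, x: int, y: int,
    zooms: range,
    has_tiles: Callable[[str, int, int, int], bool],  # hypothetical check
    submitted: Set[Tuple[str, int, int, int]],        # stand-in for the DB table
) -> List[Tuple[str, str]]:
    """Return (queue_name, message) pairs to publish for one expired tile."""
    out = []
    for tz in zooms:
        n = METATILE_SIZE[tz]
        for mx, my in metatiles_covering(z, x, y, tz):
            # Only layers that already have tiles here and are not yet queued.
            layers = [
                layer for layer in LAYERS
                if has_tiles(layer, tz, mx, my)
                and (layer, tz, mx, my) not in submitted
            ]
            if not layers:
                continue
            submitted.update((layer, tz, mx, my) for layer in layers)
            # Assumption: the metatile is addressed by its top-left tile.
            msg = "{%s}/%d/%d/%d" % (",".join(layers), tz, mx * n, my * n)
            out.append((f"render_{tz}", msg))
    return out
```

Calling it a second time for the same tile returns nothing, because the metatile is already marked as submitted.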

Consuming the render queues is a little more involved. I want to favor longer queues, since they have more work waiting, but I also want to favor higher zoom levels, since the lower the zoom level, the more likely a metatile is to be put back in the queue after it has been rendered. The process I came up with is this: for each queue, multiply the current queue length by the total number of tiles at that queue's zoom level (4^z), divide by the number of tiles in a metatile at that zoom level, and then normalize each result as a percentage of the total across all queues. Next, take the number of metatiles I've rendered at each zoom level and normalize those to percentages, too. Finally, pick the zoom level with the greatest difference between its queue percentage and its rendered percentage, and get the next work unit from that queue. The rendering process is multithreaded; each thread keeps its own count of rendered metatiles.
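Written out, the selection rule above looks something like this Python sketch. The function and parameter names are hypothetical. One deliberate addition: empty queues are never chosen, so that a tie in the scores cannot resolve to a queue with nothing in it.

```python
from typing import Dict


def pick_zoom(
    queue_lengths: Dict[int, int],
    rendered_counts: Dict[int, int],
    metatile_tiles: Dict[int, int],
) -> int:
    """Pick the zoom whose share of weighted queued work most exceeds its
    share of rendered metatiles.

    queue_lengths:   messages waiting in render_<z>, keyed by zoom
    rendered_counts: metatiles this thread has rendered, keyed by zoom
    metatile_tiles:  tiles per metatile at each zoom (e.g. 144 at z16)
    """
    # Weight = queue length * 4^z / tiles per metatile.
    weights = {
        z: q * 4 ** z / metatile_tiles[z]
        for z, q in queue_lengths.items()
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise LookupError("all render queues are empty")
    total_rendered = sum(rendered_counts.get(z, 0) for z in weights)

    def score(z: int) -> float:
        queue_pct = 100.0 * weights[z] / total_weight
        rendered_pct = (100.0 * rendered_counts.get(z, 0) / total_rendered
                        if total_rendered else 0.0)
        return queue_pct - rendered_pct

    # Addition of my own: only non-empty queues are candidates.
    return max((z for z in weights if weights[z] > 0), key=score)
```

With ten messages waiting at both zoom 12 (64-tile metatiles) and zoom 16 (144-tile metatiles) and nothing rendered yet, zoom 16 wins because its weight is more than 100 times larger; once zoom 16 accounts for all of the rendered metatiles, zoom 12 gets its turn.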

After rendering is finished, the records of submission in the database for those metatiles are removed and a message of the form "z/x/y" is sent to the "combine" queue (after checking to make sure it isn't already pending). A third process consumes the combine queue and combines the different layers into single tiles.
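The "isn't already pending" check before publishing to the combine queue amounts to de-duplication. The post doesn't say where that pending state lives; this hypothetical Python sketch keeps it in an in-process set guarded by a lock, purely to show the logic.

```python
import threading
from collections import deque
from typing import Deque, Optional, Set


class CombineQueue:
    """Hypothetical in-process stand-in for the "combine" queue.

    A "z/x/y" message is only enqueued if the same message is not already
    waiting; it becomes eligible again once a consumer has taken it.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Deque[str] = deque()
        self._pending: Set[str] = set()

    def put(self, z: int, x: int, y: int) -> bool:
        """Enqueue z/x/y; return False if it was already pending."""
        msg = f"{z}/{x}/{y}"
        with self._lock:
            if msg in self._pending:
                return False
            self._pending.add(msg)
            self._items.append(msg)
            return True

    def get(self) -> Optional[str]:
        """Take the oldest message, or None if the queue is empty."""
        with self._lock:
            if not self._items:
                return None
            msg = self._items.popleft()
            self._pending.discard(msg)
            return msg
```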

I have separate functions that I use for adding new tiles to be rendered. They are given a bbox and a zoom level range, check for tiles that are missing from that volume, and submit the appropriate metatiles for rendering. Once the tiles have been rendered for the first time, the expiry process will notice if their data is updated and will then submit them for rerendering. I used those functions to bootstrap my new rendering process, since I'd previously been keeping track of which tiles needed rerendering by simply deleting them.
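Finding missing tiles inside a bbox starts with the standard slippy-map conversion from longitude/latitude to tile numbers. The Python sketch below assumes a hypothetical tile_dir/layer/z/x/y.png layout and a single metatile size for every zoom; neither detail comes from the post.

```python
import math
import os
from typing import Iterator, Set, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Standard Web Mercator (slippy map) tile containing lon/lat."""
    n = 1 << zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the east/south edge into the last tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def missing_metatiles(
    tile_dir: str, layer: str,
    west: float, south: float, east: float, north: float,
    min_zoom: int, max_zoom: int, meta: int,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (z, mx, my) for metatiles with at least one missing PNG.

    Assumes a hypothetical on-disk layout tile_dir/layer/z/x/y.png and a
    fixed metatile edge length `meta` at every zoom.
    """
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # north-west corner
        x1, y1 = lonlat_to_tile(east, south, z)  # south-east corner
        seen: Set[Tuple[int, int]] = set()
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                key = (x // meta, y // meta)
                if key in seen:
                    continue  # metatile already reported as needing work
                path = os.path.join(tile_dir, layer, str(z), str(x), f"{y}.png")
                if not os.path.exists(path):
                    seen.add(key)
                    yield (z, *key)
```

Each metatile is reported at most once per zoom, however many of its tiles are missing, which matches submitting work in metatile units.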

All of the queues are specified as durable, and all of the messages are sent as persistent, so even if the RabbitMQ node shuts down or my computer reboots I won't lose track of the work that needs to be done.

I've been running this setup for several days now and it seems able to keep up with the data expiration rate. One optimization I'm considering would be to track which individual tiles have been expired and only extract those from the metatiles. That would probably speed up some of the higher zoom levels; a zoom 16 metatile is a 12x12 block of regular tiles, and it takes a noticeable amount of time to extract and write all 144 constituent tiles. Another idea I had was to keep track of how long it takes to render metatiles at each zoom level and factor that into the decision about what zoom level to do next, preferring zoom levels that go faster. I plan to leave this setup running for a week or so to get a baseline before I try any alterations.

Comment from asciiphil on 21 April 2011 at 14:41

"Leave this setup running for a week" was a premature statement, apparently. I got my first really big tile expiration and it wiped out all the rendering I'd done for the past few days.

The rendering process spends most of its time extracting tiles from the metatiles, so I've implemented my old approach of only extracting the changed tiles within my new AMQP setup. The render queues now just get messages of the form 'z/x/y', while the database expiry table has the following fields: zoom, metax, metay, layer, x, y, expired. The expiration process only sends a render message if there are no pending tiles for a metatile, but it will expire individual tiles as it sees them. When a render process gets a render message, it pulls all that metatile's expired tiles from the database, renders the metatile, and then only extracts the changed tiles.
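Here is a sketch of the expiry table and the two operations described above, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (in PostgreSQL, INSERT OR IGNORE would become INSERT ... ON CONFLICT DO NOTHING). The primary key, the timestamp type for the expired column, and the function names are my assumptions; the post only lists the field names.

```python
import sqlite3
from typing import List, Tuple

# Field names from the post; types, defaults and the key are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS expiry (
    zoom    INTEGER NOT NULL,
    metax   INTEGER NOT NULL,
    metay   INTEGER NOT NULL,
    layer   TEXT    NOT NULL,
    x       INTEGER NOT NULL,
    y       INTEGER NOT NULL,
    expired TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (zoom, layer, x, y)
)
"""


def expire_tile(db: sqlite3.Connection, zoom: int, layer: str,
                x: int, y: int, meta: int) -> bool:
    """Record one expired tile; return True if a render message is needed,
    i.e. the metatile had no pending tiles before this one."""
    mx, my = x // meta, y // meta
    with db:  # commit both statements together
        pending = db.execute(
            "SELECT 1 FROM expiry WHERE zoom=? AND metax=? AND metay=? LIMIT 1",
            (zoom, mx, my),
        ).fetchone()
        db.execute(
            "INSERT OR IGNORE INTO expiry (zoom, metax, metay, layer, x, y) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (zoom, mx, my, layer, x, y),
        )
    return pending is None


def take_expired(db: sqlite3.Connection, zoom: int,
                 mx: int, my: int) -> List[Tuple[str, int, int]]:
    """Claim all expired (layer, x, y) tiles of one metatile so that only
    those are extracted after it is rendered."""
    with db:
        rows = db.execute(
            "SELECT layer, x, y FROM expiry WHERE zoom=? AND metax=? AND metay=? "
            "ORDER BY layer, x, y",
            (zoom, mx, my),
        ).fetchall()
        db.execute(
            "DELETE FROM expiry WHERE zoom=? AND metax=? AND metay=?",
            (zoom, mx, my),
        )
    return rows
```

Because the rows are deleted when a render process claims them, a tile that expires again while its metatile is rendering starts a fresh pending set and triggers a new render message.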

This should put me back where I was with my previous setup, when I was able to render all the changed tiles from zoom 11 and up slightly faster than they expired. It'll take a while to get through the backlog of metatiles I already had queued, but the problem shouldn't be getting worse while I'm doing it.
