AndrewBuck's Diary

Recent diary entries

The OSM dataset is a rich database of information about the world. Not only can the data be used to render beautiful and useful maps, it can also be used to do some nifty calculations in conjunction with other open datasets that exist. One such potential use is in cleaning up and improving elevation datasets such as the public domain SRTM or ASTER datasets. Both of these datasets offer elevation coverage over much of the world but they are limited in resolution (SRTM is 90 m per pixel, ASTER is 30 m per pixel).

Although these elevation datasets can be useful for very basic things like drawing 25 m contour lines on topographic maps, if you try to use these elevation models for more sophisticated things like drainage calculations, you will get very weird results due to a number of issues. First, almost no rivers are 30 m wide or more, so the river channel can’t be modeled at all. Second, noise in the elevation datasets often results in river valleys with “bumps” in the river that make it back up and flood large areas that in the real world would not flood, because there is a stream draining the water out of the valley. Lastly, in many areas streams, drainage ditches, and similar features run in close proximity to buildings. In these areas, trying to fix the elevation to correctly predict drainage ends up changing the elevation of buildings, because the pixels are so large that the whole neighborhood must be raised or lowered to make the drainage calculations work. The example below shows a couple of these issues; other areas suffer from them in much more significant ways.

The original ASTER data at 30 m per pixel.

To fix the problems outlined above, and to allow for other useful operations, it is worthwhile to upsample the horizontal resolution of these elevation datasets. GRASS GIS has tools like r.resamp and r.proj that can do things like cubic interpolation, producing nice smooth datasets at higher resolutions (in my example I am using 3 m per pixel, a 10-fold step up from the ASTER data I started with).

The data resampled to 3 m per pixel using r.proj with cubic interpolation.

This resampled data looks much more visually pleasing, and things like contour lines computed from it would be smoother, but it still does not necessarily reflect the drainage network topology accurately enough for flood simulations. This is where the OSM database comes in.

Because we map streams in OSM, and because they are often traced from aerial imagery with a resolution on the order of 0.1 meters per pixel, even very small streams and canals can be recorded with their position and shape captured accurately. Furthermore, we can measure things like the width or depth of the waterway to further improve our data about the feature. This high precision stream path can then be “carved” into the higher resolution elevation model to allow the stream to cut through the small “bumps” introduced by the cubic resampling. In the example below a 5 pixel wide (i.e. 15 meters wide) channel is cut into the DEM everywhere there is a river or stream in OSM. The stream is first “leveled” by iterating over all the pixels in the stream and setting each one to the minimum of its own value and the 8 pixels surrounding it (the 3x3 neighborhood). During this leveling step, if the pixel is on the stream centerline it is carved down 5 meters below the minimum neighborhood value; if it is along the edge of the stream it is only carved 2 meters below the minimum. Finally, after the carving is done, the whole map is run over with a 5x5 neighborhood average to smooth the river channel into something roughly U shaped, approximating a channel carved out by water while still ensuring that the center of the channel is free from major bumps that would impede the flow of water in the flooding calculations. The result can be seen below. Notice that not only can we now see the irrigation lines in the fields in the center of the image, we can also begin to see the rough cut-in erosion pattern in the mountainous regions.
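The leveling, carving, and smoothing steps described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the GRASS workflow actually used: the function names, the 0/1/2 encoding of the stream mask, reading neighborhood minimums from the unmodified DEM, and clipping windows at the map edge are all assumptions made for the sketch.

```python
def neighborhood(grid, r, c, radius):
    """Values in the (2*radius+1) square window around (r, c), clipped at edges."""
    rows, cols = len(grid), len(grid[0])
    return [grid[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def carve_streams(dem, stream, center_depth=5.0, edge_depth=2.0):
    """Level and carve stream pixels into a DEM given as a list of rows.

    stream[r][c] is assumed to be 0 (no stream), 1 (channel edge)
    or 2 (stream centerline).
    """
    carved = [row[:] for row in dem]
    for r, row in enumerate(stream):
        for c, kind in enumerate(row):
            if kind == 0:
                continue
            # Leveling: minimum of the pixel and its 8 neighbors (3x3 window).
            lowest = min(neighborhood(dem, r, c, 1))
            # Carving: centerline goes deeper than the channel edge.
            depth = center_depth if kind == 2 else edge_depth
            carved[r][c] = lowest - depth
    return carved


def smooth(dem, radius=2):
    """Neighborhood average; radius=2 gives the 5x5 window from the text."""
    smoothed = []
    for r in range(len(dem)):
        smoothed_row = []
        for c in range(len(dem[0])):
            window = neighborhood(dem, r, c, radius)
            smoothed_row.append(sum(window) / len(window))
        smoothed.append(smoothed_row)
    return smoothed


# Toy example: flat terrain at 100 m with a 3 m "bump" sitting in the stream.
# The channel here is only 3 pixels wide to keep the example small.
dem = [[100.0] * 7 for _ in range(7)]
dem[3][3] = 103.0
stream = [[0, 0, 1, 2, 1, 0, 0] for _ in range(7)]
result = smooth(carve_streams(dem, stream))
```

In the toy example the bump no longer blocks the channel after carving, and the smoothing pass leaves the centerline as the lowest part of the cross-section.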

The final 3 m elevation model with the stream network from OSM carved into the surface.

The final result is quite nice, both visually and topographically. This is just a first step to demonstrate the basic idea. I plan to do much more in terms of varying the width and depth profiles of the waterways to more accurately model the drainage system. I also want to look at integrating other sources of elevation information from the OSM database, such as natural=water, which gives a level contour line in the elevation profile at high resolution; natural=cliff and barrier=retaining_wall, which indicate “jumps” in the elevation model; and things like road and railway embankments. The 3 m per pixel chosen is just small enough to nicely represent the width of a built up road or rail line, or to represent roads “sunk” down a few inches from the surrounding grade due to things like curbs. I hope to hear suggestions on other things from OSM I could use, or how the OSM data itself could be improved to better allow for these kinds of analyses.

EDIT: As requested in the comments below, here is the ‘difference’ calculated with r.mapcalc between the upsampled DEM and the carved DEM. The values in the “flats”, i.e. anywhere that is not a river, range from about -1 to 1 m, and the rivers are as deep as 7 meters in the places where they cut through a “bump” caused by the cubic resampling process.

The “difference” between the 3 m DEM and the carved DEM.


Location: Laguna Verde, Valparaíso, Provincia de Valparaíso, Valparaiso Region, Chile

In order to better understand the villages in West Africa fighting the Ebola outbreak currently occurring there, it would be good to have better population data for the towns. One way to do this is to measure the number of buildings per unit area in a typical village, then map a “perimeter” residential landuse area around a village and work out how many buildings there should be based on the size of the area.

I have been experimenting with this a bit using the data mapped in OSM. As of the time I made this dataset we had 43 small villages in this area that had all of their buildings mapped, as well as a footprint to measure the area. I calculated an average building density of 0.00418 buildings per square meter. Then using this figure I pretended I didn’t know the number of buildings in the town and then re-computed how many there should likely be based on the area. The results are in the graph below, and look quite good.

Actual number vs predicted number of buildings

If the method worked perfectly they should fall exactly on a line with the predicted number being exactly equal to the actual number. Of course in reality the prediction is not perfect but it does look pretty good.
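The estimate described above comes down to one multiplication. Here is a minimal Python sketch of it. The function names are hypothetical, and pooling all villages (total buildings divided by total area) is an assumption, since the entry does not say whether the 0.00418 figure was pooled or averaged per village.

```python
def mean_density(villages):
    """Pooled building density in buildings per square meter.

    villages is a list of (building_count, area_m2) pairs for villages
    where every building has been mapped.
    """
    total_buildings = sum(count for count, _ in villages)
    total_area = sum(area for _, area in villages)
    return total_buildings / total_area


def predict_buildings(area_m2, density=0.00418):
    """Expected number of buildings inside a mapped residential perimeter."""
    return area_m2 * density


# A village perimeter covering one hectare (10,000 square meters):
estimate = predict_buildings(10_000)
```

With the density from this entry, a one hectare village comes out at about 42 buildings.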

Location: Kissidougou, Faranah Region, Guinea

The HOT team is currently working on mapping the country of Mali in response to the conflict that has recently broken out there. One of the big problems we have is that the country is very large, but very sparsely populated. Because of this we need help finding out where to map, i.e. where are all the people.

One of the HOT team members, PovAddict, has put together a nifty little tool for helping us with this problem. At the link below you will be shown a random place in the area we are interested in. If you see a building you click on it, otherwise you click the button to go on to the next one.

Help us out by doing some tiles yourself, and please pass this link around via twitter, facebook, etc. It is a great way for your non-mapper friends to help out the project without having to learn how to map in detail.

Location: Mopti Cercle, Mopti, Mali

Over the last few days I have been putting together a virtual machine image of an Ubuntu 12.04 install set up to act as an OSM tileserver and Nominatim server. The VirtualBox virtual machine image boots up and, after you log in and run the ‘startx’ command to start the graphical environment, presents a “configuration wizard” script on the desktop.

Running this wizard will prompt you to download an OSM PBF extract to the desktop. This PBF file will then be loaded into either a tile rendering database, or a Nominatim database, or both. The configuration is entirely point and click. For the Washington, DC, extract (about a 10 MB PBF file, processed on a typical dual core PC) it took about 30 minutes to get a fully functional tileserver and Nominatim server, with up to date coastlines downloaded automatically. I wrote a wiki page about how I set up the virtual machine, starting from the bare Ubuntu 12.04 64 bit desktop build and ending with the fully functional server.

The 2.2 GB virtual machine image can be downloaded via BitTorrent using the following torrent file or magnet link. You will find a Readme file in the torrent telling you how to import and boot the image in VirtualBox, and further instructions in the VM image itself.


I hope people find this VM fun and useful to play around with. Hopefully people who want a quick test server for an app they are developing, or who want to scrape tiles or do bulk geocoding, can now easily download and run this VM on their own hardware and avoid placing undue load on the OSM community’s donated servers.

Please share any comments or feedback about the VM either in the comments here, or on the talk page on the wiki.


Some new imagery for the Panama Canal

Posted by AndrewBuck on 2 September 2012 in English (English). Last updated on 3 September 2012.

I was looking through the ISS Imagery and found quite a bit of imagery covering Panama City and the Panama Canal area. This area is almost always cloudy, so the only way to get complete imagery coverage is to combine inputs from many different images taken at different times. All of these images will have clouds over some portion of them, but since the clouds aren’t always in the same place, 2 or 3 sets of imagery cover the area pretty well.

I have put together the first 4 images of one pass by the ISS over the area. There are about 10 more images from that pass that still need to be rectified, as well as 20 or so from other passes, to bring the cloud-free coverage near 100%. Currently, between the Bing imagery and the imagery already rectified, about 80% of the city is covered. For details on how to load the imagery see the Panama Wiki Page:

You can see the imagery layer on an online slippy map on the mapwarper site:


Location: La Cresta, Bella Vista, Distrito Panamá, Panamá, 080807, Panama

Mapping Trees in Gulu, Uganda

Posted by AndrewBuck on 21 August 2012 in English (English). Last updated on 22 August 2012.

Last week the Red Cross put up some imagery of two cities in northern Uganda. The towns are being armchair mapped by tracing buildings and trees over the city and Red Cross volunteers will be on the ground in a few weeks to collect information to fill in the map details.

One of the goals of the mapping project is to measure the fire risk within the city. Gulu has thousands of small circular huts with grass roofs, many of which are densely clustered together. Furthermore, there are quite a few trees throughout the city and in a fire these could burn as well.

The mapping of the city is progressing rapidly; roads are mostly done and the buildings are nearing completion as well. Tree coverage is almost 100% in the areas that are mapped, but only about 30% of the city’s area has been covered for trees so far.

I have been working out a workflow for mapping the rest of these trees (and any other ones people want to map) efficiently and accurately. I do the work in two passes. First, I go through an area and put down untagged nodes on all the trees (going methodically to make sure all are done in one pass). Holding down shift in add mode lets you do a single click for each tree, so it is very easy to do lots of them quickly. Then I click the upload button in JOSM and cancel the upload, so that the validator lets me select all the untagged tree nodes and add tags to them (this can be done for all of them in a single operation by double clicking on the group name to select them all). After all the newly created trees have been properly tagged, I upload them to the server so they are visible to other mappers, who then won’t duplicate them in their own mapping efforts.

The second pass is the more interesting one. After all the trees in a given area have been marked, I make a second pass over the area to measure the crown diameter of all the trees. The crown diameter is the side to side width of the leafy area at the top of the tree, i.e. it is how big of a circle it looks like on the ground when you look straight down at it. Knowing the crown diameter of a tree tells you how much fuel would be available in a fire, and gives you a rough estimate of the tree’s height and age as well. Making this single measurement for each tree therefore makes the tree data far more useful for planning and mapping purposes. It also makes it possible to hide small trees at the lower zooms (for example, ones with a diameter of less than 8 meters), or to draw them as circles on the map scaled to their proper size.

There are currently about 10,000 trees mapped in Gulu, and almost exactly half of these have had their diameters measured using the following process (I estimate there are approximately 40,000 trees in the city in total). I start by loading up the JOSM paint style in the picture below and setting the “inactive” objects color to bright green in the preferences panel (hotkey F12). Then I load in the trees queried from the Overpass API, as well as barrier=hedge, landuse=forest, and natural=wood, as a layer called ‘Trees.osm’. Then I create a new layer called ‘Tree diameters.osm’. When I switch to this layer the other layer becomes inactive, so all the trees without diameters show up as solid bright green dots that are easy to see, and trees whose diameters have already been entered show up as open circles of the appropriate diameter. Then, to measure the diameters, I draw a single 2 node way across each unmeasured tree, putting these untagged ways in the ‘Tree diameters.osm’ layer. You can easily check that you covered all the trees in an area by doing a select all, which will draw all your measurement lines in red; these should exactly cover the green dots for unmeasured trees in the area of interest.

This measuring goes pretty fast; I can typically measure about 1,000 trees per hour using this technique. After I have an area covered by these measuring lines, I save the file containing the lines and run a Perl script on it that measures the length of every line in the dataset and writes a new osm file containing a single node at the midpoint of each line, tagged with the measured length as diameter_crown=*. This new file is called ‘Tree measured.osm’, and a new instance of JOSM is launched with 2 layers: the ‘Trees.osm’ layer downloaded from Overpass, and the ‘Tree measured.osm’ layer containing nodes with the measured diameters tagged on them.
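For anyone who wants to try the same thing, the measuring step can be sketched as below. This is written in Python rather than Perl and is not the script used for the Gulu data. The function names, the haversine distance, the negative ids for new nodes, and the minimal set of attributes on the output nodes are assumptions, so the output may need adjusting before JOSM will accept it.

```python
import math
import xml.etree.ElementTree as ET

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def measure_lines(osm_xml):
    """Turn each 2 node way into a midpoint node tagged diameter_crown=<length>."""
    root = ET.fromstring(osm_xml)
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.findall("node")}
    out = ET.Element("osm", version="0.6", generator="crown-diameter-sketch")
    next_id = -1  # negative ids mark objects that do not exist on the server yet
    for way in root.findall("way"):
        refs = [nd.get("ref") for nd in way.findall("nd")]
        if len(refs) != 2:
            continue  # only simple measuring lines are handled
        (lat1, lon1), (lat2, lon2) = nodes[refs[0]], nodes[refs[1]]
        length = haversine_m(lat1, lon1, lat2, lon2)
        # Averaging the coordinates is accurate enough for lines a few meters long.
        node = ET.SubElement(out, "node", id=str(next_id),
                             lat=f"{(lat1 + lat2) / 2:.7f}",
                             lon=f"{(lon1 + lon2) / 2:.7f}")
        # Lengths are rounded to 0.1 m.
        ET.SubElement(node, "tag", k="diameter_crown", v=f"{length:.1f}")
        next_id -= 1
    return ET.tostring(out, encoding="unicode")
```

Passing in the contents of the saved measuring-lines file returns the XML for the new file as a string, which can then be written to disk and opened as a second layer.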

The final step is using the conflation plugin in JOSM to merge the tag from the measured layer onto the node for the tree it corresponds to. The conflation plugin is configured using ‘Trees.osm’ as the reference layer and ‘Tree measured.osm’ as the subject layer. I just leave the max distance at the default of 20 meters; it doesn’t really matter, since the resulting diameter nodes end up being within about 2 meters of the appropriate tree anyway (much closer than the distance to other trees in the dataset). Because the node positions agree so well, the conflation plugin matches them up perfectly and runs very fast. The only annoyance is having to click the ‘Apply’ button on the tag conflict window for each of the thousand or so trees I am entering at a time. With the clicking for each tree this part of the process takes about 5 minutes, but it would take less than a minute for any number of trees if the conflation plugin would just silently merge the tags when there are no real conflicts (i.e. when just adding a new tag).

Once the conflation is finished, the ‘Tree measured.osm’ layer, which now has the newly measured diameters tagged onto the original nodes from the OSM DB along with the position and source information, is uploaded back to OSM. After this is completed I wait a few minutes and re-download the ‘Trees.osm’ file from Overpass to get a single file with everything nicely merged together. :)

Below is a screenshot of the JOSM paint style I use for this work (trees with diameters are shown as green circles, trees without diameters are filled yellow circles).

JOSM Paint Style for Tree Diameters

Finally, what do you do when you are finished making a geodatabase with the crown diameter of 5,000 trees over a city? That’s right… science! I wrote a little script using grep and sed to cut out the diameter data from the osm file I get from the Overpass query, and then wrote a gnuplot script to fit a curve to the data and plot the results. The plot below is the size distribution of the measured tree diameters (in meters). The points are the number of trees in each 0.1 meter bin (the data is rounded to 0.1 meters, as more precision than this is unnecessary). The green line is a bezier smoothed version of the raw distribution data, and the thicker blue curve is a lognormal curve fit to the raw bin counts. The data seems to be very clean and the curve ends up fitting rather nicely. I haven’t calculated the mean of the distribution, but it would be interesting to see what it (and the other relevant statistics) end up being. Parameters for the lognormal fit are as follows:

  lognormal(x, mu, sigma, scale) = (scale/(x*sigma*sqrt(2*3.14159))) * exp(-1.0*((log(x)-mu)**2/(2*sigma**2)));

  Final set of parameters            Asymptotic Standard Error
  =======================            ==========================
  mu    = 1.95629                    +/- 0.008175 (0.4179%)
  sigma = 0.3576                     +/- 0.007727 (2.161%)
  scale = 317.125                    +/- 6.171    (1.946%)

  correlation matrix of the fit parameters:
           mu      sigma   scale
  mu       1.000
  sigma    0.335   1.000
  scale    0.386   0.591   1.000
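The fitted parameters already pin down the summary statistics of the fitted curve through the standard lognormal formulas (median exp(mu), mean exp(mu + sigma^2/2), mode exp(mu - sigma^2)). The Python sketch below evaluates them. The function names are hypothetical, and the numbers describe the fitted curve, not the raw sample, so they are only as good as the fit.

```python
import math

# Fitted parameters reported above.
MU, SIGMA, SCALE = 1.95629, 0.3576, 317.125


def lognormal(x, mu=MU, sigma=SIGMA, scale=SCALE):
    """The scaled lognormal curve used in the fit (x in meters)."""
    return (scale / (x * sigma * math.sqrt(2 * math.pi))) * math.exp(
        -((math.log(x) - mu) ** 2) / (2 * sigma ** 2))


def lognormal_stats(mu=MU, sigma=SIGMA):
    """Median, mean and mode of a lognormal distribution, in meters."""
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2),
        "mode": math.exp(mu - sigma ** 2),
    }


stats = lognormal_stats()
```

For these parameters the fitted curve has a median crown diameter of roughly 7.1 m, a mean of roughly 7.5 m, and a peak (mode) near 6.2 m.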

Tree Crown Diameter Size Distribution


Location: Green Valley, Gulu, Northern Region, 167, Uganda

I came across this article on the BBC news this morning and thought the OSM community would be interested in it. It discusses the way the Kabul post office delivers letters to people, given that most of the streets are unnamed and the houses are not numbered.

The "addresses" instead tend to be the recipient's name, along with some rough directions for how to find them, like "near the such and such mosque", or "by the local high school". Given that this is how deliveries are done in this area (and probably many other areas like it), I thought it would be good to show this to everyone so they can have some insight into what kinds of things might be useful to map for people in these areas. It seems like POIs and landuse may actually be more useful than street names for finding your way around, since many streets are unnamed or only informally named.


I was looking at the wiki and came across the page which lists many existing datasets as well as links to people who have imagery we may be able to make use of. One of these sources is the images taken by astronauts aboard the International Space Station. The images are taken with an off the shelf camera and a very long lens (800 mm focal length). The resulting ground images can be quite detailed and are often of interesting places, many of which are not covered by Bing or Yahoo.

I thought I would look up Libya (as it is a current HOT project) expecting to find one or two images but was pleasantly surprised to find dozens of them. I have uploaded to MapWarper two of the images covering Misrata during the day (an area where our map is very bad due to no hi-res imagery) and plan to also upload one of the Misrata at night images which shows a larger area, as well as one for Surt.

Surt is a particularly interesting case as we have a very poor map of it (again due to lack of imagery), but it is also the hometown of Muammar Gaddafi, making it one of the strongholds for the current government. I will try to get the Surt and Misrata at night photos up on mapwarper later today. If anyone wants to upload them before that to start working on them earlier, please respond to this diary entry so I know not to upload a second copy.

Below are the links to the two images already on mapwarper. The description is all run together (mapwarper doesn't display the newlines in the description) but they contain the original NASA url and the source tag you should use for tracing (which is really long unfortunately).

If you follow the links above to the mapwarper images there is an 'Export' tab which has the links you need to use the imagery in JOSM or Potlatch.

Also, as is common practice I have entered the borders of the imagery into the OSM database so when people open the area in JOSM, or what have you, they know there is a source of imagery there that is available. I have tagged the two ways forming the boundaries with the source tag you should use for roads traced from each one. (Note that the source tags are identical except for the image number at the end.)

I will be doing a lot of tracing in these areas over the next few days and if anyone wants to join in that would be cool. The last time I called for help on Libya I got a very good response and we mapped an area with a lot more streets than this in just a few days so soon our map will likely be as good as or better than anyone else's in these areas.


Location: Almagaasbah, Misrata, Bani Walid, 378252, Libya

A thank you regarding Libya

I would just like to take a moment to say thank you to those who responded to my diary entry asking for help working in Libya. When I made the post, the area between Zlitan and Al Khoms was almost totally unmapped, now it is hard to find unmapped roads in the imagery.

Thank you to all those who helped out with the mapping effort, it would have taken me many weeks to do what was done by others in just the first few days after I made my post. I will continue to fill in details in this area (and hope others will continue as well), but I think at this point we can say we have a pretty decent map, and one which for this area seems to be one of the best around. OpenStreetMap, and the HOT team, continue to amaze and inspire me -- here's hoping both projects continue even after we have mapped the whole world to this level of detail.


P.S. Here is a link to the area on OSM as well as a comparison with Google so you can see how well it is now mapped.

I finally got around to setting up my own tileserver with the idea of playing around with the mapnik stylesheet to try to resolve some of the issues it has. I have noticed that there are a lot of issues (~300) currently outstanding against the mapnik sheet and only about 3 names in the "assigned to" fields of these tickets, so I have decided to try working through some of them on my local tile server so I can submit patches to the trac tickets, allowing them to be resolved.

In doing this however I see that there are many "duplicate" rules in the stylesheet. There are several ways in which the duplication occurs and some of it makes sense (progressively more specific rules about a certain road type for example), but some of these are redundant, or even outright copies in the filter string with different styles specified for the exact same set of roads, meaning that it can be very difficult to find out which rule even applies to a given way.

I will try to put some patches together to remove some of the more egregious duplicate entries and submit them shortly via trac tickets. If anyone has any insight into why there seems to be so much duplication I would appreciate your insight as I am new to mapnik and have much to learn. Feel free to either respond to this diary entry, or get in touch with me on Skype (username andrewbuck40) if you want to text/voice chat about this, or any other mapnik style related issues.


Libya has been designated as a HOT project area for a while now due to the on-going hostilities there. There is hi-resolution imagery on Bing for the Tripoli area which is already pretty well mapped, however there is also imagery for the cities of Al Khums and Zlitan which has a lot of un-mapped area in it.

The conflict has been at a stalemate for a while, however just in the last few days the rebels have begun to advance again. The BBC reports that the rebels have started pushing westward from Misrata toward Zlitan, and they are also advancing northward from Gharian toward this same area.

Whether the rebels are able to use OSM in their activities or not, we know that various aid organizations do monitor it for coordinating their efforts. It would be nice if we had the area completely mapped as best we can from the existing imagery by the time these aid organizations move into the area.

Here are the stories from the BBC referenced above, as well as the bounding box of the hi-res imagery for the area so you can see where to work. Tomorrow I will try to put together another diary post detailing specific things that I think would be useful to map, as well as techniques for spotting them in the imagery.

Libya: Rebels continue to push west from Misrata

Libyan rebels advance towards key town south of Tripoli

The Bing Hi-res Bounding Box