bdiscoe's diary

Recent diary entries

About Huts

Posted by bdiscoe on 20 December 2015 in English (English)

Particularly in Africa, there are a huge number of small round buildings. I believe the best way to model them is with a single node, tagged building=hut, optionally with a radius or width. However, the dominant OSM practice is to use a whole way with lots of nodes, so I must go along with the community. Unfortunately, I encounter a lot of messes like this while working, for example on #MapLesotho:


That's 7 huts, 130 nodes, all the wrong size, some overlapping - likely the result of very bad copy and paste. (There are actually 8 huts there; they missed one.)

My fingers have developed the rhythm for cleaning this sort of thing up in JOSM. First, set the advanced preference simplify-way.max-error to something appropriate (0.3), then, for each hut:

  1. Click to select the building.
  2. 'G' to unglue it from the surrounding buildings, if needed.
  3. Drag to center it correctly.
  4. Ctrl-Alt-drag to scale it down to the correct size.
  5. Shift-Y to reduce nodes.
  6. 'O' to re-round the circle.

Within a few seconds I have this:


8 huts, 86 nodes. It would probably be hard to automate this further without computer vision (because of the manual visual alignment), but with the standard JOSM position of left hand on keyboard, right on mouse, it can go very quickly with the above steps. One possible optimization might be to combine Shift-Y and O into a single keypress I can reach without moving my left hand.
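Why can a simplify with a 0.3 max-error shrink a hut so much? A nearly circular outline needs only a handful of evenly spaced nodes before any chord strays more than the tolerance from the true circle. The sketch below estimates that minimum. It assumes radius and tolerance are both in meters, `minCircleNodes` is a hypothetical helper, and it is not JOSM's own simplify algorithm.

```javascript
// Estimate the fewest evenly spaced vertices a closed polygon needs so that
// no edge strays from a circle of the given radius by more than maxError.
// Assumption: both values are in meters; this is not JOSM's own code.
function minCircleNodes(radius, maxError) {
  // An edge spanning angle 2*pi/n bulges away from the circle by its
  // sagitta, r * (1 - cos(pi / n)). Requiring sagitta <= maxError gives
  // n >= pi / acos(1 - maxError / r).
  const ratio = Math.max(-1, 1 - maxError / radius); // keep acos in its domain
  const n = Math.ceil(Math.PI / Math.acos(ratio));
  return Math.max(3, n); // a closed polygon needs at least three corners
}

console.log(minCircleNodes(3, 0.3)); // hut about 6 m across → 7
```

The node counts in the example above also reflect how JOSM's circle and simplify tools place nodes, which this formula does not model.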

Needless to say, if you are setting up a mapathon for people to map Africa, PLEASE teach them NOT to make oversized, node-heavy huts like the ones above, so I don't have to fix so many thousands of them.

The most inefficient way in North America

Posted by bdiscoe on 4 December 2015 in English (English)

As mentioned in my last entry, I wrote a tool using Osmium to parse PBF and look for inefficient ways, i.e. ways that would drop hundreds of nodes without changing shape if you ran simplify on them. I'd been running it on small countries and US states, but this evening I tried it out on a PBF of all of North America, and here is the prize-winner for the most bloated, wasteful way: a small dirt road between some houses and the coastal wetlands in Nova Scotia, Canada:

Captain's Way

That's 2000 nodes, or one every 5.6 centimeters.
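A figure like "one node every 5.6 centimeters" is just the way's length divided by its number of segments. At that spacing, 2000 nodes span only roughly 112 m (1999 segments × 0.056 m). The sketch below computes average spacing with the haversine formula on a spherical Earth; the function names and the [lat, lon] input format are assumptions for illustration.

```javascript
// Average spacing between consecutive nodes of a way, in meters, using the
// haversine great-circle distance. Coordinates are [lat, lon] in degrees.
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius (spherical model)

function haversineMeters([lat1, lon1], [lat2, lon2]) {
  const rad = Math.PI / 180;
  const dLat = (lat2 - lat1) * rad;
  const dLon = (lon2 - lon1) * rad;
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dLon / 2) ** 2;
  // Clamp guards against tiny floating-point overshoot above 1.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

function averageNodeSpacing(coords) {
  if (coords.length < 2) return 0;
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    total += haversineMeters(coords[i - 1], coords[i]);
  }
  return total / (coords.length - 1); // n nodes form n - 1 segments
}
```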

By the time you read this, I'll have cleaned up way 85927697. I'd also like to make an offer: if you are an experienced editor who focuses on a specific part of the world and would like me to run my tool on the extract for your region, I can send you a list of the worst ways for you to clean up. Let me know!

Also, a word to importers: please, please make sure you check your data for this kind of mess BEFORE you upload. In this case it was Steve in Halifax importing CanVec in 2010, but similar things are being uploaded all around the world, all the time (I know because my tool finds them!).

Cleaning up NHD in North Carolina

Posted by bdiscoe on 30 November 2015 in English (English)

Some months ago, I was looking around OSM to find where the bulk of noise and inefficiency is. I'm aware of some other efforts (like Toby in 2013) but I actually went so far as to write a C++ app on Osmium which parses PBF extracts, simulates running line simplification, and produces a list of the ways which are the least efficient.
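The core of such a tool fits in a few dozen lines. In the sketch below, Douglas-Peucker (a standard line-simplification algorithm, used here as a stand-in; the post does not say exactly which algorithm is simulated) runs on each way, and ways are ranked by how many nodes it would drop. The input format, the equirectangular projection to meters, and all function names are assumptions; reading the PBF itself, done with Osmium in C++, is not shown.

```javascript
// Rank ways by how many nodes a simplification pass would remove.
// Assumed input: [{id, coords: [[lat, lon], ...]}]; tolerance in meters.
const M_PER_DEG = 111194.93; // meters per degree on a mean-radius sphere

// Local equirectangular projection to meters (fine for a single way).
function toLocalMeters(coords) {
  const cosLat0 = Math.cos(coords[0][0] * Math.PI / 180);
  return coords.map(([lat, lon]) => [lon * M_PER_DEG * cosLat0, lat * M_PER_DEG]);
}

// Distance from point p to the segment a-b.
function segmentDistance(p, a, b) {
  const dx = b[0] - a[0], dy = b[1] - a[1];
  const len2 = dx * dx + dy * dy;
  let t = len2 === 0 ? 0 : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Iterative Douglas-Peucker; returns how many nodes survive.
function douglasPeuckerKeptCount(pts, tolerance) {
  if (pts.length <= 2) return pts.length;
  const keep = new Array(pts.length).fill(false);
  keep[0] = keep[pts.length - 1] = true;
  const stack = [[0, pts.length - 1]];
  while (stack.length > 0) {
    const [first, last] = stack.pop();
    let maxDist = -1, index = -1;
    for (let i = first + 1; i < last; i++) {
      const d = segmentDistance(pts[i], pts[first], pts[last]);
      if (d > maxDist) { maxDist = d; index = i; }
    }
    // Keep the farthest node only if it deviates more than the tolerance.
    if (index !== -1 && maxDist > tolerance) {
      keep[index] = true;
      stack.push([first, index], [index, last]);
    }
  }
  return keep.filter(Boolean).length;
}

// Most wasteful ways first.
function rankWaste(ways, toleranceMeters) {
  return ways
    .map(w => {
      const kept = douglasPeuckerKeptCount(toLocalMeters(w.coords), toleranceMeters);
      return { id: w.id, nodes: w.coords.length, dropped: w.coords.length - kept };
    })
    .sort((a, b) => b.dropped - a.dropped);
}
```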

I ran this on various US states and countries worldwide, and the winner is... North Carolina. It is so wildly inefficient that we may as well not bother with the rest of the world until we've cleaned up North Carolina first. (Just for comparison, the size of the output: Finland 17k, Colombia 20k, Colorado 30k, England 45k, North Carolina >300k).

Why is North Carolina (henceforth NC) so obese? There are a handful of bad spots elsewhere (like some of the Corine landuse in Europe, and a waterway import in Cantabria, Spain) but nothing close to NC. It's due almost entirely to a single import in 2009. The USA's hydrography dataset, NHD, is truly massive. An account called "jumbanho" imported NHD for NC and apparently applied almost no cleanup (besides a small pass at removing duplicate nodes a few months later). Among the many flaws of that import:

  1. Topology is mostly missing (features meet but don't share a node)
  2. Really out of date (shows swamps that were drained decades ago, streams running through what are now shopping malls).
  3. Almost all of it is barely or not at all decimated (a stream which is perfectly modeled in 15 nodes is sometimes made of 300 nodes).

As a result, the jumbanho account has noderank #3 with 43 Mnodes (this was rank #2 with 49 Mnodes, but as I'll explain, I've been busy).

This is what the data looks like:


As you can see, it's a regular set of evenly spaced nodes, with no decimation. This is worse when you consider that the overall accuracy is far lower: here the pond is 7 m off, and the stream is variously 12, 14, or 32 m off:


This inefficiency is bad in a few ways, such as making the planet file balloon with dead weight. But a more relevant issue is this: When a user comes in here to fix the alignment of the data, there is NO WAY they can be expected to move all 200 points by hand. An import with too many points is highly resistant to EVER getting manually fixed. The solution is to simplify first, but by how much? Here we encounter some issues:

  1. The simplify tool in JOSM defaults to 5 (!) meters, which is brutal and useless for just about any use I can think of (maybe very, very rough old GPS traces?).
  2. JOSM lets you change the amount, but it is buried deep in the "advanced preferences."
  3. Once you find that, knowing how much to simplify each kind of feature is a matter of experience and skill.

After hundreds of hours of manual work on NC, I have learned what values work. These are general guidelines, which I carefully tweak for each area:

  1. natural=wetland. These are very rough: 1.0-1.2 m.
  2. waterway=stream, waterway=riverbank, natural=water. These are more delicate: I use 50-80 cm.
  3. Streams and rivers that are inside wetlands or are "artificial paths". These are often notional and don't correspond closely to any real feature, so >1 m.

Note that these tolerances are WAY smaller than the actual inaccuracy of the data, so they cannot harm its value. In fact they could be bigger, but the goal is to leave enough nodes that human editors won't have to add or remove many nodes when they align each feature to its correct location.
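The guidelines above can be written down as a small lookup, mainly to make the categories explicit. This is a hypothetical helper, not part of JOSM: the tag values are standard OSM tags, the ranges (in meters) are the ones listed above, and whether a stream lies inside a wetland or is an NHD "artificial path" is passed in as a flag rather than detected.

```javascript
// Hypothetical helper: suggested simplify tolerance range, in meters, for
// imported NHD features, following the rule-of-thumb categories above.
// Returns {min, max}; max is null where only a lower bound is given.
function suggestedTolerance(tags, opts = {}) {
  const { insideWetland = false, artificialPath = false } = opts;
  const flowing = tags.waterway === "stream" || tags.waterway === "river";

  if (tags.natural === "wetland") return { min: 1.0, max: 1.2 };
  // Notional centerlines: only a lower bound (">1 m") is given.
  if (flowing && (insideWetland || artificialPath)) return { min: 1.0, max: null };
  if (tags.waterway === "stream" || tags.waterway === "riverbank" ||
      tags.natural === "water") {
    return { min: 0.5, max: 0.8 };
  }
  return null; // no guideline for other features
}
```

These are starting points only; the post tunes them per area after comparing with imagery.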

While I would be happy to just write a bot to do that first step, that would be a "mechanical edit" and I'd have to put up with mailing list arguments to get permission. (I'd also have to write that bot, which I've been too lazy to do so far). So instead, I've put in the time to do it all manually in JOSM, with steps like:

  1. Study each area, compare the features to the imagery.
  2. Do some super careful simplify with appropriate values. (It gets really tiring having to dig into JOSM's advanced preferences every single time I change the value.)
  3. Fix the topology by carefully tuning the validator's precision and allowing it to auto-fix, with manual verification.
  4. Some manual adding of bridges and culverts.
  5. Removing/updating non-existent wetlands and streams (one common clue: they intersect buildings).
  6. Splitting some ways and creating relations, for example for a large riverbank and wetland that share an edge.
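Step 3 hinges on finding features that meet without sharing a node. A brute-force sketch of that check follows. The input format (node positions already projected to local meters) and the function name are assumptions, and this is not how JOSM's validator is implemented; for a whole county, a spatial index would replace the nested loops.

```javascript
// Find "near misses": a way endpoint within `tolerance` meters of a node on
// another way that it does not share. Assumed input:
//   nodes: Map(id -> [x, y]) in local meters
//   ways:  [{id, nodes: [nodeId, ...]}]
function findNearMisses(nodes, ways, tolerance) {
  const misses = [];
  for (const way of ways) {
    // A Set avoids checking a closed way's shared first/last node twice.
    const ends = new Set([way.nodes[0], way.nodes[way.nodes.length - 1]]);
    for (const endId of ends) {
      const [ex, ey] = nodes.get(endId);
      for (const other of ways) {
        if (other.id === way.id) continue;
        if (other.nodes.includes(endId)) continue; // already connected
        for (const nid of other.nodes) {
          const [nx, ny] = nodes.get(nid);
          const distance = Math.hypot(nx - ex, ny - ey);
          if (distance <= tolerance) {
            misses.push({ way: way.id, endNode: endId, otherWay: other.id, otherNode: nid, distance });
          }
        }
      }
    }
  }
  return misses;
}
```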

Here is that same area after a simplify to 70cm on the NHD features, then quick manual alignment:


It's exhausting. In fact, a bot wouldn't really help that much: the simplify is only the first step, and the topology and the rest still need to be done by a human anyway.

By my rough calculation, if I work hard for 5 hours every night, it would take around 5 months for me to finish cleaning up NC NHD to a decent level.

On the plus side, other NHD imports I've seen around the USA (like Oklahoma) don't seem to be nearly as bad; while they suffer from most of the same quality issues, at least they were already simplified before uploading.

Thoughts on a better heat map for OSM changes

Posted by bdiscoe on 28 October 2015 in English (English)

I love YOSMHM. Pascal has done a great job with it, and it's very cool. In fact, given my desire to map the entire world, I rely on YOSMHM to tell me where to map next. And just recently, it's improved from updating weekly/monthly to daily! But, there are some limitations.

  1. One blob per changeset. If I map a long highway, a dot at the center of the highway really doesn't show where I mapped.
  2. Missing data. I have around 12,100 changesets, but YOSMHM only shows 9356. That might explain why it doesn't show the mapping I did in southern Chad, or eastern Cameroon, or Agadez in Niger, or many other places.

So, I set out to see if I could make my own heatmap. Here are my first steps.

  1. Thanks to a great answer from EdLoach it was easy to get XML files for all my changesets.
  2. I parsed those XML to get the extents (min_lat min_lon max_lat max_lon) of each changeset.
  3. I tried a number of different web-heat-map tools, and settled on Leaflet + Leaflet.heat because it was super easy to use. I just pass the center of each changeset's extents to Leaflet.heat as a point, and the result looks like this.
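Steps 2 and 3 can be sketched as follows: pull the bounding box out of each changeset element and turn its center into a [lat, lon] point. A regular expression stands in for a real XML parser here, assuming double-quoted min_lat/min_lon/max_lat/max_lon attributes as in the OSM API's changeset XML; the function name is hypothetical.

```javascript
// Extract the center of each changeset's bounding box from changeset XML.
// Changesets without a bounding box are skipped.
function changesetCenters(xml) {
  const centers = [];
  for (const match of xml.matchAll(/<changeset\b[^>]*>/g)) {
    const tag = match[0];
    // Read one numeric attribute from the opening tag, or NaN if absent.
    const attr = name => {
      const m = tag.match(new RegExp(`\\b${name}="([^"]*)"`));
      return m ? parseFloat(m[1]) : NaN;
    };
    const minLat = attr("min_lat"), minLon = attr("min_lon");
    const maxLat = attr("max_lat"), maxLon = attr("max_lon");
    if ([minLat, minLon, maxLat, maxLon].some(Number.isNaN)) continue;
    centers.push([(minLat + maxLat) / 2, (minLon + maxLon) / 2]);
  }
  return centers;
}
```

In the browser, an array like this can be passed straight to Leaflet.heat, e.g. `L.heatLayer(centers).addTo(map)`.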

Finally, I can see at least some blob in every part of the world I've mapped. Unfortunately, unlike YOSMHM, all the changesets are weighted equally (it would take a lot more querying and parsing to weight them) so that, for example, it's hard to tell that I've done 10x more mapping in Namibia than in Japan.

I can dream of a better heatmap! It would have:

  1. Each added/modified/deleted entity, not just the changeset centers. In my case this would mean 9 million dots. A simple approach like Leaflet.heat can't handle that many, because it draws every point, every time, in JavaScript. If I had to, I could write C++ to make a custom global tileset with thousands of PNGs, but that seems like overkill; maybe there's a webby way? Maybe Mapbox: can it handle 9 million points?
  2. It should show added/modified/deleted in different colors (like green/blue/red) so I can quickly see where I've mostly corrected existing data vs. added new features. India, Africa and Central America are the only places I've added huge amounts of detail to OSM, but you can't tell that by looking at YOSMHM. Is there a better/faster/more polite way to get all that detail without making 12,100 queries to the OSM API? I can't just parse the planet file, because that only has the current state, not history.
  3. Not damn Mercator. Anything else would be better. How about Goode Homolosine? Can I get "free" background tiles (like Mapbox's) served in anything besides "web mercator"?
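One common way around items 1 and 2 is to aggregate before drawing: bin every edit into a coarse lat/lon grid with separate create/modify/delete counts, so the browser draws one colored cell per bin instead of millions of points. The sketch assumes a hypothetical input of {lat, lon, action} records, with actions named as in OSM's osmChange format; fetching the per-entity history is not addressed.

```javascript
// Aggregate edits into grid cells of `cellDegrees` with per-action counts.
const ACTIONS = ["create", "modify", "delete"];
const COLORS = { create: "green", modify: "blue", delete: "red" };

function binEdits(edits, cellDegrees) {
  const cells = new Map();
  for (const { lat, lon, action } of edits) {
    const row = Math.floor((lat + 90) / cellDegrees);
    const col = Math.floor((lon + 180) / cellDegrees);
    const key = `${row}:${col}`;
    let cell = cells.get(key);
    if (!cell) {
      cell = { row, col, create: 0, modify: 0, delete: 0 };
      cells.set(key, cell);
    }
    if (ACTIONS.includes(action)) cell[action]++;
  }
  return [...cells.values()];
}

// Color a cell by its most frequent action (ties go to the earlier action).
function cellColor(cell) {
  const best = ACTIONS.reduce((a, b) => (cell[b] > cell[a] ? b : a));
  return COLORS[best];
}
```

A cell size of a fraction of a degree at low zooms, refined at higher zooms, keeps the number of drawn shapes bounded regardless of how many edits there are.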

Did somebody delete Hyderabad, India?

Posted by bdiscoe on 13 June 2015 in English (English)

Not the entire city, but the place node for Hyderabad, a city of over 7 million people... which currently has no label.

See the unlabeled city here:

It seems unlikely that there never was a label, which means that somebody probably deleted it accidentally, or accidentally changed it in some way that prevents it from appearing (e.g. changed "place=city" to "place=City"). It is also missing from Nominatim. Is there no bot or other process checking for when something this big disappears from the map?
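A check for the kind of accident suspected here could be as simple as flagging place values that only match a known value once lowercased. The value list below is a partial, assumed set of common settlement values, and `checkPlaceTag` is a hypothetical name.

```javascript
// Flag place=* values with wrong capitalization or unrecognized values.
// KNOWN_PLACES is a partial list of common values, not a full validator.
const KNOWN_PLACES = new Set([
  "city", "town", "village", "hamlet", "isolated_dwelling",
  "suburb", "neighbourhood", "locality",
]);

function checkPlaceTag(tags) {
  const value = tags.place;
  if (value === undefined) return "no place tag";
  if (KNOWN_PLACES.has(value)) return "ok";
  if (KNOWN_PLACES.has(value.toLowerCase())) return `wrong case: "${value}"`;
  return `unrecognized: "${value}"`;
}
```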

Location: Omkar Nagar, Bairamalguda, Rangareddy, Telangana, 500074, India

Top OSM Rank: The Big Imports

Posted by bdiscoe on 29 May 2015 in English (English)

Here are some of the things I learned while studying the OSM accounts with high HDYC rank, as described in my last entry:

  • TIGER! 'DaveHansenTiger' originally imported TIGER, but 'woodpeck-fixbot' (noderank #1) subsequently touched nearly every node. Because TIGER is such a mess, nearly every node still last touched by woodpeck-fixbot needs cleanup, so its last-modified count gives a rough estimate of how quickly TIGER is getting cleaned up. Currently it's 136 M, going down at around 12 K/day, so at this rate it will take 32 years to clean up all the TIGER in the USA.

  • TIGER ways: between 'DaveHansenTiger' and 'bot-mode', there are around 8 M imported TIGER ways that haven't been touched since import. At the current rate of 1800/day, it's going to take 12 years to clean it all.

  • NHD! (USA National Hydrography Dataset). A lot of NHD was imported without any decimation at all, resulting in >90% of the nodes being redundant, effectively noise. There are at least 6 accounts involved in NHD import, including 'jumbanho' (noderank #2) and 'nmixter' (noderank #5). I've tried cleaning up this NHD mess manually, but it takes several hours to do 100 K nodes in JOSM. At that rate, it would take me 8 months of editing every night to clean up all 46 M nodes.

  • Canada! The CanvecImports account (noderank #3) is at 45 Mnodes and still rising, and there are several more accounts that appear to import Canvec, such as azub (noderank #11) and bgamberg (noderank #13). Some areas are neatly decimated and tidy; some are not.

  • Netherlands: There are two huge imports, 3dShapes (noderank #4) and BAG, which is spread across 16 accounts which all nicely have BAG in their name (Sander H_BAG, Commodoortje_BAG, etc.) All 16 are in the top 200 of noderank.

  • Massachusetts: The state GIS was a massive import, by account jremillard-massgis (noderank #10) and a few others. Amazingly, the road data is actually of high quality and needs very little cleanup; the wetland hydrography is a bit messier.

  • Some highly ranked accounts appear to be national imports (?) that I found harder to learn about, such as Tom_G3X (noderank #7, 19 Mnodes in Japan) and Petr1868 (noderank #9, who has apparently added 23 Mnodes to the Czech Republic using "Tracer Using RUIAN and LPIS")

  • France has many accounts importing from its national cadastre database, but it is very hard to tell which. One might guess that ËdzëronK (noderank #12) and the 15 other massive contributors to France in the top 100 are importing cadastre, but perhaps some of them are actually just amazing, really active mappers.

In my next post I'll talk about some non-import, real cool mappers I discovered.

Top OSM Rank: Who are these crazy, amazing people?

Posted by bdiscoe on 3 May 2015 in English (English)

It's now been around 2 years since I started editing OSM seriously. I've used Pascal's HDYC and YOSMHM to track my progress, with the goal of making a real contribution to OSM worldwide. One thing I always wondered about as my OSM node rank went up: it would reach, for example, 300, and I would think, wow, I have been editing so much... who are these 299 people around the world who actually edit even more??

Recently, I set out to answer this question. I started looking at HDYC for well-known accounts, as well as their heatmaps, and gathering the results in a spreadsheet. When that got tedious, I wrote a C++ app on Osmium and ran it on the Planet.osm file, to find out the complete list of top-ranked accounts.
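The counting step of such an app can be sketched independently of Osmium. This assumes nodes arrive as {id, user} records, where user is whoever last edited the node, and that "node rank" means position in the resulting count; HDYC's exact definition may differ, and the function name is hypothetical.

```javascript
// Count current nodes by last-editing user and rank users by that count.
// Ties are broken alphabetically so the output is deterministic.
function rankUsersByNodes(nodes) {
  const counts = new Map();
  for (const { user } of nodes) counts.set(user, (counts.get(user) || 0) + 1);
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([user, count], i) => ({ rank: i + 1, user, count }));
}
```

On a real planet file the counts would be accumulated while streaming nodes, since holding billions of node records in memory is not practical.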

And the answer is... most of them are not actually people; a few are bots, and many are "import accounts", or user accounts that have been used for a large import at some point. (...but not all of them! Some are actual, live humans manually editing OSM longer and more extensively than me). Along the way, I learned some OSM history, and the diverse patterns in OSM in different countries.

Here is a link to the spreadsheet, sortable by rank, with my own notes on the where/what of around 400 accounts, including the top 100 in node and way ranks. The data is approximate... it's not auto-refreshed by a script (yet), so some ranks may be a little out of date.

In my next diary entry I'll share some of the stories and realizations I've had while gathering this data.

The story of the oldest node in OSM.

Posted by bdiscoe on 26 April 2015 in English (English)

I've been using Osmium, and today parsed the entire planet.osm.pbf for the first time. I noticed that the nodes are in order by ID, and the very first node, the oldest node still in existence, is node 10. Let's look at it!

This tough little node has had quite a history! Presuming that the database is accurate, this is what it tells us today:

  • v1, April 18, 2005, user sxpert creates this node in changeset 4. That's right, the fourth changeset ever. We have no record of its geographic location.
  • v2 was redacted.
  • v3, April 2009, super-user woodpeck (Frederik Ramm) places this node in London, near Regent's Park.
  • v4, September 2009, dtr20 deleted the node, as part of "Survey east of Regent's Park"
  • v5, April 2011, max60watt somehow re-uses the node, placing it near the bus stop in a quiet little village near the town of Kassel, in the German state of Hesse.
  • ... and that's where the node has stayed, through 3 small edits.

The name of the village is Furstenwald. As an English speaker, saying this name out loud causes me to giggle. Of all the nodes still alive today, the first in the world is in... Furstenwald.

(Actually) fixing the Peoria GIS import

Posted by bdiscoe on 11 April 2015 in English (English)

It turns out that Peoria is not just a metaphor, but a real place in Illinois. It is also the location of a rather messy GIS import of County data! Here's the history as far as I can determine:

  • The Peoria County Government gathered data resulting in a dataset as of 1997.
  • In 2010, that dataset was old enough to be considered "obsolete", which apparently justified uploading it to OSM.
  • A wiki page Peoriagisuploa describes most of the details of what happened in June 2010. Basically, it's woods and buildings.
  • Woods came in with natural=wood (but too many nodes)
  • Buildings came in with building=yes and BUILDING_T=(0..9) for a building type, as documented on the wiki page.
  • In July 2010, user account "xybot" applied some changes called "Correction of faulty peoria bulk upload" which did a very strange thing to the building tags. It changed "BUILDING_T" to "tiger:buildingType" (!) There is no such tag in TIGER (which has no buildings, let alone building types).

I studied this mess and figured out what should have occurred: mapping Peoria's BUILDING_T onto the actual, standard OSM building types:

  • BUILDING_T=1 -> building=residential
  • BUILDING_T=2 -> building=commercial (very few of these are industrial)
  • BUILDING_T=3 -> building=school
  • BUILDING_T=4 -> building=garage
  • BUILDING_T=5 -> building=static_caravan
  • BUILDING_T=6 -> building=industrial (there are almost none of these)
  • BUILDING_T=7 -> building=yes (it was under construction in 1997, it isn't now)
  • BUILDING_T=8 -> (make these the inner ways of multipolygon relations)
  • BUILDING_T=9 -> man_made=pier
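Written as a pure function over a tag object, the mapping above is easy to check. This is only a sketch: the key name follows the xybot rename described above (with BUILDING_T as a fallback), type 8 and unknown codes are flagged for manual handling, and the post applies these changes by hand with visual checks, since running this automatically would be a mechanical edit.

```javascript
// Peoria BUILDING_T codes mapped onto standard OSM tags (per the list above).
const BUILDING_T_TO_OSM = {
  "1": { building: "residential" },
  "2": { building: "commercial" },
  "3": { building: "school" },
  "4": { building: "garage" },
  "5": { building: "static_caravan" },
  "6": { building: "industrial" },
  "7": { building: "yes" },
  "9": { man_made: "pier" },
};

function mapPeoriaBuilding(tags) {
  const key = "tiger:buildingType" in tags ? "tiger:buildingType" : "BUILDING_T";
  const type = tags[key];
  if (type === undefined) return { tags, needsManual: false };
  // Type 8 should become the inner way of a multipolygon: manual work.
  if (type === "8") return { tags, needsManual: true };
  const target = BUILDING_T_TO_OSM[type];
  if (!target) return { tags, needsManual: true }; // unknown code, e.g. "0"
  const out = { ...tags, ...target };
  delete out[key]; // drop the non-standard source tag
  if (target.man_made) delete out.building; // a pier is not a building
  return { tags: out, needsManual: false };
}
```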

I have been laboriously applying these fixes recently, and will finish soon. I'm doing it manually in JOSM, checking carefully, not only because that's the quality thing to do, but also to head off any claims of "mechanical editing". I'm also cleaning up the woods, which is not simply a matter of decimation but also a lot of manual updating because the woods are not where they were in 1997.

Go home coastline data, you are drunk

Posted by bdiscoe on 14 February 2015 in English (English)

I've been having a great time recently using Osmium to write my own analysis code in C++ to look for anomalies in the PBF extracts. Today it found this very strange coastline in South Africa:


Perhaps, I thought, this is some rare geological formation that makes an amazing wavy line? So let's look at the data over aerial imagery:


Uh.... what? I've seen a lot of weird and bad map data, like the mechanical grit of PGS and all the horrors of TIGER, but this was new. It's as if some cartographer said... "yeah, it's a coastline! Uh, what kind? Uh.... a wavy coastline? Yeah, wavy! Lots of waves.... I LOOOVE to draw WAVES, wheeeee!"

I should mention that this appears to go on for hundreds of kilometers.
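A line like this is fairly easy to flag automatically. One hypothetical test is how often the turn direction flips from node to node: a natural coastline tends to curve one way for stretches, while this one alternates left and right almost constantly. The sketch assumes coordinates already projected to a local plane; the post does not say what check actually found this coastline.

```javascript
// Fraction of consecutive turns that flip direction (0 = never, 1 = always).
// Points are [x, y] in a local planar projection.
function turnFlipRatio(pts) {
  let prevSign = 0, flips = 0, turns = 0;
  for (let i = 1; i < pts.length - 1; i++) {
    const [ax, ay] = pts[i - 1], [bx, by] = pts[i], [cx, cy] = pts[i + 1];
    // z-component of the cross product: > 0 left turn, < 0 right turn
    const cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx);
    const sign = Math.sign(cross);
    if (sign === 0) continue; // collinear: no turn
    if (prevSign !== 0 && sign !== prevSign) flips++;
    prevSign = sign;
    turns++;
  }
  return turns > 1 ? flips / (turns - 1) : 0;
}
```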

The importer of this way is "Adrian Frith", but it's almost certainly not his fault: the source tag says "Municipal Demarcation Board", so it was probably made by some government department, or maybe by a contractor that was getting paid by the node?

I'm sorry to say I'll be quickly tidying up this coast, so perhaps by the time you read this, you won't be able to see the waves at, for example, here. On the other hand, coastline changes are special and take a while to process, so the blue ocean wobbles will probably stay for quite a while.

Come work on #MissingMaps with me!

Posted by bdiscoe on 10 December 2014 in English (English)

The recent #MissingMaps project added to the Tasking Manager is a great way to work together on specific places!

However, some of the maps are sadly neglected. The "high priority" HOT places (like for ebola and cyclones) get a lot of contributors. But, other #MissingMaps have little work.

For example, #793 - Missing Maps: Bukavu, Democratic Republic of Congo was added 5 days ago and nobody contributed at all. I have begun, but it's kinda lonely. Come join me! The imagery is good, the infrastructure is easy to see, and the DRC has tons of unmapped detail. Come join the fun and MAP THE PLANET!

Ethiopia, Sudan, Nicaragua...

Posted by bdiscoe on 29 October 2014 in English (English)

Some recent work I'm proud of:

  1. Fixed the tags (and in some cases the boundaries) of all of Ethiopia's national parks, including Gambella, Bale Mountains, Awash, etc. I even added the Alatish National Park which was entirely missing.

  2. Nearby on the Ethiopia/Sudan border, improved the area where they are building the Grand Ethiopian Renaissance Dam on the Blue Nile.

  3. In Ethiopia's Afar province, added the newly-built Tendaho Irrigation Dam with its huge reservoir.

  4. In Sudan, improved the massive Khashm el-Girba Reservoir and nearby city of Al-Qadarif which needed lots of work.

  5. A large number of waterways in the wild eastern parts of Nicaragua (like here) and Honduras (around here), although sadly most of the streams aren't visible until zoom level 13.

  6. Just now, a complex relation for the Las Trampas Regional Wilderness, near San Ramon, CA, USA

Making JOSM faster with javascript keyboard shortcuts

Posted by bdiscoe on 12 March 2014 in English (English)

User interfaces are very much a matter of taste, so with the caveat that this is all really subjective...

In any graphical program, I find that I am fastest and most fluid when I have my left hand on the keyboard (e.g. on ASDF) and my right on a mouse. It's best if all the key combinations I need are easily pressed with my left hand. If I have to move my left hand away, or take my right off the mouse, everything slows down.

So, with JOSM. The first thing I do is open the Preferences, under Keyboard Shortcuts, and re-map Delete from the Delete key to 'D'. Now, for shortcuts for all the other common tags (highway=service, building=yes...), it's not simple, but it's possible. JOSM lets you map keys to presets, but those presets still open a dialog (extra steps). To program my own shortcuts, I dug into the scripting plugin (Javascript API). It's very nice, well-supported (thank you "Gubaer"!) and I've only begun to explore what it can do.

Here is my script (install_custom_menus.js)

To use it, first enable the Scripting plugin in JOSM's plugin preferences. (You'll need the very latest JOSM, 6891 or later, and up-to-date plugins). Now, from the Scripting menu, open the "console", load the js file, and run it. If it works, you will then see 4 new items on your "Edit" menu.

You can now use Preferences: Keyboard Shortcuts to map keys onto them. I use:

  • T : Clear Tiger
  • Shift+T: Turning Circle / Track
  • Shift+S: Service
  • Shift+B: Building

With only basic familiarity with Javascript, you can easily modify the script to add your own commands, and then map keys to them. You will need to run the script once each time you restart JOSM, to add the menu items, but the shortcuts are persistent, so you only need to set them once.
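Since install_custom_menus.js is not reproduced here, the following is only a hypothetical outline of the tag edits behind the four shortcuts, written as plain functions over a tag object so they can be tried outside JOSM. Reading Shift+T as "turning circle on a node, track on a way" is a guess, and inside JOSM the result would still be applied through the scripting API (for example `command.change`, as in the street-renaming script from the scripting-plugin post).

```javascript
// Hypothetical tag edits behind each shortcut, as pure functions.
const SHORTCUT_EDITS = {
  // T: clear the TIGER "reviewed" flag (only after checking the road!)
  clearTiger: tags => {
    const out = { ...tags };
    delete out["tiger:reviewed"];
    return out;
  },
  // Shift+T: guess: turning circle on a node, track on a way
  turningCircleOrTrack: (tags, isNode) =>
    ({ ...tags, highway: isNode ? "turning_circle" : "track" }),
  // Shift+S: service road
  service: tags => ({ ...tags, highway: "service" }),
  // Shift+B: generic building
  building: tags => ({ ...tags, building: "yes" }),
};
```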

A word about responsibility. These are just shortcuts for things that JOSM already does, but although you can now do them faster, you still need to focus on quality and standard OSM practice. For example, for cleaning Tiger (in the USA): before I press 'T' to clear the Tiger "reviewed" tag, I visually confirm that the geometry of the road is correct, that the name is good, that cul-de-sacs have been set appropriately (Shift+T), and that it's good in every way. Only then should one clear that tag.

Happy editing!

The first 30-day challenge: retrospective

Posted by bdiscoe on 12 March 2014 in English (English)

The first Scout-Telenav 30-day OSM Mapping Challenge just ended. Let me share some of the story.

When it was announced on February 11, I was excited. At that time I was already an "addicted mapper", and fairly sure of my fast, accurate JOSM editing skills, so I figured I could win it. The challenge was for the USA. I usually trace Bing in remote parts of the world, but I did know of a lot of roads in Hawaii that could be quickly cleaned up, so I figured that would give me a quick start.

Week 1

My Hawaii edits did produce a good number of points, but experienced Canadian mapper ingalls was in the lead! He was cleaning Tiger in Texas at an impressive rate. I was slowly catching up, but he remained ahead.

Week 2

Suddenly, when ingalls and I were both at ~30k points, he stopped mapping. I breathed a sigh of relief and took the lead. I found myself doing too many steps in JOSM while editing, and started wondering if I could set up keyboard shortcuts that would let me go faster...

Week 3

Just when I seemed safely in the lead, a user ada_s appeared in the rankings and rapidly went up to second place. All their edits had the same comment, "Add address information + split way when exiting the city border". That seemed like an odd thing to do, but it sure racked up a lot of points. I struggled to find enough time to stay ahead (I do have a full-time job and a girlfriend) and ada_s continued to gain. At this point, my exploration of the JOSM scripting engine produced some results: I was able to create a lot of single-key shortcuts (like Shift+S, set highway=service) that let me go faster (more about those scripts in my next diary entry). I was working faster now, but ada_s was still gaining on me.

Week 4

I pulled a couple of late nights editing, which put me at 57k points, but ada_s was at 50k and picking up speed. After another day where both our scores leapt up, I finally took a look at exactly what ada_s was doing. They were putting "addr" and "is_in" tags... on highways. Like, every single road and driveway in Lincoln, Nebraska was tagged with "addr:city=Lincoln" and "addr:state=NE". This seemed very odd to me (not to mention useless), so I took a look at the wiki page for addr, and sure enough, it doesn't say anything about using it on highways (because, why would you?). I sent ada_s a note asking politely why they were adding those addr tags. I also put in a few changesets removing those same tags from a few cities where ada_s had added them (along with other improvements). I then found a particularly messy Tiger region in South Carolina and dug into it for another late night, my JOSM edits now at great speed. ada_s never responded, but they did suddenly stop editing. (Maybe they just didn't know that those tags were useless and nonstandard? It could have been innocent.) They were up to 72k by then, but, partly due to undoing their odd tags, I was at 108k. I pulled one more late night, then stopped myself. My final score was 145k, ada_s at 72k, followed by good-quality editors like "rickmastfan67" and "jonesydesign" at 40-50k.

Conclusion: Having a contest to make the most "edits" does risk people going for questionable things that touch a lot of ways. Perhaps 55k of ada_s's points were in that category (and hence 55k of my own score went to undoing them, so my real score should be around 90k; still in first place but not crazy). However, I'm certain that the contest did inspire a big increase in overall quality editing. I certainly got a lot faster, learned JOSM better, and spent time improving the USA, where I usually wouldn't bother.

They're doing the contest again ("with simplified rules-and more prizes to win") and that seems like a good thing to me. I won't be entering next time (to give you all a chance :-) and I'll be sharing my JOSM extensions in my next post. My main interest is in getting everyone more productive at editing, for the greater good of OSM.

JOSM scripting plugin: be a power user!

Posted by bdiscoe on 3 March 2014 in English (English)

I've now spent a LOT of time using JOSM, and it is one of the best applications I have ever used, of any kind. With left hand on the keyboard, right on the mouse, you can do quality editing with great speed and accuracy. Advice for newbies: Install the "utilsplugin2" right now, then "buildings_tools" for buildings, and "FastDraw" for streams and ponds.

Eventually, though, you find yourself doing a lot of the same steps over again. One thing JOSM does NOT have is a "macro" ability to record and play back commands. It does, however, have a scripting plugin! (Thank you "Gubaer", author of the plugin!) I have just begun to work with its Javascript API, which has decent docs but very few examples. I will give some examples here in my diary of scripts I've written, in case they are useful!

As a first example, renaming streets. The JOSM validator will warn you about abbreviated English street names ("Main St") but it won't automatically fix them for you. I wrote a script which does that. Just install the scripting plugin, open the scripting console, paste in this script and press "Run".

Note that this is not a shining example of great code, just a rough script. As an exercise for the reader, you could extend it to also handle "Blvd" for "Boulevard".

// Look through all data layers, looking for abbreviated street names, and
// replace them with the full string, e.g. "Rd" -> "Road".

var command = require("josm/command");
var ScriptingConsole = org.openstreetmap.josm.plugins.scripting.ui.console.ScriptingConsole;
var console = ScriptingConsole.instance.scriptLog.logWriter;

for (var i = 0; i < josm.layers.length; i++) {
    var layer = josm.layers.get(i);
    // Only process OSM data layers (JOSM names them "Data Layer 1", "Data Layer 2", ...)
    if (String(layer.name).substring(0, 4) != "Data") continue;
    var dataset = layer.data;
    var result = dataset.query("type:way");
    var renames = 0;
    console.println("number of ways: " + result.length);
    for (var j = 0; j < result.length; j++) {
        var way = result[j];
        var rawName = way.get("name");
        if (rawName == null) continue;
        var name = String(rawName); // Java string -> JavaScript string
        if (name.length < 4) continue;

        var s = name.slice(-3);
        if (s == " Tr" || s == " rd" || s == " Ct" || s == "Ave" || s == "Cir" || s == " Dr" || s == " Rd" || s == " Ln" || s == " Pl" || s == " St" || s == "Hwy" || s == " Wy") {
            // Keep everything before the 3-character suffix; for "Ave", "Cir"
            // and "Hwy" the preceding space is kept in s2.
            var s2 = name.slice(0, name.length - 3);
            if (s == " Tr") s2 += " Trail";
            if (s == " rd") s2 += " Road";
            if (s == " Ct") s2 += " Court";
            if (s == "Ave") s2 += "Avenue";
            if (s == "Cir") s2 += "Circle";
            if (s == " Dr") s2 += " Drive";
            if (s == " Rd") s2 += " Road";
            if (s == " Ln") s2 += " Lane";
            if (s == " Pl") s2 += " Place";
            if (s == " St") s2 += " Street";
            if (s == "Hwy") s2 += "Highway";
            if (s == " Wy") s2 += " Way";

            console.println("  rename [" + name + "] to [" + s2 + "]");
            // create and apply an undoable/redoable command
            layer.apply(command.change(dataset.way(way.id), {tags: {name: s2}}));
            renames++;
        }
    }
    console.println("renames: " + renames);
}
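For testing outside JOSM, the expansion logic can be pulled out into a table-driven function. This version is a sketch with a hypothetical name: it also handles "Blvd" (the exercise suggested above) and only expands a whole final word, so a name like "MainAve" with no space is left alone, unlike the 3-character check in the script.

```javascript
// Abbreviated final word -> full street type.
const SUFFIXES = {
  Tr: "Trail", rd: "Road", Rd: "Road", Ct: "Court", Ave: "Avenue",
  Cir: "Circle", Dr: "Drive", Ln: "Lane", Pl: "Place", St: "Street",
  Hwy: "Highway", Wy: "Way", Blvd: "Boulevard",
};

function expandStreetName(name) {
  // Split into "everything before the last word" and "the last word".
  const match = /^(.*\S)\s+(\S+)$/.exec(name);
  if (!match) return name; // single word: nothing to expand
  const full = SUFFIXES[match[2]];
  return full ? `${match[1]} ${full}` : name;
}
```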

Lost city in Darfur

Posted by bdiscoe on 10 December 2013 in English (English)

I was mapping in rural Darfur today and discovered an entire city which was completely unknown/unmapped. It did not appear in Google, Bing, OSM or anywhere else, not even as a village dot. It's 90 km SSE of Nyala, Sudan (lat/lon: 11.28, 25.14), with an airstrip, two large markets, and a large street grid. I've mapped it now; does anyone care to find a name for the city?

Auto roads, part 3

Posted by bdiscoe on 6 September 2013 in English (English)

In order to keep my road follower in the middle of the road, I tried switching from incremental similarity (compare each point to the next) to absolute similarity (compare each point to the starting point). Since the starting point is given in the middle of the road, it happily follows the road center, until this happens:

With incremental similarity, we were largely immune to disruptions along the side of the road, because we came upon them gradually. Now, a large shadow is sufficiently unlike our starting point that it scares the algorithm into swerving away from the shadow and running off the road. (I can sympathize with the algorithm. I did the same thing in a car once :-)

So, it solves one problem and exposes another.
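The difference between the two modes comes down to which patch each candidate is compared with. Below is a minimal sketch in plain JavaScript, not the actual road-follower code: the image is assumed to be a flat RGB array `{width, height, data}`, and `nextStep`, the 7x7 patch, the 4-pixel step and the fan of nine headings are all hypothetical choices.

```javascript
// Sketch: one road-follower step in which each candidate patch is compared
// either with the patch at the current point ("incremental") or with the
// patch at the starting point ("absolute"). The image is assumed to be
// { width, height, data } with data a flat RGB array; all names and
// parameter defaults are illustrative assumptions.

// Index of the red channel of pixel (x, y), clamped to the image bounds.
function pixelIndex(img, x, y) {
  const cx = Math.min(img.width - 1, Math.max(0, Math.round(x)));
  const cy = Math.min(img.height - 1, Math.max(0, Math.round(y)));
  return (cy * img.width + cx) * 3;
}

// Summed absolute RGB difference between the square patches centered on a and b.
function patchDifference(img, a, b, half) {
  let sum = 0;
  for (let dy = -half; dy <= half; dy++) {
    for (let dx = -half; dx <= half; dx++) {
      const i = pixelIndex(img, a.x + dx, a.y + dy);
      const j = pixelIndex(img, b.x + dx, b.y + dy);
      for (let c = 0; c < 3; c++) sum += Math.abs(img.data[i + c] - img.data[j + c]);
    }
  }
  return sum;
}

// Try a fan of headings around `heading` and return the best candidate.
function nextStep(img, path, heading, mode,
                  { step = 4, half = 3, maxTurn = Math.PI / 4, turns = 9 } = {}) {
  const cur = path[path.length - 1];
  const ref = mode === "absolute" ? path[0] : cur; // which patch to match against
  let best = null;
  for (let k = 0; k < turns; k++) {
    const h = heading - maxTurn + (2 * maxTurn * k) / (turns - 1);
    const cand = { x: cur.x + step * Math.cos(h), y: cur.y + step * Math.sin(h) };
    const diff = patchDifference(img, ref, cand, half);
    if (best === null || diff < best.diff) best = { point: cand, heading: h, diff };
  }
  return best;
}
```

On a road whose color changes gradually along its length, the best incremental difference stays small while the best absolute difference grows with distance from the start; conversely, only the absolute mode keeps pulling the path back toward the starting patch.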

I also tried, at each step, taking a cross-section and looking for symmetry to find where the "middle" of the road is. It didn't work; the RGB is just too noisy to find a clear center of symmetry.
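A minimal sketch of the cross-section symmetry idea, assuming the cross-section has already been sampled perpendicular to the road and reduced to one brightness value per pixel (for example the mean of R, G and B). The function name `symmetryCenter` and the `radius` parameter are hypothetical. For each candidate center it sums the absolute differences between mirrored samples and keeps the lowest score.

```javascript
// Sketch: find the most symmetric point of a 1-D cross-section.
// profile: brightness values sampled across the road (assumed input).
// radius:  how many samples on each side of a candidate center to compare.
function symmetryCenter(profile, radius) {
  let best = -1;
  let bestScore = Infinity;
  for (let c = radius; c < profile.length - radius; c++) {
    let score = 0;
    // compare each sample left of c with its mirror image right of c
    for (let k = 1; k <= radius; k++) {
      score += Math.abs(profile[c - k] - profile[c + k]);
    }
    if (score < bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return { center: best, score: bestScore };
}
```

On a clean synthetic profile the minimum is sharp and lands on the road's middle; on real imagery, noise produces many near-equal scores, so the minimum no longer identifies a clear center.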

Perhaps next I will try a flood-fill, as Richard said. But rather than flood-filling the entire road network, I'd do just a local fill to find the approximate extent of the road. That might work, although there are plenty of examples where it definitely won't, such as where the road is surrounded by similarly colored pixels:

Here is an example of a road which my road follower has no trouble with (following it, that is, but not staying in the middle of it). Flood-filling it is extremely sensitive to the initial point: picking exactly the right point works, but any other point (or any looser tolerance) fails in countless ways, including filling far off the road, getting just one side of it (as in the image above), or filling everything except the middle.
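A minimal sketch of such a local flood fill, assuming a flat RGB image `{width, height, data}`. It grows a 4-connected region from a seed pixel, accepting neighbors whose summed RGB difference from the seed color is within `tolerance`, and never leaves a square window of `maxRadius` pixels around the seed. `localFloodFill` and both parameters are hypothetical.

```javascript
// Sketch: flood fill restricted to a window around the seed pixel.
// Returns the filled pixels as [x, y] pairs.
function localFloodFill(img, sx, sy, tolerance, maxRadius) {
  const seed = (sy * img.width + sx) * 3;           // seed color index
  const visited = new Uint8Array(img.width * img.height);
  const filled = [];
  const stack = [[sx, sy]];
  visited[sy * img.width + sx] = 1;
  const neighbors = [[1, 0], [-1, 0], [0, 1], [0, -1]];
  while (stack.length > 0) {
    const [x, y] = stack.pop();
    filled.push([x, y]);
    for (const [dx, dy] of neighbors) {
      const nx = x + dx, ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) continue;
      // keep the fill local: stay inside the window around the seed
      if (Math.abs(nx - sx) > maxRadius || Math.abs(ny - sy) > maxRadius) continue;
      const n = ny * img.width + nx;
      if (visited[n]) continue;
      let diff = 0;
      for (let c = 0; c < 3; c++) diff += Math.abs(img.data[n * 3 + c] - img.data[seed + c]);
      if (diff > tolerance) continue;                // too unlike the seed color
      visited[n] = 1;
      stack.push([nx, ny]);
    }
  }
  return filled;
}
```

`tolerance` is the touchy knob: too tight and the fill stops at the first blemish on the road, too loose and it spills over everything in the window.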

Auto roads, part 2

Posted by bdiscoe on 4 September 2013 in English (English)

By reducing the step size, I can actually get my naive road-follower to do a better-than-expected job of following curves:

I'm guessing that this is because roads are more self-similar than what surrounds them, so looking for linear self-similarity stays on the road. What it does NOT do, however, is find the middle of the road. Look closely and you'll see that the path drifts over to one edge of the road and stays there, then wanders back again.

This makes sense; road edges have the same linear self-similarity as road centers, so it's just as happy to follow an edge. But for OSM we don't want edges. How to tell it to stay in the "middle"? Currently each new point is compared with the image at the previous point. This makes it largely immune to gradual changes (like the road becoming unpaved, wet or shadowed, newer or older pavement, or even aerials taken at different times), but it allows it to drift to one edge or the other. We could compare to the initial (centered) point, which would solve the drift, but that would fare poorly if the road's coloration changes over its course.

It will also need some criterion for deciding that the road has ended. We can't use an absolute similarity value, since it will vary from place to place. Perhaps if we assume that the initial stretch of road is good, then that calibrates our expected similarity; anything that is, e.g., 50% less similar than that can be considered "probable end of road".
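One way to express that criterion, as a sketch: average the step-to-step differences over the first few steps to get a baseline, then flag the first later step whose difference exceeds the baseline by some factor. Reading "50% less similar" as "difference more than 1.5 times the baseline" is an assumption, as are the name `findProbableEnd` and the default of five calibration steps.

```javascript
// Sketch: detect a probable end of road from per-step patch differences.
// stepDiffs:        difference value recorded at each step (assumed input)
// calibrationSteps: how many initial steps are trusted to be on the road
// factor:           how much worse than the baseline counts as "end of road"
// Returns the index of the first suspicious step, or -1 if none is found.
function findProbableEnd(stepDiffs, calibrationSteps = 5, factor = 1.5) {
  if (stepDiffs.length <= calibrationSteps) return -1;
  let sum = 0;
  for (let i = 0; i < calibrationSteps; i++) sum += stepDiffs[i];
  const baseline = sum / calibrationSteps;  // expected difference on this road
  for (let i = calibrationSteps; i < stepDiffs.length; i++) {
    if (stepDiffs[i] > factor * baseline) return i;
  }
  return -1;
}
```

If the calibration stretch is perfectly uniform, the baseline is 0 and any nonzero difference is flagged, so a real implementation would probably want a small floor on the baseline.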

First attempt at automatic road following

Posted by bdiscoe on 3 September 2013 in English (English)

My naive thought was, many roads are clear and self-similar, how hard could it be to write an algorithm which simply walks along a step at a time, moving in the direction which is most similar to the previous spot in the image?

It turns out the catch is in "similar". There are apparently countless academic papers on how to evaluate when two images are "similar". I naively went ahead and tried a dumb algorithm: the summed difference of the RGB values.

Amazingly, it actually works in a lot of cases. Behold:


The first two points are given; the rest, moving downward, follow the road based on naive image similarity. Now, it's not hard to find cases where it fails and drifts off the road (in particular, it struggles if the road gets a few pixels wider, as many do), but this is just a first test.
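A minimal sketch of that naive approach in plain JavaScript, assuming the aerial image is a flat RGB array `{width, height, data}`. From the two given points it walks a fixed step at a time, tries a fan of headings around the current one, and keeps the candidate whose square patch has the smallest summed absolute RGB difference from the patch at the previous point. The function names, patch size, step size and turn limits are illustrative assumptions, not the original implementation.

```javascript
// Index of the red channel of pixel (x, y), clamped to the image bounds.
function pixelIndex(img, x, y) {
  const cx = Math.min(img.width - 1, Math.max(0, Math.round(x)));
  const cy = Math.min(img.height - 1, Math.max(0, Math.round(y)));
  return (cy * img.width + cx) * 3;
}

// Summed absolute RGB difference between the square patches centered on a and b.
// Lower means "more similar".
function patchDifference(img, a, b, half) {
  let sum = 0;
  for (let dy = -half; dy <= half; dy++) {
    for (let dx = -half; dx <= half; dx++) {
      const i = pixelIndex(img, a.x + dx, a.y + dy);
      const j = pixelIndex(img, b.x + dx, b.y + dy);
      for (let c = 0; c < 3; c++) sum += Math.abs(img.data[i + c] - img.data[j + c]);
    }
  }
  return sum;
}

// p0 and p1 are the two given starting points; they fix the initial heading.
function followRoad(img, p0, p1, steps,
                    { step = 4, half = 3, maxTurn = Math.PI / 4, turns = 9 } = {}) {
  const path = [p0, p1];
  let heading = Math.atan2(p1.y - p0.y, p1.x - p0.x);
  for (let s = 0; s < steps; s++) {
    const prev = path[path.length - 1];
    let best = null;
    // evaluate `turns` headings spread evenly over [heading - maxTurn, heading + maxTurn]
    for (let k = 0; k < turns; k++) {
      const h = heading - maxTurn + (2 * maxTurn * k) / (turns - 1);
      const cand = { x: prev.x + step * Math.cos(h), y: prev.y + step * Math.sin(h) };
      const d = patchDifference(img, prev, cand, half);
      if (best === null || d < best.d) best = { d, h, cand };
    }
    heading = best.h;
    path.push(best.cand);
  }
  return path;
}
```

Ties between equally good candidates go to the first heading tried, so even on a perfectly uniform road the path can settle slightly off-center; nothing in this similarity measure prefers the middle of the road over any other equally uniform spot.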

Automated road tracing - "Microsoft Road Detect" didn't work for me

Posted by bdiscoe on 25 August 2013 in English (English)

After so many hours manually tracing roads, one naturally begins to wonder if there's some software for automatically detecting them. Google turns up only a research project, the "Microsoft Road Detect".

There's some discussion among OSM people about whether this would be a good thing or not. I think the point's moot because it doesn't work.

The first thing I tried was the JOSM experimental plugin "MagicShop"; it hadn't been touched in 2 years, which is a bad sign. Current JOSM refused to accept the jar, which was not a huge surprise.

I'd consider it worthwhile to fix the plugin if it gave useful results, so I went directly to the Road Detect website and gave it some test coordinates: a nice, clear, straight section of road in India I happened to be tracing recently.

And this is what it did:

Yeah. Well, maybe I could write my own algorithm/plugin.
