OpenStreetMap

bdiscoe's diary

Recent diary entries

Top OSM Rank: The Big Imports

Posted by bdiscoe on 29 May 2015 in English (English)

Here are some of the things I learned while studying the OSM accounts with high HDYC rank, as described in my last entry:

  • TIGER! 'DaveHansenTiger' originally imported TIGER, but 'woodpeck-fixbot' (noderank #1) subsequently touched nearly every node. Because TIGER is such a mess, it may be possible to estimate how quickly it is getting cleaned up based on the last-modified count of woodpeck-fixbot. Currently it's 136 M, going down at around 12 K/day, so at this rate it will take 32 years to clean up all the TIGER in the USA.

  • TIGER ways: between 'DaveHansenTiger' and 'bot-mode', there are around 8 M imported TIGER ways that haven't been touched since import. At the current rate of 1800/day, it's going to take 12 years to clean it all.

  • NHD! (USA national hydrographic dataset). A lot of NHD was imported without any decimation at all, resulting in >90% of the nodes being redundant, effectively noise. There are at least 6 accounts involved in NHD import, including 'jumbanho' (noderank #2) and 'nmixter' (noderank #5). I've tried cleaning up this NHD mess manually, but it takes several hours to do 100 K nodes in JOSM. At that rate, it would take me 8 months of editing every night to clean up all 46 M nodes.

  • Canada! The CanvecIimports account (noderank #3) is at 45 Mnodes and still rising, and there are several more accounts that appear to import Canvec, such as azub (noderank #11) and bgamberg (noderank #13). Some areas are neatly decimated and tidy, some are not.

  • Netherlands: There are two huge imports, 3dShapes (noderank #4) and BAG, which is spread across 16 accounts that all nicely have BAG in their name (Sander H_BAG, Commodoortje_BAG, etc.). All 16 are in the top 200 of noderank.

  • Massachusetts: The state GIS was a massive import, by account jremillard-massgis (noderank #10) and a few others. Amazingly, the road data is actually of high quality and needs very little cleanup; the wetland hydrography is a bit messier.

  • Some highly ranked accounts appear to be national imports (?) that I found harder to learn about, such as Tom_G3X (noderank #7, 19 Mnodes in Japan) and Petr1868 (noderank #9, who has apparently added 23 Mnodes to the Czech Republic using "Tracer Using RUIAN and LPIS").

  • France has many accounts importing from its national cadastre database, but it is very hard to tell which. One might guess that ËdzëronK (noderank #12) and the 15 other massive contributors to France in the top 100 are importing cadastre, but perhaps some of them are actually just amazing, really active mappers.
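The cleanup estimates in the two TIGER bullets above are simple divisions. Here is a minimal sketch of that arithmetic, using only the figures quoted in this post; the function name is just for illustration, and the constant daily rate is the same simplification the estimates themselves make.

```javascript
// Years needed to work through a backlog at a constant daily rate.
function yearsToClear(remaining, perDay) {
  return remaining / perDay / 365;
}

// Figures quoted above: untouched TIGER nodes and ways, with their daily cleanup rates.
const nodeYears = yearsToClear(136e6, 12e3);
const wayYears = yearsToClear(8e6, 1800);

console.log("nodes: " + nodeYears.toFixed(1) + " years");
console.log("ways: " + wayYears.toFixed(1) + " years");
```

With these rounded inputs the results land close to the figures quoted above: a bit over three decades for the nodes and about 12 years for the ways.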

In my next post I'll talk about some non-import, real cool mappers I discovered.

Top OSM Rank: Who are these crazy, amazing people?

Posted by bdiscoe on 3 May 2015 in English (English)

It's now been around 2 years since I started editing OSM seriously. I've used Pascal's HDYC and YOSMHM to track my progress, with the goal of making a real contribution to OSM worldwide. One thing I always wondered about as my OSM node rank went up: it would reach, for example, 300, and I would think, wow, I have been editing so much... who are these 299 people around the world who actually edit even more??

Recently, I set out to answer this question. I started looking at HDYC for well-known accounts, as well as their heatmaps, and gathering the results in a spreadsheet. When that got tedious, I wrote a C++ app on Osmium and ran it on the Planet.osm file, to find out the complete list of top-ranked accounts.
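The actual tool was a C++ program on Osmium, which is not shown here. As a rough illustration of the tallying step only, the sketch below counts nodes per user from a small in-memory list and ranks the accounts. The record shape (`{id, user}`), the function name, and the sample user names are assumptions made up for this example; they are not Osmium's API, and HDYC may define its ranks differently.

```javascript
// Hypothetical input: one record per node, carrying the user name stored on that node.
function rankUsersByNodeCount(nodes) {
  const counts = new Map();
  for (const node of nodes) {
    counts.set(node.user, (counts.get(node.user) || 0) + 1);
  }
  // Sort descending by count; rank 1 is the account with the most nodes.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([user, count], index) => ({ rank: index + 1, user, count }));
}

const sample = [
  { id: 1, user: "alice" },
  { id: 2, user: "importbot" },
  { id: 3, user: "importbot" },
  { id: 4, user: "importbot" },
  { id: 5, user: "alice" },
  { id: 6, user: "bob" },
];
console.log(rankUsersByNodeCount(sample));
```

A real planet file holds far too many nodes to load into an array like this, so the counting would have to happen while streaming through the file.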

And the answer is... most of them are not actually people; a few are bots, and many are "import accounts", or user accounts that have been used for a large import at some point. (...but not all of them! Some are actual, live humans manually editing OSM longer and more extensively than me). Along the way, I learned some OSM history, and the diverse patterns in OSM in different countries.

Here is a link to the spreadsheet, sortable by rank, with my own notes on the where/what of around 400 accounts, including the top 100 in node and way ranks. The data is approximate... it's not auto-refreshed by a script (yet), so some ranks may be a little out of date.

In my next diary entry I'll share some of the stories and realizations I've had while gathering this data.

The story of the oldest node in OSM.

Posted by bdiscoe on 26 April 2015 in English (English)

I've been using Osmium, and today parsed the entire planet.osm.pbf for the first time. I noticed that the nodes are in order by ID, and the very first node, the oldest node still in existence, is node 10. Let's look at it!

http://www.openstreetmap.org/node/10/history

This tough little node has had quite a history! Presuming that the database is accurate, this is what it tells us today:

  • v1, April 18, 2005, user sxpert creates this node in changeset 4. That's right, the fourth changeset ever. We have no record of its geographic location.
  • v2 was redacted.
  • v3, April 2009, super-user woodpeck (Frederik Ramm) places this node in London, near Regent's Park.
  • v4, September 2009, dtr20 deleted the node, as part of "Survey east of Regent's Park"
  • v5, April 2011, max60watt somehow re-uses the node, placing it near the bus stop in a quiet little village near the town of Kassel, in the German state of Hesse.
  • ... and that's where the node has stayed, through 3 small edits.

The name of the village is Furstenwald. As an English speaker, saying this name out loud causes me to giggle. Of all the nodes still alive today, the first in the world is in... Furstenwald.

(Actually) fixing the Peoria GIS import

Posted by bdiscoe on 11 April 2015 in English (English)

It turns out that Peoria is not just a metaphor, but a real place in Illinois. It is also the location of a rather messy GIS import of County data! Here's the history as far as I can determine:

  • The Peoria County Government gathered data resulting in a dataset as of 1997.
  • In 2010, that dataset was deemed old enough to be considered "obsolete", which apparently justified uploading it to OSM.
  • A wiki page Peoriagisuploa describes most of the details of what happened in June 2010. Basically, it's woods and buildings.
  • Woods came in with natural=wood (but too many nodes)
  • Buildings came in with building=yes and BUILDING_T=(0..9) for a building type, as documented on the wiki page.
  • In July 2010, user account "xybot" applied some changes called "Correction of faulty peoria bulk upload" which did a very strange thing to the building tags. It changed "BUILDING_T" to "tiger:buildingType" (!) There is no such tag in TIGER (which has no buildings, let alone building types).

I studied this mess and figured out what should have occurred: mapping Peoria's BUILDING_T onto the actual, standard OSM building types:

  • BUILDING_T=1 -> building=residential
  • BUILDING_T=2 -> building=commercial (very few of these are industrial)
  • BUILDING_T=3 -> building=school
  • BUILDING_T=4 -> building=garage
  • BUILDING_T=5 -> building=static_caravan
  • BUILDING_T=6 -> building=industrial (there are almost none of these)
  • BUILDING_T=7 -> building=yes (it was under construction in 1997, it isn't now)
  • BUILDING_T=8 -> (make these the inner ways of multipolygon relations)
  • BUILDING_T=9 -> man_made=pier
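Purely as an illustration of the mapping above (not a script that was run against OSM), the list can be written as a lookup table. This is a minimal sketch on plain tag objects; the function name is made up, and dropping the old type key after remapping is my assumption for the example.

```javascript
// Lookup table taken from the list above. Type 8 is left out on purpose:
// those ways become inner members of multipolygon relations, which needs manual work.
const BUILDING_TYPE_TAGS = {
  1: { building: "residential" },
  2: { building: "commercial" },
  3: { building: "school" },
  4: { building: "garage" },
  5: { building: "static_caravan" },
  6: { building: "industrial" },
  7: { building: "yes" },
  9: { man_made: "pier" },
};

// Returns a corrected copy of the tags, or null when the way needs manual handling.
function remapPeoriaTags(tags) {
  const type = tags["tiger:buildingType"] || tags["BUILDING_T"];
  const mapped = BUILDING_TYPE_TAGS[type];
  if (!mapped) return null;
  const result = Object.assign({}, tags);
  delete result["tiger:buildingType"]; // assumption: the source key is dropped
  delete result["BUILDING_T"];
  delete result["building"]; // removed first so that type 9 does not stay a building
  return Object.assign(result, mapped);
}

console.log(remapPeoriaTags({ building: "yes", "tiger:buildingType": "4" }));
console.log(remapPeoriaTags({ building: "yes", "tiger:buildingType": "9" }));
console.log(remapPeoriaTags({ building: "yes", "tiger:buildingType": "8" }));
```

Returning null for type 8 (and for anything unmapped) keeps those ways out of any automatic handling, since they need a human to build the relation.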

I have been laboriously applying these fixes recently, and will finish soon. I'm doing it manually in JOSM, checking carefully, not only because that's the quality thing to do, but also to head off any claims of "mechanical editing". I'm also cleaning up the woods, which is not simply a matter of decimation but also a lot of manual updating because the woods are not where they were in 1997.

Go home coastline data, you are drunk

Posted by bdiscoe on 14 February 2015 in English (English)

I've been having a great time recently using Osmium to write my own analysis code in C++ to look for anomalies in the PBF extracts. Today it found this very strange coastline in South Africa:

Wavy

Perhaps, I thought, this is some rare geological formation that makes an amazing wavy line? So let's look at the data over aerial:

Wavy!

Uh.... what? I've seen a lot of weird and bad map data, like the mechanical grit of PGS and all the horrors of TIGER, but this was new. It's as if some cartographer said... "yeah, it's a coastline! Uh, what kind? Uh.... a wavy coastline? Yeah, wavy! Lots of waves.... I LOOOVE to draw WAVES, wheeeee!"

I should mention that this appears to go on for hundreds of kilometers.

The importer of this way is an "Adrian Frith" but it's most certainly not his fault. The source tag says "Municipal Demarcation Board", so it was probably made by some government department, or maybe a contractor that was getting paid by the node?

I'm sorry to say I'll be quickly tidying up this coast, so perhaps by the time you read this, you won't be able to see the waves at, for example, here. On the other hand, coastline changes are special and take a while to process, so the blue ocean wobbles will probably stay for quite a while.

Come work on #MissingMaps with me!

Posted by bdiscoe on 10 December 2014 in English (English)

The recent #MissingMaps project added to the Tasking Manager is a great way to work together on specific places!

However, some of the maps are sadly neglected. The "high priority" HOT places (like for Ebola and cyclones) get a lot of contributors. But other #MissingMaps projects have seen little work.

For example, #793 - Missing Maps: Bukavu, Democratic Republic of Congo was added 5 days ago and nobody contributed at all. I have begun, but it's kinda lonely. Come join me! The imagery is good, the infrastructure is easy to see, and the DRC has tons of unmapped detail. Come join the fun and MAP THE PLANET!

Ethiopia, Sudan, Nicaragua...

Posted by bdiscoe on 29 October 2014 in English (English)

Some recent work I'm proud of:

  1. Fixed the tags (and in some cases the boundaries) of all of Ethiopia's national parks, including Gambella, Bale Mountains, Awash, etc. I even added the Alatish National Park which was entirely missing.

  2. Nearby on the Ethiopia/Sudan border, improved the area where they are building the Grand Ethiopian Renaissance Dam on the Blue Nile.

  3. In Ethiopia's Afar province, added the newly-built Tendaho Irrigation Dam with its huge reservoir.

  4. In Sudan, improved the massive Khashm el-Girba Reservoir and nearby city of Al-Qadarif which needed lots of work.

  5. A large number of waterways in the wild eastern parts of Nicaragua (like here) and Honduras (around here), although sadly most of the streams aren't visible until zoom level 13.

  6. Just now, a complex relation for the Las Trampas Regional Wilderness, near San Ramon, CA, USA

Making JOSM faster with javascript keyboard shortcuts

Posted by bdiscoe on 12 March 2014 in English (English)

User interfaces are very much a matter of taste, so with the caveat that this is all really subjective...

In any graphical program, I find that I am fastest and most fluid when I have my left hand on the keyboard (e.g. on ASDF) and my right on a mouse. It's best if all the key combinations I need are easily pressed with my left hand. If I have to move my left hand away, or take my right off the mouse, everything slows down.

So, with JOSM. The first thing I do is open the Preferences, under Keyboard Shortcuts, and re-map Delete from the Delete key to 'D'. Now, for shortcuts for all the other common tags (highway=service, building=yes...), it's not simple, but it's possible. JOSM lets you map keys to presets, but those presets still open a dialog (extra steps). To program my own shortcuts, I dug into the scripting plugin (Javascript API). It's very nice, well-supported (thank you "Gubaer"!) and I've only begun to explore what it can do.

Here is my script (install_custom_menus.js)

To use it, first enable the Scripting plugin in JOSM's plugin preferences. (You'll need the very latest JOSM, 6891 or later, and up-to-date plugins). Now, from the Scripting menu, open the "console", load the js file, and run it. If it works, you will then see 4 new items on your "Edit" menu.

You can now use Preferences: Keyboard Shortcuts to map keys onto them. I use:

  • T : Clear Tiger
  • Shift+T: Turning Circle / Track
  • Shift+S: Service
  • Shift+B: Building

With only basic familiarity with Javascript, you can easily modify the script to add your own commands, and then map keys to them. You will need to run the script once, each time you restart JOSM, to add the menu items, but the shortcuts are persistent so you only need to set them once.

A word about responsibility. These are just shortcuts for things that JOSM already does, but although you can now do them faster, you still need to focus on quality and standard OSM practice. For example, for cleaning Tiger (in the USA): before I press 'T' to clear the Tiger "reviewed" tag, I visually confirm that the geometry of the road is correct, that the name is good, that cul-de-sacs have been set appropriately (Shift+T), and that it's good in every way. Only then should one clear that tag.

Happy editing!

The first 30-day challenge: retrospective

Posted by bdiscoe on 12 March 2014 in English (English)

The first Scout-Telenav 30-day OSM Mapping Challenge just ended. Let me share some of the story.

When it was announced on February 11, I was excited. At that time I was already an "addicted mapper", and fairly sure of my fast-accurate JOSM editing skills, so I figured I could win it. The challenge was for the USA. I usually trace Bing in remote parts of the world, but I did know of a lot of roads in Hawaii that could be quickly cleaned up, so I figured that would give me a quick start.

Week 1

My Hawaii edits did produce a good number of points, but experienced Canadian mapper ingalls was in the lead! He was cleaning Tiger in Texas at an impressive rate. I was slowly catching up, but he remained ahead.

Week 2

Suddenly, when ingalls and I were both at ~30k points, he stopped mapping. I breathed a sigh of relief and took the lead. I found myself doing too many steps in JOSM while editing, and started wondering if I could set up keyboard shortcuts that would let me go faster...

Week 3

Just when I seemed safely in the lead, a user ada_s appeared in the rankings and rapidly went up to second place. All their edits had the same comment, "Add address information + split way when exiting the city border". That seemed like an odd thing to do, but it sure racked up a lot of points. I struggled to find enough time to stay ahead (I do have a full-time job and girlfriend) and ada_s continued to gain. At this point, my exploration of the JOSM scripting engine produced some results - I was able to create a lot of single-key shortcuts (like Shift+S, set highway=service) that let me go faster (more about those scripts in my next diary entry). I was working faster now, but ada_s was still gaining on me.

Week 4

I pulled a couple of late nights editing, which put me at 57k points, but ada_s was at 50k and picking up speed. After another day where our scores both leapt up, I finally took a look at exactly what ada_s was doing. They were putting "addr" and "is_in" tags ... on highways. Like, every single road and driveway in Lincoln, Nebraska was tagged with "addr:city=Lincoln" and "addr:state=NE". This seemed very odd to me (not to mention useless), so I took a look at the page for addr and sure enough, it doesn't say anything about using it on highways (because, why would you?). I sent ada_s a note asking politely why they were adding those addr tags. I also put in a few changesets removing those same tags from a few cities where ada_s had added them (along with other improvements). I then found a particularly messy Tiger region in South Carolina, and dug into it for another late night, my JOSM edits now at great speed. ada_s never responded but they did, suddenly, stop editing. (Maybe they just didn't know that those tags were useless and nonstandard? It could have been innocent.) They were up to 72k by then, but partly due to undoing their odd tags, I was at 108k. I pulled one more late night then stopped myself. My final score was 145k, ada_s at 72k, followed by good, quality editors like "rickmastfan67" and "jonesydesign" at 40-50k.

Conclusion: Having a contest to make the most "edits" does risk people going for questionable things that touch a lot of ways. Perhaps 55k of ada_s's points were in that category (and hence 55k of my own score came from undoing them, so my real score should be around 90k; still in first place but not crazy). However, I'm certain that the contest did inspire a big increase in overall quality editing. I certainly got a lot faster, learned JOSM better (and spent time improving the USA, where I usually wouldn't bother).

They're doing the contest again ("with simplified rules-and more prizes to win") and that seems like a good thing to me. I won't be entering next time (to give you all a chance :-) and I'll be sharing my JOSM extensions in my next post. My main interest is in getting everyone more productive at editing, for the greater good of OSM.

JOSM scripting plugin: be a power user!

Posted by bdiscoe on 3 March 2014 in English (English)

I've now spent a LOT of time using JOSM, and it is one of the best applications I have ever used, of any kind. With left hand on the keyboard, right on the mouse, you can do quality editing with great speed and accuracy. Advice for newbies: Install the "utilsplugin2" right now, then "buildings_tools" for buildings, and "FastDraw" for streams and ponds.

Eventually, though, you find yourself doing a lot of the same steps over again. One thing JOSM does NOT have is a "macro" ability to record and play back commands. It does, however, have a scripting plugin! (Thank you "Gubaer", author of the plugin!) I have just begun to work with its Javascript API, which has decent docs but very few examples. I will give some examples here in my diary of scripts I've written, in case they are useful!

As a first example, renaming streets. The JOSM validator will warn you about abbreviated English street names ("Main St") but it won't automatically fix them for you. I wrote a script which does that. Just install the scripting plugin, open the scripting console, paste in this script and press "Run".

Note that this is not a shining example of great code, just a rough script. As an exercise for the reader, you could extend it to also handle "Blvd" for "Boulevard".

//
// Look through all data layers, looking for abbreviated street names and
// replace them with the full string, e.g. "Rd" -> "Road".
//

var util = require("josm/util");
var command = require("josm/command");
var ScriptingConsole = org.openstreetmap.josm.plugins.scripting.ui.console.ScriptingConsole;
var console = ScriptingConsole.instance.scriptLog.logWriter;

for (var i = 0; i < josm.layers.length; i++) {
    var layer = josm.layers.get(i);
    if (layer.name.substring(0, 4) != "Data")
      continue;
    var dataset = layer.data;
    var result = dataset.query("type:way");
    var renames = 0;
    console.println("number of ways: " + result.length);
    for (var j = 0; j < result.length; j++) {
        var way = result[j];
        var name = way.get("name");
        if (name == null) continue;
        if (name.length() < 4) continue;

        var s = name.slice(-3);
        if (s == " Tr" || s == " rd" || s == " Ct" || s == "Ave" || s == "Cir" || s == " Dr" || s == " Rd" || s == " Ln" || s == " Pl" || s == " St" || s == "Hwy" || s == " Wy") {
          var s2 = name.slice(0, name.length() - 3);
          if (s == " Tr") s2 += " Trail";
          if (s == " rd") s2 += " Road";
          if (s == " Ct") s2 += " Court";
          if (s == "Ave") s2 += "Avenue";
          if (s == "Cir") s2 += "Circle";
          if (s == " Dr") s2 += " Drive";
          if (s == " Rd") s2 += " Road";
          if (s == " Ln") s2 += " Lane";
          if (s == " Pl") s2 += " Place";
          if (s == " St") s2 += " Street";
          if (s == "Hwy") s2 += "Highway";
          if (s == " Wy") s2 += " Way";

          console.println("  rename [" + name + "] to [" + s2 + "]");
          // create and apply an undoable/redoable command
          layer.apply( command.change(dataset.way(way.id), {tags: {name: s2}}) );
          renames++;
          way.setModified(true);
        }
    }
    console.println("renames:" + renames);
}
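
The suffix handling in the script above can also be written table-driven, which turns the "Blvd" exercise into a one-entry addition. This is a minimal sketch on plain JavaScript strings, separate from the JOSM API; the function and table names are made up for the example. Unlike the script above, it only expands a whole final word, so "Ave", "Cir" and "Hwy" must be preceded by a space.

```javascript
// Abbreviation table; "Blvd" is the extension suggested as an exercise.
const SUFFIXES = {
  Tr: "Trail", rd: "Road", Rd: "Road", Ct: "Court", Ave: "Avenue",
  Cir: "Circle", Dr: "Drive", Ln: "Lane", Pl: "Place", St: "Street",
  Hwy: "Highway", Wy: "Way", Blvd: "Boulevard",
};

// Expand an abbreviated final word; return the name unchanged if none matches.
function expandStreetName(name) {
  const cut = name.lastIndexOf(" ");
  if (cut < 0) return name; // single word, nothing to expand
  const last = name.slice(cut + 1);
  if (!Object.prototype.hasOwnProperty.call(SUFFIXES, last)) return name;
  return name.slice(0, cut + 1) + SUFFIXES[last];
}

console.log(expandStreetName("Main St"));
console.log(expandStreetName("Sunset Blvd"));
console.log(expandStreetName("Broadway"));
```

Inside the scripting console the value returned by way.get("name") may be a Java string rather than a JavaScript one, so it might need converting (for example with String(name)) before being passed to a function like this.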

Lost city in Darfur

Posted by bdiscoe on 10 December 2013 in English (English)

I was mapping in rural Darfur today and discovered an entire city which was completely unknown/unmapped. It did not appear in Google, Bing, OSM or anywhere else, not even as a village dot. It's 90 km SSE of Nyala, Sudan (latlon: 11.28, 25.14, i.e. http://www.openstreetmap.org/#map=14/11.28/25.14) with an airstrip, two large markets, and large street grid. I've mapped it now, anyone care to find a name for the city?

Auto roads, part 3

Posted by bdiscoe on 6 September 2013 in English (English)

In order to keep my road follower in the middle of the road, I tried switching from an incremental similarity (compare each point to the next) to absolute (compare each point to the starting point). Since the starting point is given in the middle of road, it happily follows the road center, until this happens:

jump

With incremental similarity, we were largely immune to disruptions along the side of the road, because we came upon them gradually. Now, a large shadow is sufficiently unlike our starting point that it scares the algorithm into swerving away from the shadow and running off the road. (I can sympathize with the algorithm. I did the same thing in a car once :-)

So, it just solves one problem, and exposes another.

I also tried the idea of taking a cross-section at each step and looking for symmetry to find where the "middle" of the road is. It didn't work; the RGB is just too noisy to find a clear center of symmetry.

Perhaps next I will try, like Richard said, a flood-fill. But rather than try to flood-fill the entire road network, just a local fill to find an approximate road extent. That might work, although there are plenty of examples where it definitely won't, like where the road is surrounded by similarly colored pixels:

jump

Here is an example of a road which my road follower has no trouble with (following, but not staying in the middle of). Attempting to flood-fill it is extremely sensitive to the initial point; picking just the right point is OK but any other (or any looser tolerance) will fail in countless ways, including filling way off the road, or just getting one side of it (as in the image above), or filling everything except the middle.
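
To make the local fill idea concrete, here is a minimal sketch of a bounded flood fill, assuming an image stored as a 2D array of [r, g, b] pixels. The function name, the summed-RGB tolerance test, and the square radius limit are all assumptions for illustration; this is not code from my road follower.

```javascript
// image[y][x] = [r, g, b]. Fill outward from the seed, accepting pixels whose
// colour is within `tolerance` of the seed colour and within `radius` of the seed.
function localFloodFill(image, seedX, seedY, tolerance, radius) {
  const seed = image[seedY][seedX];
  const filled = [];
  const seen = new Set();
  const stack = [[seedX, seedY]];
  while (stack.length > 0) {
    const [x, y] = stack.pop();
    const key = x + "," + y;
    if (seen.has(key)) continue;
    seen.add(key);
    if (y < 0 || y >= image.length || x < 0 || x >= image[y].length) continue;
    if (Math.abs(x - seedX) > radius || Math.abs(y - seedY) > radius) continue;
    const p = image[y][x];
    const diff = Math.abs(p[0] - seed[0]) + Math.abs(p[1] - seed[1]) + Math.abs(p[2] - seed[2]);
    if (diff > tolerance) continue; // too unlike the seed: treat as off-road
    filled.push([x, y]);
    stack.push([x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]);
  }
  return filled;
}
```

The sensitivity described above shows up directly in the two parameters: a slightly looser tolerance lets the fill leak into the surroundings, and a seed on a shadow or road edge changes the reference colour for everything that follows.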

Auto roads, part 2

Posted by bdiscoe on 4 September 2013 in English (English)

By reducing the step size, I can actually get my naive road-follower to do a better-than-expected job of following curves:

snap

I'm guessing that this is because roads are more self-similar than what surrounds them, so looking for linear self-similarity stays on the road. What it does NOT do, however, is find the middle of the road. Look closely and you'll see that the path drifts over to one edge of the road and stays there, then wanders back again.

This makes sense; road edges have the same linear self-similarity as road centers, so it's just as happy to follow an edge. But, for OSM we don't want edges. How to tell it to stay in the "middle"? Currently each next point compares its image to the one at the previous point. This makes it largely immune to gradual changes (like the road becoming unpaved, or wet, or shadowed, or newer/older pavement, or even aerials taken at different times) but it allows it to drift to one edge or the other. We could compare to the initial (centered) point, which would solve the drift, but that would fare poorly if the road's coloration changes over its course.

It will also need some criteria for deciding the road has ended. We can't use an absolute similarity value, since it will vary from place to place. Perhaps if we assume that the initial stretch of road is good, then that calibrates our expected similarity; anything that is e.g. 50% less similar than that can be considered "probable end of road".
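
The calibration idea above could be sketched like this, assuming each step yields a similarity score where higher means more alike. The function name and parameters are hypothetical, and averaging the first few steps is just one way to define the baseline.

```javascript
// similarities: one similarity score per step, higher meaning more alike.
// Returns the index of the first step judged "probable end of road", or -1.
function findProbableEnd(similarities, calibrationSteps, dropFraction) {
  if (calibrationSteps < 1 || similarities.length <= calibrationSteps) return -1;
  let sum = 0;
  for (let i = 0; i < calibrationSteps; i++) sum += similarities[i];
  const baseline = sum / calibrationSteps; // what "good road" looks like here
  const threshold = baseline * (1 - dropFraction);
  for (let i = calibrationSteps; i < similarities.length; i++) {
    if (similarities[i] < threshold) return i;
  }
  return -1;
}

console.log(findProbableEnd([0.9, 0.92, 0.88, 0.85, 0.8, 0.3, 0.2], 3, 0.5));
```

With a dropFraction of 0.5 this matches the "50% less similar" example: the gradual decline to 0.8 is tolerated, while the sudden drop to 0.3 is flagged.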

First attempt at automatic road following

Posted by bdiscoe on 3 September 2013 in English (English)

My naive thought was, many roads are clear and self-similar, how hard could it be to write an algorithm which simply walks along a step at a time, moving in the direction which is most similar to the previous spot in the image?

It turns out the catch is in "similar". There are apparently countless academic papers on how to evaluate when two images are "similar". I naively went ahead and tried a dumb algorithm: the summed difference of the RGB values.
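
For concreteness, a summed RGB difference between two image patches might look like the sketch below. I'm assuming absolute per-channel differences and patches given as flat arrays of [r, g, b] pixels; the function name is made up for the example.

```javascript
// Sum of absolute per-channel differences between two equally sized patches.
// Each patch is a flat array of [r, g, b] pixels; 0 means identical.
function patchDifference(a, b) {
  if (a.length !== b.length) throw new Error("patches must have the same size");
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    for (let c = 0; c < 3; c++) {
      total += Math.abs(a[i][c] - b[i][c]);
    }
  }
  return total;
}
```

A lower value means "more similar", so a follower built on this would pick the candidate with the smallest difference.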

Amazingly, it actually works in a lot of cases. Behold:

following

The first two points are given; the rest, moving downward, follow the road based on naive image similarity. Now, it's not hard to find cases where it fails and drifts off the road - in particular it struggles if the road gets a few pixels wider, as many do - but this is just a first test.
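
One step of such a follower could be sketched as follows. This is a simplified illustration, not my actual code: it compares single pixels rather than image patches, and the candidate angles, the preference for going straight on ties, and all names are assumptions.

```javascript
// image: 2D array image[y][x] = [r, g, b]. All names here are illustrative.
function pixelDifference(p, q) {
  return Math.abs(p[0] - q[0]) + Math.abs(p[1] - q[1]) + Math.abs(p[2] - q[2]);
}

function pixelAt(image, x, y) {
  const row = image[Math.round(y)];
  return row ? row[Math.round(x)] : undefined;
}

// Take one step: try headings near the current one and keep the candidate
// whose pixel is most similar to the pixel at the current position.
// Turns are tried straight-ahead first, so ties favour going straight.
function nextPoint(image, point, heading, stepSize) {
  const here = pixelAt(image, point.x, point.y);
  const turns = [0, -15, 15, -30, 30, -45, 45]; // degrees
  let best = null;
  for (const turn of turns) {
    const angle = heading + (turn * Math.PI) / 180;
    const x = point.x + stepSize * Math.cos(angle);
    const y = point.y + stepSize * Math.sin(angle);
    const there = pixelAt(image, x, y);
    if (!there) continue; // candidate falls outside the image
    const diff = pixelDifference(here, there);
    if (best === null || diff < best.diff) best = { x, y, heading: angle, diff };
  }
  return best; // null when every candidate left the image
}
```

Calling nextPoint repeatedly, feeding each result back in as the new point and heading, gives the incremental behaviour: every step is compared only to the one before it.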

Automated road tracing - "Microsoft Road Detect" didn't work for me

Posted by bdiscoe on 25 August 2013 in English (English)

After so many hours manually tracing roads, one naturally begins to wonder if there's some software for automatically detecting them. Google turns up only a research project, the "Microsoft Road Detect" at http://magicshop.cloudapp.net/

There's some discussion among OSM people about whether this would be a good thing or not. I think the point's moot because it doesn't work.

First thing I tried was the JOSM experimental plugin "MagicShop"; it hadn't been touched in 2 years which is a bad sign. Current JOSM refused to accept the jar, not a huge surprise.

I'd consider it worthwhile to fix the plugin if it would give useful results, so I went directly to magicshop.cloudapp.net and gave it some test coordinates: a nice clear straight section of road in India I happened to be tracing recently.

And this is what it did:

bad road

Yeah. Well, maybe I could write my own algorithm/plugin.

Two accounts

Posted by bdiscoe on 23 August 2013 in English (English)

It seems I have two OSM accounts, one with >800 edits, one with 70 edits:

http://www.openstreetmap.org/user/bdiscoe
http://www.openstreetmap.org/user/Ben%20Discoe

But regardless of which one I use to log into OSM.org, it takes me to the second URL. Baffling. If only there was a way to merge them...
