bdiscoe's diary

Recent diary entries

How to track and encourage contribution?

Posted by bdiscoe on 15 February 2018 in English (English)

As I've been mapping heavily since 2013, I've tried various ways to track my progress. It's really great to feel that your mapping is making a visual and statistical difference! However, as of today, there is no good metric, and it's very frustrating. Understandably, there is no way at all to measure actual quality or value of contribution across users, because it's subjective, and users are very different from each other. However, for me, I know that I map at a consistent quality and node density, so I should at least be able to measure my progress with respect to myself! Here are things I've tried:

  1. Looking at my Heat Map. At first I thought, this is great! A clear visual indication of how much of the world I've contributed to, and a clear goal to cover the world! But, by 2014, I observed that many edits weren't counted, which was confirmed in an email exchange with its author, Pascal. To its credit, YOSMHM's goal is only to give a rough idea of where a user has edited, and it does that very well. However, once you've edited for a while, you can add a thousand nodes and see nothing on the map change. That's just frustrating.

  2. Looking at the "last modified" nodes and ways on HDYC. This was great: I could do a busy night of editing and the next day, HDYC would show my "last modified" nodes went up by, for example, 10K nodes. It gave a good indication of how much I was contributing. Sadly, Pascal changed the website around November 2017 so it no longer shows "last modified". That's frustrating.

  3. Refreshing the map after doing major edits. This used to be possible by right-clicking the main map's tile and submitting it to the "dirty queue", so it would be re-rendered and you could see all your work. This gave quick visual feedback and confirmation and an encouraging sense of accomplishment! Sadly, the site changed how its map works (some blame OpenLayers?), so now you can't get tile URLs and can't request re-rendering; you just wait several days or more for the server to eventually re-render. That's very sad.

  4. Looking at the #MissingMaps leaderboard. You could see your "Total Edits", "Buildings", and "Km of Road" numbers, and watch them go up in nearly real time! It used to be a little flaky (ignoring the occasional changeset, which was frustrating), but since December 2017, it fails to count most changesets. I can submit, for example, 10 #MissingMaps changesets in a day and only see 1 of them counted, the rest ignored. That's beyond frustrating.

With regard to #2 (HDYC), I did write a small C++ program (on my bitbucket) using Osmium to read the OSM Planet file and count the last-modified objects for each user. But this is awkward, for several reasons:

  1. The planet file is huge (39 GB!), and I have yet to automate fetching it, so it's a long, slow manual download.

  2. The planet file is only updated once a week now. I have no idea how HDYC updates every day!

  3. I haven't written the code yet to format the results into a nice, sortable web layout, or to integrate it with my existing spreadsheet of user notes.
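The counting step itself is small. Here is a minimal Python sketch of it (not the C++ program mentioned above). It assumes the planet objects have already been read into (object_type, user) pairs, for example by an Osmium handler; the function name `count_last_modified` and the sample data are hypothetical.

```python
from collections import Counter

def count_last_modified(objects):
    """Tally objects per type and per user.

    `objects` is any iterable of (object_type, user) pairs. The planet
    file stores only the current version of each object, so the user on
    each object is the account that last modified it.
    """
    counts = {}
    for object_type, user in objects:
        counts.setdefault(object_type, Counter())[user] += 1
    return counts

# Hypothetical sample standing in for a real planet extract.
sample = [
    ("node", "alice"), ("node", "bob"), ("node", "alice"),
    ("way", "alice"), ("way", "bob"),
]
totals = count_last_modified(sample)

# Rank users by last-modified node count, highest first.
for user, n in totals["node"].most_common():
    print(user, n)
```

Sorting with `most_common()` gives the per-user ranking directly; formatting it as a sortable web page would be a separate step.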

So, that's the state of things. All the ways that used to exist, to get visual or numerical feedback or progress metrics, are gone. In my opinion, it's not just a personal frustration, but a lost opportunity for OSM as a movement, that we are missing this simple way for mappers to get encouragement and acknowledgement.

OSM user tracking, import accounts, and account gender

Posted by bdiscoe on 30 September 2016 in English (English)

Emmor (account Palolo) asked some questions in an OSM message; I've put my response here as it may be of general interest.

On 2016-09-26 22:32:47 UTC Palolo wrote:

Ben, Thanks for your contributions to OSM, especially for the rivers you have cleaned up on the west coast. I also just came across your spreadsheet tracking users and find it fascinating.

I was wondering if you could convert your notes into 3 categories of mappers: 1) Imports, 2) Mappers, 3) Combination mapper/importer ?

That's a good question, I've considered trying it, but it can be difficult to tell them apart, or it requires individual detective work that I just haven't gotten around to. Generally, for the import or part-import accounts, I've put the title of that import in the "Grouping" column.

If they have added millions of nodes, and there is nothing about importing in the "Where, What" or "Grouping", it means I haven't been able to figure out if they are an import account or not. For example the Japanese accounts, "Tom_G3X" and "ikiya" and "yamasan". They are probably imports(?)

I've also put the account name in bold (like katpatuka and Heinz_V) if they have contributed millions of features without any obvious importing. Anyone who belongs in this category that I've missed, please let me know!

Also have you thought about gender classification?

I've thought about it, but it is also very hard to tell. Very few account names/images are clearly gendered, and nearly all those that are, by name or image, appear male.

It appears to me that there are very few female top contributors. I wonder why this is since it is open for anyone to edit.

Probably for the same reasons that cartography and technology in general are so male: cultural bias encourages it for men and encourages other things for women.

The top-ranked female account that I know of is "ediyes", a Mapbox mapper at #137/88 (more than 88K changesets!). However, it's entirely possible that some of the mysterious accounts in the top 100 are female. In the top 1000, there are many, including other female mapboxers (dannykath, karitotp, samely...) and the Queen of #MapLesotho, tshedy.

One editor that has been super active over the past 9 months is "Aiko Nakata", which is a Japanese female-only name. Also "Febrina Dewi" and "Fatisya Ilani Yusuf" and "asti_shinoda", all women I believe, were the top-ranked contributors to #MissingMaps last year, all from Indonesia, an amazing amount of mapping work.

A Rant: The Way Beyond Craftmapping That Nobody Is Talking About

Posted by bdiscoe on 25 September 2016 in English (English)

When I read Michal Migurski's recent post "robots, crisis, and craft mappers", I was really baffled and concerned. I am a fan of Migurski; he's a good person and a smart guy. But the content of this particular blog post was really off. I had hoped it would pass with little notice, but I can tell from the #craftmapper T-shirts at SOTM that people actually paid attention, so sadly I feel compelled now to rebut, and hopefully offer some useful perspective as well.

To get something out of the way first, I am absolutely an "armchair" or "craft" mapper, and an addicted mapper, averaging ~5 hours a day mapping for the past 3.5 years; by my own estimation, there are only two human OSM accounts (katpatuka and Heinz_V) with more node/way contribution. (Also, shoutouts to AndrewBuck, Stalker61 and ulilu!) I care passionately about the map, I've been in geo since the 90s, and I've been inside Google to see how mapping actually happens at scale.

My OSM Heat Map

To start with, he writes:

The OpenStreetMap community is at a crossroads

Arguably, no it isn't. It is actually on a stable trajectory, with no major shifts likely.

I see three different movements within OpenStreetMap: mapping by robots, intensive crisis mapping in remote areas, and local craft mapping where technologists live

Actually, no. "Robot" mapping is a perennial project of AI zealots, not a movement, and cannot and will not produce acceptable data (for reasons way beyond the scope of this rant). At best, it is another way to produce yet more controversial imports of dubious quality. Crisis mapping has been well established for many years and is not a new or dynamic trend; the same goes for local or remote "craft" mapping, i.e. normal OSM contributors: not a movement, and not new.

The first two represent an exciting future for OSM, while the third could doom it to irrelevance.

This is saying that normal OSM contributors, the ones that have built and continue to build most of the map - and the great majority of the quality map - are "irrelevant". This is really, 100% wrong.

Historically, OpenStreetMap activity took place in and around the home areas of OSM project members

True enough, and that is still the single largest source of quality map contributions. The other parts are imports, a small amount of commercially-sponsored input, and armchair mappers like myself, tracing aerials from the places that can't (or can't yet) map themselves, either for HOT or MissingMaps or beyond. Together, that IS OSM, past and present, and unless Something Dramatic happens, that is also OSM's future.

Craft mapping remains the heart of the project, potentially due to a passive Foundation board who’ve let outdated behaviors go unexamined.

I am trying to figure out how to not feel hurt by this. "OUTDATED." The passion that drives the entire past, present and future of OSM is "outdated?"

Left to the craft wing, OSM will slide into weekend irrelevance within 5-10 years.

That's basically saying that OSM is irrelevant today. As an opinion, that's a pretty harsh one.

Two Modest Proposals (1) codes of conduct and other mechanisms intended to welcome new participants from under-represented communities

This sounds fine, but it seems orthogonal to the "robot, crisis, craft" framing. It seems uncontroversial to empower and support more crisis/craft mappers from under-represented communities.

(2) the license needs to be publicly and visibly explained and defended for the benefit of large-scale and robot participants

I have sat out the license wars, partly because, as a regular non-lawyer human, I cannot fathom what all the fuss is about. That said, it also seems unrelated to crisis/craft mappers, with or without AI-robot assistance to produce data for human review, who will surely be able to proceed with or without license changes.

I could say much more about this, but much has already been hashed out in the comment thread on the original blog. For example, "automation vs. craft is a strawman argument; Both - in an integrated manner!" Yes, obviously.

Instead, I'd like to provide an answer to the question I believe Migurski is actually asking. I believe he is saying:

  1. While better in some areas, OSM isn't on par, for the full range of uses, with maps from Google/Apple/etc.
  2. The existing approaches aren't on a trajectory to get us there, therefore they "doom us to irrelevance".
  3. We need something more to get us there, but what is it (robots? codes of conduct? license changes?)

The answer to this question is obvious, but everyone seems to be waffling and dodging it. I will say it: MONEY.


To be a top-tier global map, it takes roomfuls of full-time, paid mappers, with the kind of resources and coordination that (realistically) are only found in large corporations.

  1. Clickshops. Google has them, Apple has them, any organization that wants to take OSM to the "next level" will need them. In some developing nation (for cost), with fast computers and fast networks and thorough, regularized training for speed and consistency. (In case someone is thinking Mapbox, that's nice, but think bigger. Think 100x.)

  2. Streetview. Every station in Google's clickshops has the entire catalog of streetview instantly available, continuously integrated into the mapping flow. Without a streetview-like dataset, you just can't do it. I know Mapillary (+JOSM plugin) is trying, but they are not even close - you have to capture FULL 360 (cylindrical) imagery, not just hope that hobbyists were pointing their camera where you need to look, and you need the RESOLUTION to read street names. Not even 1% of Mapillary users are capturing HD 360 imagery. You can't do it with prosumer cameras (I've tried). You need an expensive rig. Stop pretending otherwise.

Some company or consortium (or, in theory, government, but I'm not holding my breath) could step forward with MONEY and take OSM to that "level III/IV" Migurski (and many others) would like to see. Barring that, everyone needs to extend love to the homebrew/crisis/craft/mapathon mappers we have, because we ARE OSM's future.

Relations for Big Rivers

Posted by bdiscoe on 7 August 2016 in English (English)

My first big river, in 2014, was the Klamath. At first, I tried looking for it using the OSM search box (Nominatim). All I found was a mess of missing river parts, and when I looked closer, I found poorly imported NHD, very old and wrong riverbanks, incorrect tagging, etc. I spent a few days fixing it up and producing the Klamath River waterway relation:


Since then, I've done similar work for other waterways. Sometimes the relation exists but is incomplete, other times I create it; either way, it can take days of work to finish. Here are some in the USA:

Recent work in Illinois:

Outside the USA:

#MapLesotho National Parks

Posted by bdiscoe on 8 May 2016 in English (English)

I've been contributing heavily to the #MapLesotho project for a while, and we're making great progress on all the basic geometry of the country, like roads, paths, buildings, waterways. A good OSM map of a place has more than that; it has things like POI and amenities, which are hard for an armchair mapper like me to help with. One thing I can do, however, is protected areas. Lesotho is a small country with only a few protected areas, two small "national parks" and some even smaller places. There was no vector source for any of them (neither open nor closed), but by doing a bunch of online research and detective work, and reading Wikipedia, I was able to add the two largest:

Sehlabathebe National Park

Ts'ehlanyane National Park

"Urban India" mappers?

Posted by bdiscoe on 8 May 2016 in English (English)

While watching the OSM ranks and daily activity of the top thousand accounts, I've noticed a curious pattern: a number of accounts which all edit in the same places, the very largest cities in India (New Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata, Chennai...) Their edits are so similar that the accounts must be connected, as if they are all working for a company, school, or other organization. However, their account pages are blank (unlike, for example, Mapbox accounts, which plainly state their affiliation).

Here are the top "Urban India" accounts: saikumar, premkumar, jasvinderkaur, anthony1, Apreethi, sdivya, shekarn, himalay, Navaneetha, praveeng, harisha, himabindhu. I know of at least 20, but I think there are many more. Many (most?) of the accounts have been active since sometime in 2015.

I am very happy that someone is coordinating work on India's cities, but I would be very curious to know who it is! After all, companies like Google have whole buildings full of workers (in India) to update their proprietary maps. Is someone doing the same for OSM? I'd like to thank them.

Here is a picture of premkumar's edits; all the accounts look very similar to this:

heat map

About Huts

Posted by bdiscoe on 20 December 2015 in English (English)

Particularly in Africa, there are a huge number of small round buildings. I believe the best way to model them is with a single node, tagged building=hut, optionally with a radius or width. However, the dominant OSM practice is to use a whole way with lots of nodes. So, I must go along with the community. Unfortunately, I encounter a lot of this while working, for example on #MapLesotho:


That's 7 huts, 130 nodes, all the wrong size, some overlapping - likely the result of very bad copy and paste. (There are actually 8 huts there; they missed one.)

My fingers have developed the rhythm for cleaning this sort of thing up in JOSM. First set the simplify-way.max-error to something appropriate (0.3), then for each hut:

  1. Click to select building.
  2. 'G' to unglue it from the surrounding buildings, if needed.
  3. Drag to center it correctly.
  4. Ctrl-Alt-drag to scale it down to the correct size.
  5. Shift-Y to reduce nodes.
  6. 'O' to re-round the circle.

Within a few seconds I have this:


8 huts, 86 nodes. It would probably be hard to automate this further without computer vision (because of the manual visual alignment), but using the standard JOSM position of left hand on keyboard, right on mouse, it can go very quickly with the above steps. One possible optimization might be to combine Shift-Y and O into a single keypress I can reach without moving my left hand.

Needless to say, if you are setting up a mapathon for people to map Africa, PLEASE teach them how NOT to make the bad big huts like the above, so I don't have to fix so many thousands of them.

The most inefficient way in North America

Posted by bdiscoe on 4 December 2015 in English (English)

As mentioned in my last entry, I wrote a tool using Osmium to parse PBF and look for inefficient ways, i.e. ways that if you ran simplify on them, would drop hundreds of nodes and not change shape. I'd been running it on small countries and US states, but this evening I tried it out on a PBF of all of North America, and here is the prize-winner for the most bloated, wasteful way: a small dirt road between some houses and the coastal wetlands in Nova Scotia, Canada:

Captain's Way

That's 2000 nodes, or one every 5.6 centimeters.

By the time you read this, I'll have cleaned up way 85927697. I'd also like to make an offer: if you are an experienced editor who focuses on a specific part of the world and would like me to run my tool on the extract for your region, I can send you a list of the worst ways and you can clean them up. Let me know!

Also, a word to importers: Please, please make sure you check your data for this kind of mess BEFORE you upload. In this case it was Steve in Halifax importing CanVec in 2010, but similar things are being uploaded all around the world, all the time (I know because my tool finds them!)

Cleaning up NHD in North Carolina

Posted by bdiscoe on 30 November 2015 in English (English)

Some months ago, I was looking around OSM to find where the bulk of noise and inefficiency is. I'm aware of some other efforts (like Toby in 2013) but I actually went so far as to write a C++ app on Osmium which parses PBF extracts, simulates running line simplification, and produces a list of the ways which are the least efficient.
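To illustrate what "simulating line simplification" can mean, here is a minimal Python sketch (not the C++ tool itself). It assumes a Douglas-Peucker style simplification and coordinates already projected to meters; both are assumptions, and the names `simplify` and `nodes_dropped` are hypothetical.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b; all points are (x, y) in meters."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate segment (e.g. a closed way)
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))        # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker, written with an explicit stack to avoid deep recursion."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        index, farthest = -1, 0.0
        for i in range(start + 1, end):
            d = _point_segment_distance(points[i], points[start], points[end])
            if d > farthest:
                index, farthest = i, d
        if index != -1 and farthest > tolerance:
            keep[index] = True       # this point matters; split around it
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]

def nodes_dropped(points, tolerance):
    """How many nodes a simplify pass would remove: the inefficiency score."""
    return len(points) - len(simplify(points, tolerance))

# A nearly straight 100 m way with a node every meter and 1 cm of wobble.
bloated = [(float(i), 0.01 * (i % 2)) for i in range(101)]
print(nodes_dropped(bloated, 0.3))
```

Sorting all ways in an extract by `nodes_dropped` in descending order would then give a list of the least efficient ways.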

I ran this on various US states and countries worldwide, and the winner is... North Carolina. It is so wildly inefficient that we may as well not bother with the rest of the world until we've cleaned up North Carolina first. (Just for comparison, the size of the output: Finland 17k, Colombia 20k, Colorado 30k, England 45k, North Carolina >300k).

Why is North Carolina (henceforth NC) so obese? There are a handful of bad spots elsewhere (like some of the Corine landuse in Europe, and a waterway import in Cantabria, Spain) but nothing close to NC. It's due almost entirely to a single import in 2009. The USA's hydrography dataset, NHD, is truly massive. An account called "jumbanho" imported NHD for NC and apparently applied almost no cleanup (besides a small pass at removing duplicate nodes a few months later). Among the many flaws of that import:

  1. Topology is mostly missing (features meet but don't share a node)
  2. Really out of date (shows swamps that were drained decades ago, streams running through what are now shopping malls).
  3. Almost all of it is barely or not at all decimated (a stream which is perfectly modeled in 15 nodes is sometimes made of 300 nodes).

As a result, the jumbanho account has noderank #3 with 43 Mnodes (this was rank #2 with 49 Mnodes, but as I'll explain, I've been busy).

This is what the data looks like:


As you can see, a regular set of evenly-spaced nodes, with no decimation. This is worse when you consider that the overall accuracy is far less: here the pond is 7m off, the stream is variously 12, 14, or 32m off:


This inefficiency is bad in a few ways, such as making the planet file balloon with dead weight. But a more relevant issue is this: When a user comes in here to fix the alignment of the data, there is NO WAY they can be expected to move all 200 points by hand. An import with too many points is highly resistant to EVER getting manually fixed. The solution is to simplify first, but by how much? Here we encounter some issues:

  1. The simplify tool in JOSM defaults to 5 (!) meters which is brutal and useless for just about any use I can think of (maybe very, very rough old GPS traces?)
  2. JOSM lets you change the amount, but it is buried deep in the "advanced preferences."
  3. Once you find that, knowing how much to simplify each kind of feature is a matter of experience and skill.

After hundreds of hours of manual work on NC, I have learned what values work; general guidelines which I carefully tweak based on each area:

  1. natural=wetland. These are very rough, 1.0-1.2 m.
  2. waterway=stream, waterway=riverbank, natural=water. They are more delicate, I use 50-80 cm.
  3. Streams and rivers which are either inside wetlands, or "artificial paths", these are often notional and don't correspond closely to any real feature, so >1 m.

Note that these levels of precision are WAY less than the actual inaccuracy of the data; they cannot harm the value in the data, because they are too small. In fact they could be bigger, but the goal is to leave enough nodes so that human editing won't have to add or remove many nodes when they align the feature to its correct location.
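The guidelines above can be written down as a small lookup. This is only a sketch in Python: the function name `tolerance_range` and the `notional` flag (for streams and rivers inside wetlands or "artificial paths") are hypothetical, and the values are ranges to be tuned per area, as described above.

```python
def tolerance_range(tags, notional=False):
    """Return a (low, high) simplify tolerance in meters for a feature.

    `tags` is a dict of OSM tags. `notional` is a caller-supplied flag for
    streams/rivers that are inside wetlands or are "artificial paths".
    Returns None when the guidelines above do not cover the feature.
    """
    if tags.get("natural") == "wetland":
        return (1.0, 1.2)                 # very rough features
    if notional and tags.get("waterway") in ("stream", "river"):
        return (1.0, None)                # "more than 1 m", no upper bound given
    if tags.get("waterway") in ("stream", "riverbank") or tags.get("natural") == "water":
        return (0.5, 0.8)                 # more delicate features
    return None

print(tolerance_range({"natural": "wetland"}))
print(tolerance_range({"waterway": "stream"}))
print(tolerance_range({"waterway": "stream"}, notional=True))
```

Returning a range rather than a single number keeps the human judgment in the loop: the final value still depends on the area being edited.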

While I would be happy to just write a bot to do that first step, that would be a "mechanical edit" and I'd have to put up with mailing list arguments to get permission. (I'd also have to write that bot, which I've been too lazy to do so far). So instead, I've put in the time to do it all manually in JOSM, with steps like:

  1. Study each area, compare the features to the imagery.
  2. Do some super careful simplify with appropriate values. (It gets really tiring having to dig into JOSM's advanced preferences every single time I change the value.)
  3. Fix the topology by carefully tuning the validator's precision and allowing it to auto-fix, with manual verification.
  4. Some manual adding of bridges and culverts.
  5. Removing/updating non-existent wetlands and streams (one common clue: they intersect buildings).
  6. Splitting some ways and creating relations, for example for a large riverbank and wetland that share an edge.

Here is that same area after a simplify to 70cm on the NHD features, then quick manual alignment:


It's exhausting. In fact, a bot wouldn't really help that much, since the simplify is only the first step; the topology and the rest still need to be done by a human anyway.

By my rough calculation, if I work hard for 5 hours every night, it would take around 5 months for me to finish cleaning up NC NHD to a decent level.

On the plus side, other NHD imports I've seen around the USA (like Oklahoma) don't seem to be nearly as bad; while they suffer from most of the same quality issues, at least they were already simplified before uploading.

Thoughts on a better heat map for OSM changes

Posted by bdiscoe on 28 October 2015 in English (English)

I love YOSMHM. Pascal has done a great job with it, and it's very cool. In fact, given my desire to map the entire world, I rely on YOSMHM to tell me where to map next. And just recently, it's improved from updating weekly/monthly to daily! But, there are some limitations.

  1. One blob per changeset. If I map a long highway, a dot at the center of the highway really doesn't show where I mapped.
  2. Missing data. I have around 12,100 changesets, but YOSMHM only shows 9356. That might explain why it doesn't show the mapping I did in southern Chad, or eastern Cameroon, or Agadez in Niger, or many other places.

So, I set out to see if I could make my own heatmap. Here are my first steps.

  1. Thanks to a great answer from EdLoach it was easy to get XML files for all my changesets.
  2. I parsed those XML to get the extents (min_lat min_lon max_lat max_lon) of each changeset.
  3. I tried a number of different web-heat-map tools, and settled on Leaflet + Leaflet.heat because it was super easy to use. I just pass the center of each changeset's extents to Leaflet.heat as a point, and the result looks like this.
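Steps 2 and 3 can be sketched in a few lines of Python. This is an illustration rather than the code I actually used; it assumes changeset XML in the shape the OSM API returns, with `min_lat`, `min_lon`, `max_lat` and `max_lon` attributes on each `changeset` element, and the function name and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

def changeset_centers(xml_text):
    """Return a [lat, lon] center for every changeset that has extents."""
    centers = []
    for cs in ET.fromstring(xml_text).iter("changeset"):
        try:
            min_lat = float(cs.attrib["min_lat"])
            min_lon = float(cs.attrib["min_lon"])
            max_lat = float(cs.attrib["max_lat"])
            max_lon = float(cs.attrib["max_lon"])
        except KeyError:
            continue  # changesets without edits carry no extents
        centers.append([(min_lat + max_lat) / 2, (min_lon + max_lon) / 2])
    return centers

# Hypothetical two-changeset sample; the second has no bounding box.
sample = """<osm>
  <changeset id="1" min_lat="10.0" min_lon="20.0" max_lat="12.0" max_lon="24.0"/>
  <changeset id="2"/>
</osm>"""
print(changeset_centers(sample))  # [[11.0, 22.0]]
```

The resulting list of [lat, lon] pairs can be dumped as JSON and handed to Leaflet.heat as its point array. The simple averaging does not handle extents that cross the antimeridian.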

Finally, I can see at least some blob in every part of the world I've mapped. Unfortunately, unlike YOSMHM, all the changesets are weighted equally (it would take a lot more querying and parsing to weight them) so that, for example, it's hard to tell that I've done 10x more mapping in Namibia than in Japan.

I can dream of a better heatmap! It would have:

  1. Each added/modified/deleted entity, not just the changeset centers. This would mean, in my case, 9 million dots. A simple approach like Leaflet.heat can't handle that many, because it draws every point, every time, using JavaScript. If I have to, I could write C++ to make a custom global tileset with thousands of PNGs, but that seems like overkill; maybe there's a webby way? Mapbox maybe, can it handle 9 million points?
  2. It should show added/modified/deleted in different colors (like green/blue/red) so I can quickly see what places I've done more correction, vs. adding new features. India, Africa and Central America are the only places I've added huge amounts of detail to OSM, but you can't tell that by looking at YOSMHM. Is there a better/faster/more polite way to get all that detail without making 12,100 queries to the OSM API? I can't just parse the planet file, because that only has current state, not history.
  3. Not damn Mercator. Anything else would be better. How about Goode Homolosine? Can I get "free" background tiles (like Mapbox's) served in anything besides "web mercator"?
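One possible way around the 9-million-dot problem is to aggregate the edits into grid cells before drawing anything, keeping separate counts for added, modified and deleted. Here is a rough Python sketch of that idea; the function name, the 0.1 degree cell size and the sample coordinates are all assumptions for illustration.

```python
import math
from collections import defaultdict

def bin_edits(edits, cell_degrees=0.1):
    """Aggregate (lat, lon, kind) edits into grid cells.

    `kind` is one of "added", "modified" or "deleted". Returns a dict
    mapping (row, col) cell indices to per-kind counts.
    """
    grid = defaultdict(lambda: {"added": 0, "modified": 0, "deleted": 0})
    for lat, lon, kind in edits:
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        grid[cell][kind] += 1
    return grid

# Hypothetical edits: two additions close together, one modification elsewhere.
edits = [
    (-22.56, 17.08, "added"),
    (-22.57, 17.09, "added"),
    (35.68, 139.69, "modified"),
]
grid = bin_edits(edits)
for cell, counts in sorted(grid.items()):
    print(cell, counts)
```

Each cell then becomes one weighted, colored dot instead of thousands of raw points, which also solves the equal-weighting problem mentioned above.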

Did somebody delete Hyderabad, India?

Posted by bdiscoe on 13 June 2015 in English (English)

Not the entire city, but the place node, for Hyderabad, a city of over 7 million people... which currently has no label.

See the unlabeled city here:

It seems unlikely that there never was a label, which means that somebody probably deleted it accidentally, or otherwise accidentally changed it in some way which prevents it from appearing (e.g. changing "place=city" to "place=City"). It is also missing from Nominatim. Is there no bot or other process checking for when something huge like this disappears from the map?
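I don't know of such a process, but the capitalization case at least would be easy to catch. A minimal Python sketch of the check, with a hypothetical function name and a deliberately partial list of place values:

```python
# Partial list of standard place values, for illustration only.
KNOWN_PLACE_VALUES = {"city", "town", "village", "hamlet", "suburb"}

def suspicious_place_value(value):
    """Return the likely intended value if `value` differs from a known
    place value only by case or surrounding whitespace, else None."""
    normalized = value.strip().lower()
    if value != normalized and normalized in KNOWN_PLACE_VALUES:
        return normalized
    return None

print(suspicious_place_value("City"))  # city
print(suspicious_place_value("city"))  # None
```

An outright deletion would need a different check, such as comparing the set of place=city nodes between two dated extracts.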

Location: Hastinapuram, Vanasthalipuram, Rangareddy, Telangana, 500035, India

Top OSM Rank: The Big Imports

Posted by bdiscoe on 29 May 2015 in English (English)

Here are some of the things I learned while studying the OSM accounts with high HDYC rank, as described in my last entry:

  • TIGER! 'DaveHansenTiger' originally imported TIGER, but 'woodpeck-fixbot' (noderank #1) subsequently touched nearly every node. Because TIGER is such a mess, it may be possible to estimate how quickly it is getting cleaned up based on the last-modified count of woodpeck-fixbot. Currently it's 136 M, going down at around 12 K/day, so at this rate it will take 32 years to clean up all the TIGER in the USA.

  • TIGER ways: between 'DaveHansenTiger' and 'bot-mode', there are around 8 M imported TIGER ways that haven't been touched since import. At the current rate of 1800/day, it's going to take 12 years to clean it all.

  • NHD! (USA national hydrographic dataset). A lot of NHD was imported without any decimation at all, resulting in >90% of the nodes being redundant, effectively noise. There are at least 6 accounts involved in NHD import, including 'jumbanho' (noderank #2) and 'nmixter' (noderank #5). I've tried cleaning up this NHD mess manually, but it takes several hours to do 100 K nodes in JOSM. At that rate, it would take me 8 months of editing every night to clean up all 46 M nodes.

  • Canada! The CanvecIimports account (noderank #3) is at 45 Mnodes and still rising, and there are several more accounts that appear to import Canvec like azub (noderank #11), bgamberg (noderank #13). Some areas are neatly decimated and tidy, some are not.

  • Netherlands: There are two huge imports, 3dShapes (noderank #4) and BAG, which is spread across 16 accounts which all nicely have BAG in their name (Sander H_BAG, Commodoortje_BAG, etc.) All 16 are in the top 200 of noderank.

  • Massachusetts: The state GIS was a massive import, by account jremillard-massgis (noderank #10) and a few others. Amazingly, the road data is actually of high quality and needs very little cleanup; the wetland hydrography is a bit messier.

  • Some highly ranked accounts appear to be national imports (?) that I found harder to learn about, such as Tom_G3X (noderank #7, 19 Mnodes in Japan) and Petr1868 (noderank #9, who has apparently added 23 Mnodes to the Czech Republic using "Tracer Using RUIAN and LPIS")

  • France has many accounts importing from its national cadastre database, but it is very hard to tell which. One might guess that ËdzëronK (noderank #12) and the 15 other massive contributors to France in the top 100 are importing cadastre, but perhaps some of them are actually just amazing, really active mappers.
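The TIGER estimates in the first two bullets are simple rate arithmetic: remaining count divided by the daily cleanup rate. A small Python sketch, assuming the rates stay constant (which they surely won't); the function name is hypothetical.

```python
def years_to_clear(remaining, per_day):
    """Years needed to work through `remaining` items at `per_day` items/day."""
    return remaining / per_day / 365.25

# TIGER nodes: 136 M last-modified by woodpeck-fixbot, falling by ~12 K/day.
print(round(years_to_clear(136e6, 12e3), 1))

# TIGER ways: ~8 M untouched since import, cleaned at ~1800/day.
print(round(years_to_clear(8e6, 1800), 1))
```

These come out at roughly 31 and 12 years, close to the figures quoted above.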

In my next post I'll talk about some non-import, real cool mappers I discovered.

Top OSM Rank: Who are these crazy, amazing people?

Posted by bdiscoe on 3 May 2015 in English (English)

It's now been around 2 years since I started editing OSM seriously. I've used Pascal's HDYC and YOSMHM to track my progress, with the goal of making a real contribution to OSM worldwide. One thing I always wondered about as my OSM node rank went up: it would reach, for example, 300, and I would think, wow, I have been editing so much... who are these 299 people around the world who actually edit even more??

Recently, I set out to answer this question. I started looking at HDYC for well-known accounts, as well as their heatmaps, and gathering the results in a spreadsheet. When that got tedious, I wrote a C++ app on Osmium and ran it on the Planet.osm file, to find out the complete list of top-ranked accounts.

And the answer is... most of them are not actually people; a few are bots, and many are "import accounts", or user accounts that have been used for a large import at some point. (...but not all of them! Some are actual, live humans manually editing OSM longer and more extensively than me). Along the way, I learned some OSM history, and the diverse patterns in OSM in different countries.

Here is a link to the spreadsheet, sortable by rank, with my own notes on the where/what of around 400 accounts, including the top 100 in node and way ranks. The data is approximate... it's not auto-refreshed by a script (yet), so some ranks may be a little out of date.

In my next diary entry I'll share some of the stories and realizations I've had while gathering this data.

The story of the oldest node in OSM.

Posted by bdiscoe on 26 April 2015 in English (English)

I've been using Osmium, and today parsed the entire planet.osm.pbf for the first time. I noticed that the nodes are in order by ID, and the very first node, the oldest node still in existence, is node 10. Let's look at it!

This tough little node has had quite a history! Presuming that the database is accurate, this is what it tells us today:

  • v1, April 18, 2005, user sxpert creates this node in changeset 4. That's right, the fourth changeset ever. We have no record of its geographic location.
  • v2 was redacted.
  • v3, April 2009, super-user woodpeck (Frederik Ramm) places this node in London, near Regent's Park.
  • v4, September 2009, dtr20 deleted the node, as part of "Survey east of Regent's Park"
  • v5, April 2011 max60watt somehow re-uses the node, placing it near the bus stop in a quiet little village near the town of Kassel, in the German state of Hesse.
  • ... and that's where the node has stayed, through 3 small edits.

The name of the village is Furstenwald. As an English speaker, saying this name out loud causes me to giggle. Of all the nodes still alive today, the first in the world is in... Furstenwald.

(Actually) fixing the Peoria GIS import

Posted by bdiscoe on 11 April 2015 in English (English)

It turns out that Peoria is not just a metaphor, but a real place in Illinois. It is also the location of a rather messy GIS import of County data! Here's the history as far as I can determine:

  • The Peoria County Government gathered data resulting in a dataset as of 1997.
  • In 2010, that dataset was deemed old enough to be "obsolete", which apparently justified uploading it to OSM.
  • A wiki page Peoriagisuploa describes most of the details of what happened in June 2010. Basically, it's woods and buildings.
  • Woods came in with natural=wood (but too many nodes)
  • Buildings came in with building=yes and BUILDING_T=(0..9) for a building type, as documented on the wiki page.
  • In July 2010, user account "xybot" applied some changes called "Correction of faulty peoria bulk upload" which did a very strange thing to the building tags. It changed "BUILDING_T" to "tiger:buildingType" (!) There is no such tag in TIGER (which has no buildings, let alone building types).

I studied this mess and figured out what should have occurred: mapping Peoria's BUILDING_T onto the actual, standard OSM building types:

  • BUILDING_T=1 -> building=residential
  • BUILDING_T=2 -> building=commercial (very few of these are industrial)
  • BUILDING_T=3 -> building=school
  • BUILDING_T=4 -> building=garage
  • BUILDING_T=5 -> building=static_caravan
  • BUILDING_T=6 -> building=industrial (there are almost none of these)
  • BUILDING_T=7 -> building=yes (it was under construction in 1997, it isn't now)
  • BUILDING_T=8 -> (make these the inner ways of multipolygon relations)
  • BUILDING_T=9 -> man_made=pier
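The table above can be written down as a small lookup, sketched here in Python purely to make the mapping precise. It is not a suggestion to run a bulk edit. The function name `retag` is hypothetical, and two things are assumptions: that the `building=yes` tag is dropped when a type-9 object becomes a pier, and that type 0 (which has no entry in the table) is left for manual review along with type 8.

```python
# Peoria building type code -> standard OSM tags, following the table above.
BUILDING_T_TO_TAGS = {
    "1": {"building": "residential"},
    "2": {"building": "commercial"},
    "3": {"building": "school"},
    "4": {"building": "garage"},
    "5": {"building": "static_caravan"},
    "6": {"building": "industrial"},
    "7": {"building": "yes"},
    "9": {"man_made": "pier"},  # assumption: building=yes is dropped for piers
}

# The type code may sit under the original key or the one xybot renamed it to.
TYPE_KEYS = ("BUILDING_T", "tiger:buildingType")

def retag(tags):
    """Return (new_tags, needs_manual_review) for one object's tag dict."""
    new_tags = dict(tags)
    code = next((new_tags[k] for k in TYPE_KEYS if k in new_tags), None)
    if code is None:
        return new_tags, False  # not part of the import, leave untouched
    if code not in BUILDING_T_TO_TAGS:
        # Type 8 (inner ways of multipolygons) and unmapped codes need a human.
        return new_tags, True
    for key in TYPE_KEYS:
        new_tags.pop(key, None)
    new_tags.pop("building", None)
    new_tags.update(BUILDING_T_TO_TAGS[code])
    return new_tags, False

print(retag({"building": "yes", "tiger:buildingType": "4"}))
```

Anything flagged for manual review keeps its tags unchanged, so nothing is lost before a person looks at it.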

I have been laboriously applying these fixes recently, and will finish soon. I'm doing it manually in JOSM, checking carefully, not only because that's the quality thing to do, but also to head off any claims of "mechanical editing". I'm also cleaning up the woods, which is not simply a matter of decimation but also a lot of manual updating because the woods are not where they were in 1997.

Go home coastline data, you are drunk

Posted by bdiscoe on 14 February 2015 in English (English)

I've been having a great time recently using Osmium to write my own analysis code in C++ to look for anomalies in the PBF extracts. Today it found this very strange coastline in South Africa:


Perhaps, I thought, this is some rare geological formation that makes an amazing wavy line? So let's look at the data over aerial:


Uh.... what? I've seen a lot of weird and bad map data, like the mechanical grit of PGS and all the horrors of TIGER, but this was new. It's as if some cartographer said... "yeah, it's a coastline! Uh, what kind? Uh.... a wavy coastline? Yeah, wavy! Lots of waves.... I LOOOVE to draw WAVES, wheeeee!"

I should mention that this appears to go on for hundreds of kilometers.

The importer of this way is an "Adrian Frith", but it's most certainly not his fault. The source tag says "Municipal Demarcation Board", so it was probably made by some government department, or maybe a contractor that was getting paid by the node?

I'm sorry to say I'll be quickly tidying up this coast, so perhaps by the time you read this, you won't be able to see the waves at, for example, here. On the other hand, coastline changes are special and take a while to process, so the blue ocean wobbles will probably stay for quite a while.

Come work on #MissingMaps with me!

Posted by bdiscoe on 10 December 2014 in English (English)

The recent #MissingMaps project added to the Tasking Manager is a great way to work together on specific places!

However, some of the maps are sadly neglected. The "high priority" HOT places (like for Ebola and cyclones) get a lot of contributors. But other #MissingMaps projects get little work.

For example, #793 - Missing Maps: Bukavu, Democratic Republic of Congo was added 5 days ago and nobody contributed at all. I have begun, but it's kinda lonely. Come join me! The imagery is good, the infrastructure is easy to see, and the DRC has tons of unmapped detail. Come join the fun and MAP THE PLANET!

Ethiopia, Sudan, Nicaragua...

Posted by bdiscoe on 29 October 2014 in English (English)

Some recent work I'm proud of:

  1. Fixed the tags (and in some cases the boundaries) of all of Ethiopia's national parks, including Gambella, Bale Mountains, Awash, etc. I even added the Alatish National Park which was entirely missing.

  2. Nearby on the Ethiopia/Sudan border, improved the area where they are building the Grand Ethiopian Renaissance Dam on the Blue Nile.

  3. In Ethiopia's Afar province, added the newly-built Tendaho Irrigation Dam with its huge reservoir.

  4. In Sudan, improved the massive Khashm el-Girba Reservoir and nearby city of Al-Qadarif which needed lots of work.

  5. A large number of waterways in the wild eastern parts of Nicaragua (like here) and Honduras (around here), although sadly most of the streams aren't visible until zoom level 13.

  6. Just now, a complex relation for the Las Trampas Regional Wilderness, near San Ramon, CA, USA

Making JOSM faster with JavaScript keyboard shortcuts

Posted by bdiscoe on 12 March 2014 in English (English)

User interfaces are very much a matter of taste, so with the caveat that this is all really subjective...

In any graphical program, I find that I am fastest and most fluid when I have my left hand on the keyboard (e.g. on ASDF) and my right on a mouse. It's best if all the key combinations I need are easily pressed with my left hand. If I have to move my left hand away, or take my right off the mouse, everything slows down.

So, with JOSM. The first thing I do is open the Preferences, under Keyboard Shortcuts, and re-map Delete from the Delete key to 'D'. Now, for shortcuts for all the other common tags (highway=service, building=yes...), it's not simple, but it's possible. JOSM lets you map keys to presets, but those presets still open a dialog (extra steps). To program my own shortcuts, I dug into the scripting plugin (JavaScript API). It's very nice, well-supported (thank you "Gubear"!) and I've only begun to explore what it can do.

Here is my script (install_custom_menus.js)

To use it, first enable the Scripting plugin in JOSM's plugin preferences. (You'll need the very latest JOSM, 6891 or later, and up-to-date plugins). Now, from the Scripting menu, open the "console", load the js file, and run it. If it works, you will then see 4 new items on your "Edit" menu.

You can now use Preferences: Keyboard Shortcuts to map keys onto them. I use:

  • T : Clear Tiger
  • Shift+T: Turning Circle / Track
  • Shift+S: Service
  • Shift+B: Building
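As a rough illustration of what these shortcuts do to an object's tags, here is a sketch in Python. It is not the actual script, which uses the JOSM scripting API. The function names are hypothetical, the tag values for Service and Building are taken from the examples in this post (highway=service, building=yes), and the Turning Circle / Track item is left out because its exact behaviour isn't described here.

```python
def clear_tiger(tags):
    """'T': drop the TIGER review flag, leaving every other tag alone."""
    new_tags = dict(tags)
    new_tags.pop("tiger:reviewed", None)
    return new_tags

def set_service(tags):
    """'Shift+S': tag the way as a service road."""
    return {**tags, "highway": "service"}

def set_building(tags):
    """'Shift+B': tag the object as a generic building."""
    return {**tags, "building": "yes"}

# Key binding -> action, mirroring the list above (minus Shift+T).
SHORTCUTS = {
    "T": clear_tiger,
    "Shift+S": set_service,
    "Shift+B": set_building,
}

def press(key, tags):
    """Apply the action bound to `key` and return the updated tags."""
    return SHORTCUTS[key](tags)

road = {"highway": "residential", "name": "Elm St", "tiger:reviewed": "no"}
print(press("T", road))
```

Each action returns a new dict rather than editing in place, which keeps the original tags around for comparison.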

With only basic familiarity with JavaScript, you can easily modify the script to add your own commands, and then map keys to them. You will need to run the script once, each time you restart JOSM, to add the menu items, but the shortcuts are persistent so you only need to set them once.

A word about responsibility. These are just shortcuts for things that JOSM already does, but although you can now do them faster, you still need to focus on quality and standard OSM practice. For example, for cleaning Tiger (in the USA): before I press 'T' to clear the Tiger "reviewed" tag, I visually confirm that the geometry of the road is correct, that the name is good, that cul-de-sacs have been set appropriately (Shift+T), and that it's good in every way. Only then should one clear that tag.

Happy editing!

The first 30-day challenge: retrospective

Posted by bdiscoe on 12 March 2014 in English (English)

The first Scout-Telenav 30-day OSM Mapping Challenge just ended. Let me share some of the story.

When it was announced on February 11, I was excited. At that time I was already an "addicted mapper", and fairly sure of my fast, accurate JOSM editing skills, so I figured I could win it. The challenge was for the USA. I usually trace Bing in remote parts of the world, but I did know of a lot of roads in Hawaii that could be quickly cleaned up, so I figured that would give me a quick start.

Week 1

My Hawaii edits did produce a good number of points, but experienced Canadian mapper ingalls was in the lead! He was cleaning Tiger in Texas at an impressive rate. I was slowly catching up, but he remained ahead.

Week 2

Suddenly, when ingalls and I were both at ~30k points, he stopped mapping. I breathed a sigh of relief and took the lead. I found myself doing too many steps in JOSM while editing, and started wondering if I could set up keyboard shortcuts that would let me go faster...

Week 3

Just when I seemed safely in the lead, a user ada_s appeared in the rankings and rapidly went up to second place. All their edits had the same comment, "Add address information + split way when exiting the city border". That seemed like an odd thing to do, but it sure racked up a lot of points. I struggled to find enough time to stay ahead (I do have a full-time job and girlfriend) and ada_s continued to gain. At this point, my exploration of the JOSM scripting engine produced some results: I was able to create a lot of single-key shortcuts (like Shift+S, set highway=service) that let me go faster (more about those scripts in my next diary entry). I was working faster now, but ada_s was still gaining on me.

Week 4

I pulled a couple of late nights editing, which put me at 57k points, but ada_s was at 50k and picking up speed. After another day where our scores both leapt up, I finally took a look at exactly what ada_s was doing. They were putting "addr" and "is_in" tags ... on highways. Like, every single road and driveway in Lincoln, Nebraska was tagged with "addr:city=Lincoln" and "addr:state=NE". This seemed very odd to me (not to mention useless), so I took a look at the page for addr and sure enough, it doesn't say anything about using it on highways (because, why would you?). I sent ada_s a note asking politely why they were adding those addr tags. I also put in a few changesets removing those same tags from a few cities where ada_s had added them (along with other improvements). I then found a particularly messy Tiger region in South Carolina, and dug into it for another late night, my JOSM edits now at great speed. ada_s never responded but they did, suddenly, stop editing. (Maybe they just didn't know that those tags were useless and nonstandard? It could have been innocent.) They were up to 72k by then, but partly due to undoing their odd tags, I was at 108k. I pulled one more late night then stopped myself. My final score was 145k, with ada_s at 72k, followed by good, quality editors like "rickmastfan67" and "jonesydesign" at 40-50k.

Conclusion: Having a contest to make the most "edits" does risk people going for questionable things that touch a lot of ways. Perhaps 55k of ada_s's points were in that category (and hence 55k of my own score came from undoing them, so my real score should be around 145k - 55k = 90k; still in first place but not crazy). However, I'm certain that the contest did inspire a big increase in overall quality editing. I certainly got a lot faster, learned JOSM better, and spent time improving the USA, where I usually wouldn't bother.

They're doing the contest again ("with simplified rules-and more prizes to win") and that seems like a good thing to me. I won't be entering next time (to give you all a chance :-) and I'll be sharing my JOSM extensions in my next post. My main interest is in getting everyone more productive at editing, for the greater good of OSM.