Recent diary entries
The TIGER import in the USA is one of the largest and messiest imports in OSM, but of course it is getting cleaned up over time. It can be hard to estimate progress, but one rough metric is the number of nodes and ways that haven't been changed since import. This number goes down over time. For the past 3 years, I've been tracking these numbers in a spreadsheet.
- For nodes, it is the count of nodes last modified by the accounts 'woodpeck_fixbot' and 'TIGERcnl'
- For ways, it is the count of ways last modified by the accounts 'bot-mode' and 'DaveHansenTiger'
Over the last 3 years (since June 2015):
- Nodes have decreased from 139.6 to 127.6 million (average 11k/day)
- Ways have decreased from 8.00 to 6.08 million (average 1688/day)
What's remarkable to me, as you can see from the trendlines, is how steady the rates are. At this rate, all of TIGER won't be cleaned up (or at least touched) for another 31.7 years (for nodes) or 9.9 years (for ways).
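The projection above is a simple linear extrapolation. As a minimal sketch (the function name and the 365.25-day year are assumptions made for illustration, not taken from the spreadsheet), it can be reproduced from the figures quoted in this entry:

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def years_remaining(remaining, rate_per_day):
    """Years until `remaining` untouched objects reach zero at a constant daily rate."""
    if rate_per_day <= 0:
        raise ValueError("rate_per_day must be positive")
    return remaining / rate_per_day / DAYS_PER_YEAR


# Counts and daily rates as quoted in the text (the rates are rounded there).
nodes_years = years_remaining(127.6e6, 11_000)
ways_years = years_remaining(6.08e6, 1688)
print(f"nodes: {nodes_years:.1f} years, ways: {ways_years:.1f} years")
```

Because the daily rates in the text are rounded, the result can differ from the quoted figures in the last decimal place.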
Some observations, in nodes:
- Despite another round of my vigorous cleanup in Ontario, the 'CanvecImports' account only dropped by 600k from 39.5 to 38.9 Mnodes. Still so much Canvec to clean up!
- Import accounts that were active include: 'StefanB_import' +14, 'Svein Olav' +119, 'osmviborg' +132, and 'Rúdisicyon' (+434!) which is doing a giant import of buildings in Portugal.
- Other accounts that moved up in rank, which are probably due to good active mapping and not imports, included 'indigomc' +14, 'santamariense' +11, 'Kohki Hiraga' +15, 'Hernan' +18, 'yunita sari' +28, 'Ben97' +40, 'hpduwe' +13, 'yuantouniaoren' +65, 'jpgon' +14, 'ajithkarunaratne' +25, 'dubf' +48, 'Hendrikklaas' +23, 'baradam' +12, 'chachafish' +19, 'Sander H' +17, 'DaCor' +10.
- HOT/MissingMaps active users 'asmi84' +59, 'ASHIQ MAHAMUD' +31, 'Raven Nahid' +77, 'anisa berliana' +13, 'Nodia' +567, 'Abou kachongo jr' +52, 'dianawa_22' +11
In ways, I notice some of the same activity, and also:
- 'kiaraSh-Q' who maps mostly Iran, somehow moved down in nodes (-59) but up on ways (+41).
- Mysterious drops 'Canyonsrcool' -52 and 'alester' -36.
- 'Matthew Darwin' (+22/+23) seems to be impressively fixing/aligning every road and address of Ottawa, Canada.
- Some Japan-import-account drops are probably due to the cleanup work that I and other Japanese users have been doing there, hence 'chnkshm' -7/-17, 'KSJ2_adm_bnd_imprt' -16/-283, 'nyampire' -14/-21, 'watao' -5/-105.
- User 'PierZen' (+4/+539) seems to be doing a bunch of cleanup in Quebec, including boundaries and removing superfluous tags.
- Other active global users who moved up in ways include 'vichada', 'de vries', 'Seandebasti', 'JFK73', 'edvac', 'danbjoseph', 'Alan Bragg' and 'gscholz' (+3/+57).
I've run my find_small_displacements program on Japan, and found some problematic imports with a large number of densely overnoded features. Here are some:
Bad import of natural=wood around Fukuoka. These were tightly spaced, yet also wildly inaccurate, off by as much as 80 meters and covering lots of non-wood areas, even covering motorways. I've reduced most of them, and aligned a few areas, but there's lots more to align. The original upload was in 2010 with changesets like this one.
Bad waterways, especially in Hokkaido and Kyushu, but also across the country. These are not just overnoded but also overtagged. Consider this little stream near Taketa, Ōita:
- note=National-Land Numerical Information (River) 2006, MLIT Japan
Issues with this little waterway:
- Poorly aligned; off by 20-60 meters in all directions.
- A large number of tags that belong on the changeset, not the feature. To its credit, this wasn't the practice back in 2010 when it was imported.
- Overnoding, using 1085 nodes for what is well represented by 40 nodes.
- It's tagged "river", but is clearly a small stream.
- "layer=-1" for every single waterway.
Also somewhat alarming, JOSM does not show all the tags (!?), the "KSJ2:" tags are present on the feature but do not appear in JOSM's GUI. I could not find any option in JOSM to make these key/values show up! Perhaps a JOSM expert could weigh in, or I should file a ticket.
To follow up from my previous post, I did some further work on generating and putting online a table of OSM node/way ranks.
The data that's there right now is from today (2018-03-01) and the deltas are vs. 2 weeks ago (2018-02-12).
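As an illustration of how rank deltas between two snapshots can be derived, here is a minimal sketch. The function names, the tie-breaking rule, and the sample accounts are all hypothetical; the real table may be computed differently.

```python
def rank_users(counts):
    """Map user -> rank (1 = most last-modified objects). Ties are broken by name."""
    ordered = sorted(counts, key=lambda user: (-counts[user], user))
    return {user: position for position, user in enumerate(ordered, start=1)}


def rank_deltas(old_counts, new_counts):
    """Positive delta means the user moved up in rank between the two snapshots."""
    old_ranks = rank_users(old_counts)
    new_ranks = rank_users(new_counts)
    # Users present in only one snapshot are skipped: they have no comparable rank.
    return {user: old_ranks[user] - new_ranks[user]
            for user in new_ranks if user in old_ranks}


# Made-up last-modified node counts for three hypothetical accounts.
old = {"alice": 500, "bob": 400, "carol": 300}
new = {"alice": 500, "bob": 400, "carol": 450}
deltas = rank_deltas(old, new)
print(deltas)
```

In this made-up sample, "carol" overtakes "bob", so her delta is positive and his is negative, which matches the +N/-N notation used in these entries.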
Standard disclaimer: Last-modified-rank is only vaguely related to contribution, there is no way at all to measure actual quality or value of contribution across users, because it's subjective, and users are very different from each other. However, this table can be very useful for an individual mapper to see how their amount of contribution changes over time, and to identify, for example, accounts that are moving up rapidly which usually indicates they are doing an import. Similarly, if your rank moves down, it can mean that someone (correctly or not) has modified or deleted your mapping work.
For those curious about the technical mess that's currently involved, here is what I did:
- Download the weekly planet file from Planet OSM (39 GB); this takes around 12 hours.
- Run a small Linux C++ app that uses Osmium to parse the PBF and generate a CSV of users, along with the number of nodes and ways of which each is the last modifier.
- On Windows, run SQLiteStudio to ingest that CSV as a table in a database.
- Run a C++ app that uses SQLite to query the database and generate the HTML output.
- FTP that HTML up to a server.
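The CSV-to-SQLite-to-HTML part of the steps above could, in principle, be collapsed into one script. The following is only a sketch using Python's standard library, not the actual C++/SQLiteStudio setup; the CSV column names (`user`, `nodes`, `ways`), the table name, and the sample rows are assumptions.

```python
import csv
import html
import io
import sqlite3


def load_counts(conn, csv_file):
    """Load (user, nodes, ways) rows from a CSV file object into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts "
        "(user TEXT PRIMARY KEY, nodes INTEGER, ways INTEGER)")
    rows = [(r["user"], int(r["nodes"]), int(r["ways"]))
            for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT OR REPLACE INTO counts VALUES (?, ?, ?)", rows)


def render_table(conn, limit=1000):
    """Return an HTML table of users ordered by last-modified node count."""
    cursor = conn.execute(
        "SELECT user, nodes, ways FROM counts "
        "ORDER BY nodes DESC, user LIMIT ?", (limit,))
    lines = ["<table>",
             "<tr><th>Rank</th><th>User</th><th>Nodes</th><th>Ways</th></tr>"]
    for rank, (user, nodes, ways) in enumerate(cursor, start=1):
        # Escape user names, since they are arbitrary text chosen by account holders.
        lines.append(f"<tr><td>{rank}</td><td>{html.escape(user)}</td>"
                     f"<td>{nodes}</td><td>{ways}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Two made-up accounts stand in for the real CSV produced from the planet file.
sample_csv = io.StringIO("user,nodes,ways\nexample_a,120,10\nexample_b,300,25\n")
conn = sqlite3.connect(":memory:")
load_counts(conn, sample_csv)
page = render_table(conn)
print(page)
```

An in-memory database keeps the sketch self-contained; pointing `sqlite3.connect` at a file would keep older snapshots around for computing deltas.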
As I've been mapping heavily since 2013, I've tried various ways to track my progress. It's really great to feel that your mapping is making a visual and statistical difference! However, as of today, there is no good metric, and it's very frustrating. Understandably, there is no way at all to measure actual quality or value of contribution across users, because it's subjective, and users are very different from each other. However, for me, I know that I map at a consistent quality and node density, so I should at least be able to measure my progress with respect to myself! Here are things I've tried:
Looking at my Heat Map. At first I thought, this is great! A clear visual indication of how much of the world I've contributed to, and a clear goal to cover the world! But, by 2014, I observed many edits weren't counted, which was confirmed in an email exchange with its author, Pascal. To its credit, YOSMHM's goal is only to give a rough idea of where a user has edited, and it does that very well. However, once you've edited for a while, you can add a thousand nodes and see nothing on the map change. That's just frustrating.
Looking at the "last modified" nodes and ways on HDYC. This was great: I could do a busy night of editing and the next day, HDYC would show my "last modified" nodes went up by, for example, 10K nodes. It gave a good indication of how much I was contributing. Sadly, Pascal changed the website around November 2017 so it no longer shows "last modified". That's frustrating.
Refreshing the map after doing major edits. This used to be possible by right-clicking the main OSM.org map's tile and submitting it to the "dirty queue", so it would be re-rendered and you could see all your work. This gave quick visual feedback and confirmation and an encouraging sense of accomplishment! Sadly, OSM.org changed how their map works (some blame OpenLayers?) so now you can't get tile URLs and can't request re-rendering; you just wait several days or more for the server to eventually re-render. That's very sad.
Looking at the #MissingMaps leaderboard. You could see your "Total Edits", "Buildings", "Km of Road" numbers, and watch them go up in nearly real time! It used to be a little flaky (ignoring the occasional changeset, which was frustrating), but since December 2017, it fails to count most changesets. I can submit, for example, 10 #MissingMaps changesets in a day and only see 1 of them counted, the rest ignored. That's beyond frustrating.
In regards to #2 (HDYC), I did write a small C++ program (on my bitbucket) using Osmium to read the OSM Planet file and count the last-modified for each user. But, this is awkward, for several reasons:
The planet file is huge (39 GB!), and I have yet to automate the download, so it's a long, slow manual process.
The planet file is only updated once a week now. I have no idea how HDYC updates every day!
I haven't written the code yet to format the results into a nice, sortable web layout, or to integrate it with my existing spreadsheet of user notes.
So, that's the state of things. All the ways that used to exist, to get visual or numerical feedback or progress metrics, are gone. In my opinion, it's not just a personal frustration, but a lost opportunity for OSM as a movement, that we are missing this simple way for mappers to get encouragement and acknowledgement.
Emmor (account Palolo) asked some questions in an OSM message; I've put my response here as it may be of general interest.
On 2016-09-26 22:32:47 UTC Palolo wrote:
Ben, Thanks for your contributions to OSM, especially for the rivers you have cleaned up on the west coast. I also just came across your spreadsheet tracking users and find it fascinating.
I was wondering if you could convert your notes into 3 categories of mappers: 1) Imports, 2) Mappers, 3) Combination mapper/importer ?
That's a good question, I've considered trying it, but it can be difficult to tell them apart, or it requires individual detective work that I just haven't gotten around to. Generally, for the import or part-import accounts, I've put the title of that import in the "Grouping" column.
If they have added millions of nodes, and there is nothing about importing in the "Where, What" or "Grouping", it means I haven't been able to figure out if they are an import account or not. For example the Japanese accounts, "Tom_G3X" and "ikiya" and "yamasan". They are probably imports(?)
I've also put the account name in bold (like katpatuka and Heinz_V) if they have contributed millions of features without any obvious importing. Anyone who belongs in this category that I've missed, please let me know!
Also have you thought about gender classification?
I've thought about it, but it is also very hard to tell. Very few account names/images are clearly gendered, and nearly all those that are, by name or image, appear male.
It appears to me that there are very few female top contributors. I wonder why this is since it is open for anyone to edit.
Probably for the same reasons that cartography and technology in general are so male: cultural bias encourages it for men and encourages other things for women.
The top-ranked female account that I know of is "ediyes", a Mapbox mapper at #137/88 (more than 88K changesets!). However, it's entirely possible that some of the mysterious accounts in the top 100 are female. In the top 1000, there are many, including other female mapboxers (dannykath, karitotp, samely...) and the Queen of #MapLesotho, tshedy.
One editor that has been super active over the past 9 months is "Aiko Nakata", which is a Japanese female-only name. Also "Febrina Dewi" and "Fatisya Ilani Yusuf" and "asti_shinoda", all women I believe, were the top-ranked contributors to #MissingMaps last year, all from Indonesia, an amazing amount of mapping work.
When I read Michal Migurski's recent post robots, crisis, and craft mappers, I was really baffled and concerned. I am a fan of Migurski; he's a good person and a smart guy. But the content of this particular blog post was really off. I had hoped it would pass with little notice, but I can tell from the #craftmapper T-shirts at SOTM that people actually paid attention, so sadly I feel compelled now to rebut, and hopefully offer some useful perspective as well.
To get something out of the way first, I am absolutely an "armchair" or "craft" mapper, and an addicted mapper, averaging ~5 hours a day mapping for the past 3.5 years; by my own estimation, there are only two human OSM accounts (katpatuka and Heinz_V) with more node/way contribution. (Also, shoutouts to AndrewBuck, Stalker61 and ulilu!) I care passionately about the map, I've been in geo since the 90s, and I've been inside Google to see how mapping actually happens at scale.
To start with, he writes:
The OpenStreetMap community is at a crossroads
Arguably, no it isn't. It is actually on a stable trajectory, with no major shifts likely.
I see three different movements within OpenStreetMap: mapping by robots, intensive crisis mapping in remote areas, and local craft mapping where technologists live
Actually, no. "Robot" mapping is a perennial project of AI zealots, not a movement, and cannot and will not produce acceptable data (for reasons way beyond the scope of this rant). At best, it is another way to produce yet more controversial imports of dubious quality. Crisis mapping has been well established for many years and is not a new or dynamic trend; the same goes for local or remote "craft" mapping, i.e. normal OSM contributors: not a movement, and not new.
The first two represent an exciting future for OSM, while the third could doom it to irrelevance.
This is saying that normal OSM contributors, the ones that have built and continue to build most of the map - and the great majority of the quality map - are "irrelevant". This is really, 100% wrong.
Historically, OpenStreetMap activity took place in and around the home areas of OSM project members
True enough, and that is still the single largest source of quality map contributions. The other parts are imports, a small amount of commercially-sponsored input, and armchair mappers like myself, tracing aerials from the places that can't (or can't yet) map themselves, either for HOT or MissingMaps or beyond. Together, that IS OSM, past and present, and unless Something Dramatic happens, that is also OSM's future.
Craft mapping remains the heart of the project, potentially due to a passive Foundation board who’ve let outdated behaviors go unexamined.
I am trying to figure out how to not feel hurt by this. "OUTDATED." The passion that drives the entire past, present and future of OSM is "outdated?"
Left to the craft wing, OSM will slide into weekend irrelevance within 5-10 years.
That's basically saying that OSM is irrelevant today. As an opinion, that's a pretty harsh one.
Two Modest Proposals (1) codes of conduct and other mechanisms intended to welcome new participants from under-represented communities
This sounds fine, but it seems orthogonal to the "robot, crisis, craft" framing. It seems uncontroversial to empower and support more crisis/craft mappers from under-represented communities.
(2) the license needs to be publicly and visibly explained and defended for the benefit of large-scale and robot participants
I have sat out the license wars, partly because, as a regular non-lawyer human, I cannot fathom what all the fuss is about. That said, it also seems unrelated to crisis/craft mappers, with or without AI-robot assistance to produce data for human review, who will surely be able to proceed with or without license changes.
I could say much more about this, but much has already been hashed out in the comment thread on the original blog. For example, "automation vs. craft is a strawman argument; Both - in an integrated manner!" Yes, obviously.
Instead, I'd like to provide an answer to the question I believe Migurski is actually asking. I believe he is saying:
- While better in some areas, OSM isn't on par, for the full range of uses, with maps from Google/Apple/etc.
- The existing approaches aren't on a trajectory to get us there, therefore they "doom us to irrelevance".
- We need something more to get us there, but what is it (robots? codes of conduct? license changes?)
The answer to this question is obvious, but everyone seems to be waffling and dodging it. I will say it: MONEY.
To be a top-tier global map, it takes roomfuls of full-time, paid mappers, with the kind of resources and coordination that (realistically) are only found in large corporations.
Clickshops. Google has them, Apple has them, any organization that wants to take OSM to the "next level" will need them. In some developing nation (for cost), with fast computers and fast networks and thorough, regularized training for speed and consistency. (In case someone is thinking Mapbox, that's nice, but think bigger. Think 100x.)
Streetview. Every station in Google's clickshops has the entire catalog of streetview instantly available, continuously integrated into the mapping flow. Without a streetview-like dataset, you just can't do it. I know Mapillary (+JOSM plugin) is trying, but they are not even close - you have to capture FULL 360 (cylindrical) imagery, not just hope that hobbyists were pointing their camera where you need to look, and you need the RESOLUTION to read street names. Not even 1% of Mapillary users are capturing HD 360 imagery. You can't do it with prosumer cameras (I've tried). You need an expensive rig. Stop pretending otherwise.
Some company or consortium (or, in theory, government, but I'm not holding my breath) could step forward with MONEY and take OSM to that "level III/IV" Migurski (and many others) would like to see. Barring that, everyone needs to extend love to the homebrew/crisis/craft/mapathon mappers we have, because we ARE OSM's future.
My first big river, in 2014, was the Klamath. At first, I tried looking for it using the OSM search box (Nominatim). All I found was a mess of missing river parts, and when I looked closer, I found poorly imported NHD, very old and wrong riverbanks, incorrect tagging, etc. I spent a few days fixing it up to produce the Klamath River waterway relation:
Since then, I've done similar work for other waterways. Sometimes the relation exists but is incomplete, other times I create it; either way, it can take days of work to finish. Here are some in the USA:
- Housatonic River
- Black River (NC)
- Denver’s High Line Canal
- Sacramento River
- Iroquois River
- Rogue River
Recent work in Illinois:
Outside the USA:
I've been contributing heavily to the #MapLesotho project for a while, and we're making great progress on all the basic geometry of the country, like roads, paths, buildings, waterways. A good OSM map of a place has more than that: it has things like POI and amenities, which are hard for an armchair mapper like me to help with. One thing I can do, however, is protected areas. Lesotho is a small country with only a few protected areas, two small "national parks" and some even smaller places. There was no vector source for any of them (neither open nor closed), but by doing a bunch of online research and detective work, and reading Wikipedia, I was able to add the two largest:
- Sehlabathebe National Park relation ( and on wikipedia )
- Ts'ehlanyane National Park relation ( and on wikipedia )
While watching the OSM ranks and daily activity of the top thousand accounts, I've noticed a curious pattern: a number of accounts which all edit in the same places, the very largest cities in India (New Delhi, Mumbai, Bengaluru, Hyderabad, Kolkata, Chennai...) Their edits are so similar that the accounts must be connected, as if they are all working for a company, school, or other organization. However, their account pages are blank (unlike, for example, Mapbox accounts, which plainly state their affiliation).
Here are the top "Urban India" accounts: saikumar, premkumar, jasvinderkaur, anthony1, Apreethi, sdivya, shekarn, himalay, Navaneetha, praveeng, harisha, himabindhu. I know of at least 20, but I think there are many more. Many (most?) of the accounts have been active since sometime in 2015.
I am very happy that someone is coordinating work on India's cities, but I would be very curious to know who it is! After all, companies like Google have whole buildings full of workers (in India) to update their proprietary maps. Is someone doing the same for OSM? I'd like to thank them.
Here is a picture of premkumar's edits; all the accounts look very similar to this:
Particularly in Africa, there are a huge number of small round buildings. I believe the best way to model them is with a single node, tagged building=hut, optionally with a radius or width. However, the dominant OSM practice is to use a whole way with lots of nodes. So, I must go along with the community. Unfortunately, I encounter a lot of this while working, for example on #MapLesotho:
That's 7 huts, 130 nodes, all the wrong size, some overlapping - likely the result of very bad copy and paste. (There are actually 8 huts there; they missed one.)
My fingers have developed the rhythm for cleaning this sort of thing up in JOSM. First set the simplify-way.max-error to something appropriate (0.3), then for each hut:
- Click to select building.
- 'G' to unglue it from the surrounding buildings, if needed.
- Drag to center it correctly.
- Ctrl-Alt-drag to scale it down to the correct size.
- Shift-Y to reduce nodes.
- 'O' to re-round the circle.
Within a few seconds I have this:
8 huts, 86 nodes. It would probably be hard to automate this further without computer vision (because of the manual visual alignment), but using the standard JOSM position of left hand on keyboard, right on mouse, it can go very quickly with the above steps. One possible optimization might be to combine Shift-Y and O into a single keypress I can reach without moving my left hand.
Needless to say, if you are setting up a mapathon for people to map Africa, PLEASE teach them how NOT to make the bad big huts like the above, so I don't have to fix so many thousands of them.
As mentioned in my last entry, I wrote a tool using Osmium to parse PBF and look for inefficient ways, i.e. ways that if you ran simplify on them, would drop hundreds of nodes and not change shape. I'd been running it on small countries and US states, but this evening I tried it out on a PBF of all of North America, and here is the prize-winner for the most bloated, wasteful way: a small dirt road between some houses and the coastal wetlands in Nova Scotia, Canada:
That's 2000 nodes, or one every 5.6 centimeters.
By the time you read this, I'll have cleaned up way 85927697, but I'd also like to offer to anyone else, if you are an experienced editor who focuses on a specific part of the world, if you would like me to run my tool on the extract for your region, I can send you a list of the worst ways and you can clean them up. Let me know!
Also, a word to importers: Please, please make sure you check your data for this kind of mess BEFORE you upload. In this case it was Steve in Halifax importing CanVec in 2010, but similar things are being uploaded all around the world, all the time (I know because my tool finds them!)
Some months ago, I was looking around OSM to find where the bulk of noise and inefficiency is. I'm aware of some other efforts (like Toby in 2013) but I actually went so far as to write a C++ app on Osmium which parses PBF extracts, simulates running line simplification, and produces a list of the ways which are the least efficient.
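To make the idea of "simulating line simplification" concrete, here is a minimal sketch that counts how many nodes a way would lose. It uses Douglas-Peucker as a stand-in (the actual tool is C++ on Osmium and may use a different algorithm), and it assumes the node coordinates have already been projected to planar meters. All names here are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b; all points are (x, y) in meters."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification; always keeps the first and last point."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # iterative, so long ways cannot hit the recursion limit
    while stack:
        first, last = stack.pop()
        worst, worst_dist = None, tolerance
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst is not None:
            keep[worst] = True
            stack.append((first, worst))
            stack.append((worst, last))
    return [p for p, kept in zip(points, keep) if kept]


def removable_nodes(points, tolerance):
    """Number of nodes that simplification at `tolerance` meters would drop."""
    return len(points) - len(simplify(points, tolerance))


# A made-up way: 101 nodes spaced 1 m apart on a straight line, then one corner.
way = [(float(x), 0.0) for x in range(101)] + [(100.0, 50.0)]
print(removable_nodes(way, 0.5))
```

Sorting ways by `removable_nodes` in descending order would give a "least efficient first" list like the one described above.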
I ran this on various US states and countries worldwide, and the winner is... North Carolina. It is so wildly inefficient that we may as well not bother with the rest of the world until we've cleaned up North Carolina first. (Just for comparison, the size of the output: Finland 17k, Colombia 20k, Colorado 30k, England 45k, North Carolina >300k).
Why is North Carolina (henceforth NC) so obese? There are a handful of bad spots elsewhere (like some of the Corine landuse in Europe, and a waterway import in Cantabria, Spain) but nothing close to NC. It's due almost entirely to a single import in 2009. The USA's hydrography dataset, NHD, is truly massive. An account called "jumbanho" imported NHD for NC and apparently applied almost no cleanup (besides a small pass at removing duplicate nodes a few months later). Among the many flaws of that import:
- Topology is mostly missing (features meet but don't share a node)
- Really out of date (shows swamps that were drained decades ago, streams running through what are now shopping malls).
- Almost all of it is barely or not at all decimated (a stream which is perfectly modeled in 15 nodes is sometimes made of 300 nodes).
As a result, the jumbanho account has noderank #3 with 43 Mnodes (this was rank #2 with 49 Mnodes, but as I'll explain, I've been busy).
This is what the data looks like:
As you can see, a regular set of evenly-spaced nodes, with no decimation. This is worse when you consider that the overall accuracy is far less: here the pond is 7m off, the stream is variously 12, 14, or 32m off:
This inefficiency is bad in a few ways, such as making the planet file balloon with dead weight. But a more relevant issue is this: When a user comes in here to fix the alignment of the data, there is NO WAY they can be expected to move all 200 points by hand. An import with too many points is highly resistant to EVER getting manually fixed. The solution is to simplify first, but by how much? Here we encounter some issues:
- The simplify tool in JOSM defaults to 5 (!) meters which is brutal and useless for just about any use I can think of (maybe very, very rough old GPS traces?)
- JOSM lets you change the amount, but it is buried deep in the "advanced preferences."
- Once you find that, knowing how much to simplify each kind of feature is a matter of experience and skill.
After hundreds of hours of manual work on NC, I have learned what values work; general guidelines which I carefully tweak based on each area:
- natural=wetland. These are very rough, 1.0-1.2 m.
- waterway=stream, waterway=riverbank, natural=water. They are more delicate, I use 50-80 cm.
- Streams and rivers which are either inside wetlands, or "artificial paths", these are often notional and don't correspond closely to any real feature, so >1 m.
Note that these levels of precision are WAY less than the actual inaccuracy of the data; they cannot harm the value in the data, because they are too small. In fact they could be bigger, but the goal is to leave enough nodes so that human editing won't have to add or remove many nodes when they align the feature to its correct location.
While I would be happy to just write a bot to do that first step, that would be a "mechanical edit" and I'd have to put up with mailing list arguments to get permission. (I'd also have to write that bot, which I've been too lazy to do so far). So instead, I've put in the time to do it all manually in JOSM, with steps like:
- Study each area, compare the features to the imagery.
- Do some super careful simplify with appropriate values. (It gets really tiring having to dig into JOSM's advanced preferences every single time I change the value.)
- Fix the topology by carefully tuning the validator's precision and allowing it to auto-fix, with manual verification.
- Some manual adding of bridges and culverts.
- Removing/updating non-existent wetlands and streams (one common clue: they intersect buildings).
- Splitting some ways and creating relations, for example for a large riverbank and wetland that share an edge.
Here is that same area after a simplify to 70cm on the NHD features, then quick manual alignment:
It's exhausting. In fact, a bot wouldn't really help that much, since the simplify is only the first step, the topology and the rest still need to be done by a human anyway.
By my rough calculation, if I work hard for 5 hours every night, it would take around 5 months for me to finish cleaning up NC NHD to a decent level.
On the plus side, other NHD imports I've seen around the USA (like Oklahoma) don't seem to be nearly as bad; while they suffer from most of the same quality issues, at least they were already simplified before uploading.
I love YOSMHM. Pascal has done a great job with it, and it's very cool. In fact, given my desire to map the entire world, I rely on YOSMHM to tell me where to map next. And just recently, it's improved from updating weekly/monthly to daily! But, there are some limitations.
- One blob per changeset. If I map a long highway, a dot at the center of the highway really doesn't show where I mapped.
- Missing data. I have around 12,100 changesets, but YOSMHM only shows 9356. That might explain why it doesn't show the mapping I did in southern Chad, or eastern Cameroon, or Agadez in Niger, or many other places.
So, I set out to see if I could make my own heatmap. Here are my first steps.
- Thanks to a great answer from EdLoach it was easy to get XML files for all my changesets.
- I parsed those XML to get the extents (min_lat min_lon max_lat max_lon) of each changeset.
- I tried a number of different web-heat-map tools, and settled on Leaflet + Leaflet.heat because it was super easy to use. I just pass the center of each changeset's extents to Leaflet.heat as a point, and the result looks like this.
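The parsing and centering steps above can be sketched as follows. This is an illustration rather than the original code: it assumes the changeset XML carries `min_lat`, `min_lon`, `max_lat` and `max_lon` attributes on each `changeset` element, it skips changesets without extents, and it ignores bounding boxes that cross the antimeridian. The sample XML is made up.

```python
import json
import xml.etree.ElementTree as ET


def changeset_centers(xml_text):
    """Return [lat, lon] centers of the bounding box of each changeset in the XML."""
    centers = []
    for changeset in ET.fromstring(xml_text).iter("changeset"):
        try:
            min_lat = float(changeset.attrib["min_lat"])
            min_lon = float(changeset.attrib["min_lon"])
            max_lat = float(changeset.attrib["max_lat"])
            max_lon = float(changeset.attrib["max_lon"])
        except KeyError:
            continue  # no extents recorded for this changeset, so nothing to plot
        centers.append([(min_lat + max_lat) / 2, (min_lon + max_lon) / 2])
    return centers


# Made-up sample: one changeset with extents, one without.
sample = """<osm>
  <changeset id="1" min_lat="10.0" min_lon="20.0" max_lat="12.0" max_lon="26.0"/>
  <changeset id="2"/>
</osm>"""

centers = changeset_centers(sample)
# A JSON array of [lat, lon] pairs is the shape of point list Leaflet.heat accepts.
print(json.dumps(centers))
```

Adding a third element per point (for example, the changeset's number of changes) would be one way to weight the blobs, since Leaflet.heat accepts an optional intensity value.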
Finally, I can see at least some blob in every part of the world I've mapped. Unfortunately, unlike YOSMHM, all the changesets are weighted equally (it would take a lot more querying and parsing to weight them) so that, for example, it's hard to tell that I've done 10x more mapping in Namibia than in Japan.
I can dream of a better heatmap! It would have:
- Added/modified/deleted features shown in different colors (like green/blue/red) so I can quickly see where I've done more correction vs. adding new features. India, Africa and Central America are the only places I've added huge amounts of detail to OSM, but you can't tell that by looking at YOSMHM. Is there a better/faster/more polite way to get all that detail without making 12,100 queries to the OSM API? I can't just parse the planet file, because that only has current state, not history.
- Not damn Mercator. Anything else would be better. How about Goode Homolosine? Can I get "free" background tiles (like Mapbox's) served in anything besides "web mercator"?
Not the entire city, but the place node, for Hyderabad, a city of over 7 million people... which currently has no label.
See the unlabeled city here: https://www.openstreetmap.org/#map=10/17.3382/78.5502
It seems unlikely that there never was a label, which means that somebody probably deleted it accidentally, or otherwise accidentally changed it in some way which prevents it from appearing (e.g. changing "place=city" to "place=City"). It is also missing from Nominatim. Is there no bot or other process checking for when something huge like this disappears from the map?
TIGER! 'DaveHansenTiger' originally imported TIGER, but 'woodpeck_fixbot' (noderank #1) subsequently touched nearly every node. Because TIGER is such a mess, it may be possible to estimate how quickly it is getting cleaned up based on the last-modified count of woodpeck_fixbot. Currently it's 136 M, going down at around 12 K/day, so at this rate it will take 32 years to clean up all the TIGER in the USA.
TIGER ways: between 'DaveHansenTiger' and 'bot-mode', there are around 8 M imported TIGER ways that haven't been touched since import. At the current rate of 1800/day, it's going to take 12 years to clean it all.
NHD! (the USA's National Hydrography Dataset). A lot of NHD was imported without any decimation at all, resulting in >90% of the nodes being redundant, effectively noise. There are at least 6 accounts involved in NHD import, including 'jumbanho' (noderank #2) and 'nmixter' (noderank #5). I've tried cleaning up this NHD mess manually, but it takes several hours to do 100 K nodes in JOSM. At that rate, it would take me 8 months of editing every night to clean up all 46 M nodes.
Canada! The CanvecImports account (noderank #3) is at 45 Mnodes and still rising, and there are several more accounts that appear to import Canvec, like azub (noderank #11) and bgamberg (noderank #13). Some areas are neatly decimated and tidy, some are not.
Netherlands: There are two huge imports, 3dShapes (noderank #4) and BAG, which is spread across 16 accounts which all nicely have BAG in their name (Sander H_BAG, Commodoortje_BAG, etc.) All 16 are in the top 200 of noderank.
Massachusetts: The state GIS was a massive import, by account jremillard-massgis (noderank #10) and a few others. Amazingly, the road data is actually of high quality and needs very little cleanup; the wetland hydrography is a bit messier.
Some highly ranked accounts appear to be national imports (?) that I found harder to learn about, such as Tom_G3X (noderank #7, 19 Mnodes in Japan) and Petr1868 (noderank #9, who has apparently added 23 Mnodes to the Czech Republic using "Tracer Using RUIAN and LPIS").
France has many accounts importing from its national cadastre database, but it is very hard to tell which. One might guess that ËdzëronK (noderank #12) and the 15 other massive contributors to France in the top 100 are importing cadastre, but perhaps some of them are actually just amazing, really active mappers.
In my next post I'll talk about some non-import, real cool mappers I discovered.
It's now been around 2 years since I started editing OSM seriously. I've used Pascal's HDYC and YOSMHM to track my progress, with the goal of making a real contribution to OSM worldwide. One thing I always wondered about as my OSM node rank went up: it would reach, for example, 300, and I would think, wow, I have been editing so much... who are these 299 people around the world who actually edit even more??
Recently, I set out to answer this question. I started looking at HDYC for well-known accounts, as well as their heatmaps, and gathering the results in a spreadsheet. When that got tedious, I wrote a C++ app on Osmium and ran it on the Planet.osm file, to find out the complete list of top-ranked accounts.
And the answer is... most of them are not actually people; a few are bots, and many are "import accounts", or user accounts that have been used for a large import at some point. (...but not all of them! Some are actual, live humans manually editing OSM longer and more extensively than me). Along the way, I learned some OSM history, and the diverse patterns in OSM in different countries.
Here is a link to the spreadsheet, sortable by rank, with my own notes on the where/what of around 400 accounts, including the top 100 in node and way ranks. The data is approximate... it's not auto-refreshed by a script (yet), so some ranks may be a little out of date.
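The counting step behind a ranking like this is simple, because the current planet file records only the last editor of each object. Below is a minimal sketch of that step in Python rather than the C++/Osmium app mentioned above. It assumes the parser hands over `(object_type, user)` pairs; that record format, the `rank_users` name, and the sample user names are all hypothetical.

```python
from collections import Counter
from typing import Iterable


def rank_users(records: Iterable[tuple[str, str]]) -> dict[str, list[tuple[str, int]]]:
    """Count last-modified-by users per object type, most prolific first."""
    counts = {"node": Counter(), "way": Counter()}
    for kind, user in records:
        if kind in counts:  # relations and anything else are ignored here
            counts[kind][user] += 1
    return {kind: counter.most_common() for kind, counter in counts.items()}


# Made-up records standing in for what a planet parser would emit.
sample = [
    ("node", "alice"),
    ("node", "bob"),
    ("node", "alice"),
    ("way", "bob"),
    ("relation", "carol"),
]
ranks = rank_users(sample)
for position, (user, count) in enumerate(ranks["node"], start=1):
    print(f"noderank #{position}: {user} ({count})")
```

Streaming the records through a counter keeps memory proportional to the number of users, not the number of objects, which matters at planet scale.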
In my next diary entry I'll share some of the stories and realizations I've had while gathering this data.
I've been using Osmium, and today parsed the entire planet.osm.pbf for the first time. I noticed that the nodes are in order by ID, and the very first node, the oldest node still in existence, is node 10. Let's look at it!
This tough little node has had quite a history! Presuming that the database is accurate, this is what it tells us today:
- v1, April 18, 2005, user sxpert creates this node in changeset 4. That's right, the fourth changeset ever. We have no record of its geographic location.
- v2 was redacted.
- v3, April 2009, super-user woodpeck (Frederik Ramm) places this node in London, near Regent's Park.
- v4, September 2009, dtr20 deletes the node, as part of "Survey east of Regent's Park".
- v5, April 2011, max60watt somehow re-uses the node, placing it near the bus stop in a quiet little village near the town of Kassel, in the German state of Hesse.
- ... and that's where the node has stayed, through 3 small edits.
The name of the village is Furstenwald. As an English speaker, saying this name out loud causes me to giggle. Of all the nodes still alive today, the first in the world is in... Furstenwald.
It turns out that Peoria is not just a metaphor, but a real place in Illinois. It is also the location of a rather messy GIS import of County data! Here's the history as far as I can determine:
- The Peoria County Government gathered data resulting in a dataset as of 1997.
- In 2010, that dataset was deemed old enough to count as "obsolete", which apparently justified uploading it to OSM.
- A wiki page Peoriagisuploa describes most of the details of what happened in June 2010. Basically, it's woods and buildings.
- Woods came in with natural=wood (but too many nodes)
- Buildings came in with building=yes and BUILDING_T=(0..9) for a building type, as documented on the wiki page.
- In July 2010, user account "xybot" applied some changes called "Correction of faulty peoria bulk upload" which did a very strange thing to the building tags. It changed "BUILDING_T" to "tiger:buildingType" (!) There is no such tag in TIGER (which has no buildings, let alone building types).
I studied this mess and figured out what should have occurred: mapping Peoria's BUILDING_T onto the actual, standard OSM building types:
- BUILDING_T=1 -> building=residential
- BUILDING_T=2 -> building=commercial (very few of these are industrial)
- BUILDING_T=3 -> building=school
- BUILDING_T=4 -> building=garage
- BUILDING_T=5 -> building=static_caravan
- BUILDING_T=6 -> building=industrial (there are almost none of these)
- BUILDING_T=7 -> building=yes (it was under construction in 1997, it isn't now)
- BUILDING_T=8 -> (make these the inner ways of multipolygon relations)
- BUILDING_T=9 -> man_made=pier
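The mapping above can be written down as a lookup table, which is handy for previewing what each building should become while editing by hand. This is a sketch of a checking aid, not a script for automated edits. It assumes tags arrive as a plain dict, that the code may sit under either `BUILDING_T` or the later `tiger:buildingType` key, and that the source key is dropped once mapped; the `propose_tags` helper is hypothetical.

```python
# Peoria building type code -> proposed OSM tags, as listed above.
BUILDING_T_TO_TAGS = {
    "1": {"building": "residential"},
    "2": {"building": "commercial"},
    "3": {"building": "school"},
    "4": {"building": "garage"},
    "5": {"building": "static_caravan"},
    "6": {"building": "industrial"},
    "7": {"building": "yes"},
    "9": {"man_made": "pier"},
}
# The code may live under the original key or the one xybot renamed it to.
SOURCE_KEYS = ("BUILDING_T", "tiger:buildingType")


def propose_tags(tags: dict[str, str]) -> tuple[dict[str, str], str]:
    """Return (proposed tags, note) for one object; never modifies the input."""
    code = next((tags[key] for key in SOURCE_KEYS if key in tags), None)
    if code is None:
        return dict(tags), "no Peoria building type found"
    if code == "8":
        # Type 8 becomes an inner way of a multipolygon: not a simple retag.
        return dict(tags), "needs manual work: inner way of a multipolygon"
    if code not in BUILDING_T_TO_TAGS:
        return dict(tags), f"unknown building type {code!r}"
    # Assumption: drop the old type key and the generic building=yes.
    proposed = {k: v for k, v in tags.items()
                if k not in SOURCE_KEYS and k != "building"}
    proposed.update(BUILDING_T_TO_TAGS[code])
    return proposed, "ok"


print(propose_tags({"building": "yes", "tiger:buildingType": "4"}))
```

Type 8 is deliberately left untouched with a note, since turning a way into a multipolygon inner member needs a human to find the matching outer building.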
I have been laboriously applying these fixes recently, and will finish soon. I'm doing it manually in JOSM, checking carefully, not only because that's the quality thing to do, but also to head off any claims of "mechanical editing". I'm also cleaning up the woods, which is not simply a matter of decimation but also a lot of manual updating because the woods are not where they were in 1997.
I've been having a great time recently using Osmium to write my own analysis code in C++ to look for anomalies in the PBF extracts. Today it found this very strange coastline in South Africa:
Perhaps, I thought, this is some rare geological formation that makes an amazing wavy line? So let's look at the data over aerial:
Uh.... what? I've seen a lot of weird and bad map data, like the mechanical grit of PGS and all the horrors of TIGER, but this was new. It's as if some cartographer said... "yeah, it's a coastline! Uh, what kind? Uh.... a wavy coastline? Yeah, wavy! Lots of waves.... I LOOOVE to draw WAVES, wheeeee!"
I should mention that this appears to go on for hundreds of kilometers.
The importer of this way is an "Adrian Frith", but it's most certainly not his fault. The source tag says "Municipal Demarcation Board", so it was probably made by some government department, or maybe a contractor that was getting paid by the node?
I'm sorry to say I'll be quickly tidying up this coast, so perhaps by the time you read this, you won't be able to see the waves at, for example, here. On the other hand, coastline changes are special and take a while to process, so the blue ocean wobbles will probably stay for quite a while.
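One way an analysis pass could flag a line like this is by its sinuosity: the length along the way divided by the straight-line distance between its ends. To be clear, this is a hypothetical heuristic for illustration, not the check my Osmium code actually ran. It treats coordinates as planar, and any threshold for "too wavy" would be a judgment call, since real coastlines can be legitimately convoluted.

```python
import math

Point = tuple[float, float]


def sinuosity(points: list[Point]) -> float:
    """Path length divided by end-to-end distance (1.0 means perfectly straight)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    # A closed loop has no chord; report it as infinitely sinuous.
    return math.inf if chord == 0 else path / chord


# Made-up shapes: a straight run and a regular zigzag.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0)]
print(sinuosity(straight), sinuosity(zigzag))
```

Computing this over a sliding window of nodes, rather than a whole way, would help localize a wavy stretch inside a long coastline.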