Here are some of the things I learned while studying the OSM accounts with high HDYC rank, as described in my last entry:

  • TIGER! ‘DaveHansenTiger’ originally imported TIGER, but ‘woodpeck-fixbot’ (noderank #1) subsequently touched nearly every node. Because TIGER is such a mess, it may be possible to estimate how quickly it is getting cleaned up from the number of nodes whose last modification is still by woodpeck-fixbot. Currently that count is 136 M, going down at around 12 K/day, so at this rate it will take 32 years to clean up all the TIGER in the USA.

  • TIGER ways: between ‘DaveHansenTiger’ and ‘bot-mode’, there are around 8 M imported TIGER ways that haven’t been touched since import. At the current rate of 1800/day, it’s going to take 12 years to clean them all up.

  • NHD! (USA National Hydrography Dataset). A lot of NHD was imported without any decimation at all, resulting in >90% of the nodes being redundant, effectively noise. There are at least 6 accounts involved in NHD import, including ‘jumbanho’ (noderank #2) and ‘nmixter’ (noderank #5). I’ve tried cleaning up this NHD mess manually, but it takes several hours to do 100 K nodes in JOSM. At that rate, it would take me 8 months of editing every night to clean up all 46 M nodes.

  • Canada! The CanvecImports account (noderank #3) is at 45 Mnodes and still rising, and there are several more accounts that appear to import Canvec, such as azub (noderank #11) and bgamberg (noderank #13). Some areas are neatly decimated and tidy, some are not.

  • Netherlands: There are two huge imports, 3dShapes (noderank #4) and BAG. The BAG import is spread across 16 accounts, which all nicely have BAG in their name (Sander H_BAG, Commodoortje_BAG, etc.). All 16 are in the top 200 of noderank.

  • Massachusetts: The state GIS was a massive import, by account jremillard-massgis (noderank #10) and a few others. Amazingly, the road data is actually of high quality and needs very little cleanup; the wetland hydrography is a bit messier.

  • Some highly ranked accounts appear to be national imports (?) that I found harder to learn about, such as Tom_G3X (noderank #7, 19 Mnodes in Japan) and Petr1868 (noderank #9, who has apparently added 23 Mnodes to the Czech Republic using “Tracer Using RUIAN and LPIS”).

  • France has many accounts importing from its national cadastre database, but it is very hard to tell which. One might guess that ËdzëronK (noderank #12) and the 15 other massive contributors to France in the top 100 are importing cadastre, but perhaps some of them are actually just amazing, really active mappers.
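The time estimates in the TIGER bullets above are just a backlog divided by a daily rate. Here is a minimal sketch of that arithmetic, using the rounded figures quoted above. The function name `years_to_clear` is made up for illustration, and the whole thing assumes the cleanup rate stays constant, which it almost certainly won't.

```python
DAYS_PER_YEAR = 365.25


def years_to_clear(backlog: float, per_day: float) -> float:
    """Years needed to work through `backlog` items at a constant daily rate."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return backlog / per_day / DAYS_PER_YEAR


# Rounded figures from the bullets above.
tiger_nodes = years_to_clear(136e6, 12e3)  # untouched fixbot nodes, nodes/day
tiger_ways = years_to_clear(8e6, 1800)     # untouched TIGER ways, ways/day

print(f"TIGER nodes: about {tiger_nodes:.0f} years")
print(f"TIGER ways:  about {tiger_ways:.0f} years")
```

With these rounded inputs the results land within about a year of the figures in the bullets; the small difference is what you would expect from rounding the counts and rates.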

In my next post I’ll talk about some non-import, real cool mappers I discovered.

Comment from imagico on 29 May 2015 at 12:12

Nice summary of the big imports.

However, you only looked at the formal cleanup of the nodes. There is also substantial cleanup in tagging required. In TIGER that is obvious, but with Canvec this is also a real problem. To give an example: Canvec imports tag all waterways as stream. There are nearly 3 million waterway=stream ways with a source=NRCan* tag right now but only 774 (!) waterway=river ways with such a source tag, 63 of which have been touched since the beginning of the year and 285 last year.

Let us make a very generous estimate. Assume that the same number of rivers again have been retagged with the source tag removed (this is more difficult to determine, but it is unlikely there are this many, since source tags usually remain untouched in later edits). Also assume that only ten percent of the waterways would actually qualify for waterway=river (in reality it is more). It would still take ~500 years to fully evaluate waterway tagging in Canvec data, even if we stopped importing further streams right away.

By the way, CanvecImports only accounts for ~15% of the ways with a source=NRCan* tag. Extrapolating from that to the nodes, there would be more than 250 million Canvec nodes at the moment.
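For anyone who wants to check the order of magnitude, here is one way to arrive at roughly the same numbers. This is only a sketch: the variable names are made up, and reading the "generous" assumption as a doubling of last year's 285 touched rivers is an interpretation, not something spelled out above. The 45 M node count for CanvecImports is taken from the post.

```python
# Figures quoted in the comment and the post (rounded).
stream_ways = 3_000_000      # waterway=stream ways with a source=NRCan* tag
river_share = 0.10           # assumed fraction that should really be waterway=river
touched_last_year = 285      # waterway=river ways with the source tag touched last year
generosity_factor = 2        # assumption: as many again were retagged with the source tag removed

rivers_expected = stream_ways * river_share
rivers_per_year = touched_last_year * generosity_factor
years_needed = rivers_expected / rivers_per_year

# Node extrapolation: CanvecImports holds ~45 M nodes and ~15% of the NRCan ways.
canvec_imports_nodes = 45e6
canvec_imports_share = 0.15
total_canvec_nodes = canvec_imports_nodes / canvec_imports_share

print(f"Years to review river tagging: about {years_needed:.0f}")
print(f"Estimated Canvec nodes: about {total_canvec_nodes / 1e6:.0f} million")
```

Under these assumptions the review time comes out a little above 500 years and the node count around 300 million, consistent with "~500 years" and "more than 250 million".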

Comment from Jiří Komárek on 30 May 2015 at 18:32

As for Petr1868 (and others in the Czech Republic): all his RUIAN and LPIS changes are semi-automated imports. You need to click on every single field/meadow/building/… to draw it. There is still a lot of clicking.

