OpenStreetMap

Duplicate nodes

Posted by EdLoach on 29 November 2009 in English.

I've spent the last couple of weeks, more-or-less, trying to clear all the duplicate nodes in the UK. It's taken that long, and about 800 changesets, partly because I lose track of which I've done and the excellent tool here:
http://matt.sandbox.cloudmade.com/dupe_nodes.html
only shows changes made before midnight GMT from about 7pm GMT onwards, so it takes some time to see what remains to be done. I used what I think is a variation of that tool here: http://matt.sandbox.cloudmade.com/foo.html?lat=53.14677&lng=-5.053711&zoom=5&layer=5 as it let me tackle one square at a time, which made it a little easier to keep track of what I'd done.

As I type, the second link above is showing one green square for the UK which includes 8 new duplicates introduced yesterday that I fixed last night. There also seem to be a load of red dots which I think may be some sort of cache issue, as the lack of coloured squares suggests no errors (and there shouldn't be really, as those red dots are the duplicates I dealt with on Friday evening).

What causes the duplicates, though? Well, there seem to be various reasons. Some areas have had issues for a while, and also include lots of unjoined ways. I suspect these are usually down to new users, probably using older versions of Potlatch when it wasn't clear that the way you were drawing was going to join the other way. In other cases there were duplicate ways on top of each other. I'm hoping these all date from before the API 0.6 upgrade and were attempts by users to re-upload changes after timeouts, causing the way to be uploaded multiple times. In a small number of cases there were roundabouts which were very circular but had an excessive number of nodes, and the way went around the roundabout a number of times; I suspect some sort of circle-tidying tool with a bug. To my surprise, though, the new duplicates that I've seen in the last two or three days have been added by long-standing mappers using JOSM (tested), so I wonder if there is some bug with joining ways, or whether it is just harder to tell with certain preference combinations than it is in Potlatch these days. I'll keep my eye on other new duplicates and see if there is any pattern.

During one evening when I was waiting for the refresh to appear, I did briefly look at the USA. Some people will know that the TIGER import of US data means everything at the county boundaries involves (or involved at the time) disconnected ways and duplicate nodes, but sorting them out is a nightmare. Where county boundaries run along a way, the way is duplicated, but the two copies are not usually the same way. One county might have it running along their boundary and then left at a junction, whereas the other may have it running along the shared boundary and then right at that same junction (perhaps where three counties meet). Splitting the ways and removing the duplicates, then rejoining all the side ways to the way that remains, is very time-consuming and requires much more care than the UK tidying up I've been doing. Respect is due to the mappers who continue with this task.
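
As an illustration of what "duplicate nodes" means in this post (separate nodes sitting at exactly the same position), here is a minimal Python sketch that groups nodes by identical coordinates. It is not how the tools linked above work, as far as I know; the function name, the (node_id, lat, lon) input format and the example ids are all assumptions made up for the sketch.

```python
from collections import defaultdict


def find_duplicate_nodes(nodes):
    """Return {(lat, lon): [node ids]} for positions shared by 2+ nodes.

    `nodes` is an iterable of (node_id, lat, lon) tuples. This input
    format is an assumption for the sketch, not a real OSM API structure.
    """
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        # OSM keeps coordinates to 7 decimal places, so round to that
        # precision before comparing positions.
        key = (round(lat, 7), round(lon, 7))
        by_position[key].append(node_id)
    # Keep only positions occupied by more than one node.
    return {pos: ids for pos, ids in by_position.items() if len(ids) > 1}


# Hypothetical example data: nodes 1 and 2 are exact duplicates,
# node 3 is only a near-duplicate and is not reported.
example_nodes = [
    (1, 51.889, 0.9),
    (2, 51.889, 0.9),
    (3, 51.8891, 0.9),
]
duplicates = find_duplicate_nodes(example_nodes)
```

Only exact matches are grouped, so near-duplicates (nodes a short distance apart) would need a distance check instead.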

Discussion

Comment from stevage on 30 November 2009 at 07:27

Several times I've caused duplicate ways/nodes. The reason is Potlatch occasionally failing to retrieve data from the server. Unfortunately, there is no visual distinction between "this area is unmapped" and "this area is mapped, but the data hasn't been downloaded yet". So you end up mapping the area, then some time later (maybe after browser refresh), the real data comes through...and now you have two sets.

(Though come to think of it, you're talking about real duplicates, not near-duplicates. Hmm.)

Comment from EdLoach on 30 November 2009 at 08:22

Hi stevage, I can see how that would create near duplicates, but these are indeed exact duplicates on top of each other, usually with duplicated nodes on top of each other, but sometimes sharing the same nodes. I guess the difference there is whether the problem was when the way was being created or amended. If it helps, you could perhaps try having the Mapnik background visible in Potlatch which might give an indication as to whether data has downloaded OK or not.

Comment from EdLoach on 30 November 2009 at 08:22

I meant to add, yesterday's new duplicate node was down to a Merkaartor user.

Comment from EdLoach on 1 December 2009 at 09:15

5 new nodes to fix this time. Matt had already fixed 2 (which were caused by Potlatch being used to revert corruption of a way, so may have been duplicated before the corruption), and the other three were down to JOSM tested, by experienced mappers. I am beginning to wonder if JOSM tested has a bug or not.
