I've spent the last couple of weeks, more or less, trying to clear all the duplicate nodes in the UK. It's taken that long, and about 800 changesets, partly because I lose track of which ones I've done, and partly because the excellent tool here:
only shows changes made before midnight GMT once it refreshes at about 7pm GMT the following day, so it takes some time to see what remains to be done. I used what I think is a variation of that tool, http://matt.sandbox.cloudmade.com/foo.html?lat=53.14677&lng=-5.053711&zoom=5&layer=5, because it let me tackle one square at a time, which made it a little easier to keep track of what I'd done.
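The tools above aren't described in detail here, so purely as an illustration: a minimal sketch of how duplicate nodes could be found by grouping on identical coordinates, then counted per grid square, much like working through the map a square at a time. It assumes a hypothetical list of (node_id, lat, lon) tuples rather than any real OSM data format; the function names and the 1-degree square size are my own assumptions, not how the linked tools actually work.

```python
import math
from collections import defaultdict


def find_duplicate_nodes(nodes):
    """Group node IDs that sit at exactly the same coordinates.

    nodes: iterable of (node_id, lat, lon) tuples (hypothetical input format).
    Returns {(lat, lon): [node_id, ...]} for locations shared by 2+ nodes.
    """
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        # Exact comparison: assumes coordinates are compared as stored.
        by_position[(lat, lon)].append(node_id)
    return {pos: ids for pos, ids in by_position.items() if len(ids) > 1}


def count_by_square(duplicates, size=1.0):
    """Count duplicate locations per grid square of `size` degrees."""
    counts = defaultdict(int)
    for lat, lon in duplicates:  # iterating the dict yields (lat, lon) keys
        square = (math.floor(lat / size), math.floor(lon / size))
        counts[square] += 1
    return dict(counts)
```

For example, find_duplicate_nodes([(1, 53.1, -1.5), (2, 53.1, -1.5), (3, 52.0, -0.1)]) returns {(53.1, -1.5): [1, 2]}, and passing that result to count_by_square gives {(53, -2): 1}.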
As I type, the second link above shows one green square for the UK, covering 8 new duplicates introduced yesterday that I fixed last night. There are also a load of red dots, which I think may be some sort of caching issue: the absence of coloured squares suggests there are no errors (and there shouldn't be, as those red dots are the duplicates I dealt with on Friday evening).
What causes the duplicates, though? Well, there seem to be various reasons. Some areas have had issues for a while and also include lots of unjoined ways. I suspect these are usually down to new users, probably using older versions of Potlatch, when it wasn't clear whether the way you were drawing would join the other way. In other cases there were duplicate ways lying on top of each other. I'm hoping these all predate the API 0.6 upgrade and came from users re-uploading changes after timeouts, so that the same way was uploaded multiple times. In a small number of cases there were roundabouts that were neatly circular but had an excessive number of nodes, with the way going around the roundabout several times; I suspect a circle-tidying tool with a bug. To my surprise, though, the new duplicates I've seen in the last two or three days were added by long-standing mappers using JOSM (I checked), so I wonder whether there is some bug with joining ways, or whether, with certain preference combinations, it is just harder to tell than it is in Potlatch these days. I'll keep an eye on other new duplicates and see if there is any pattern.
One evening, while I was waiting for the refresh to appear, I briefly looked at the USA. Some people will know that the TIGER import of US data left disconnected ways and duplicate nodes along every county boundary (or did at the time), but sorting them out is a nightmare. Where a county boundary runs along a road, the way is duplicated, but the two copies usually aren't identical. One county's way might run along the shared boundary and then turn left at a junction, while the other county's way runs along the same boundary and then turns right at that junction (perhaps where three counties meet). Splitting the ways, removing the duplicates and then reconnecting all the side ways to the way that remains is very time-consuming and requires much more care than the UK tidying I've been doing. Respect is due to the mappers who continue with this task.