OpenStreetMap

Objective

Run the find_duplicate_nodes.sql script from the GitHub repository to find all the possible duplicated nodes that exist in Romania.

## Hardware

The hardware used is an Intel i7-4770 CPU @ 3.40GHz with 16 GB RAM.

## Numbers

The pbf file for Romania is 144 MB. The Romania database has a total of 18M nodes. The query took 20 minutes to run and found 34674 duplicated nodes.

# 34674 duplicated nodes loaded into JOSM

## 1

Way 336945939 has 1500 nodes and no longer contains any relevant information.

## 2 Buildings not sharing same connecting nodes

## 3 The whole city was imported 2 times

This means that every building has a duplicate; one of each pair should be deleted.

## 4 Duplicated Highway path

Way 370332689 and way 363957545 are the same.

## 5 Bad Import

Separate ways for each object, when they should be connected and share the same way.

## 6 2 relations on 2 different ways

Way 164811803 and way 164682871 should be combined where they share the same path, one of them deleted, and the relation moved to the remaining way.

## 7 Bad Import - Building and fence not sharing same way

That is one of my imports. Oops.

## 8 - Too much detail

## 9 - Redundant Nodes

The line is straight, so the information is redundant.
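To illustrate what a duplicate-node search looks for, here is a minimal Python sketch. It is not the find_duplicate_nodes.sql script from the repository (that one runs in PostGIS). The function name `find_duplicate_nodes`, the sample coordinates, and the rule that two nodes count as duplicates when their coordinates match to 7 decimal places are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_nodes(nodes, precision=7):
    """Group node ids that sit on the same (lat, lon) position.

    nodes: iterable of (node_id, lat, lon) tuples.
    precision: decimal places used when comparing coordinates
               (assumed here; OSM stores coordinates with 7 decimals).
    Returns a dict mapping each shared position to the list of node ids on it.
    """
    by_location = defaultdict(list)
    for node_id, lat, lon in nodes:
        key = (round(lat, precision), round(lon, precision))
        by_location[key].append(node_id)
    # Keep only positions occupied by more than one node.
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}


# Hypothetical sample data: nodes 1 and 2 share the same position.
sample_nodes = [
    (1, 44.4267674, 26.1025384),
    (2, 44.4267674, 26.1025384),
    (3, 45.7488716, 21.2086793),
]

duplicates = find_duplicate_nodes(sample_nodes)
for location, node_ids in duplicates.items():
    print(location, node_ids)
```

A check like this only flags candidates. As the cases above show, each group still needs a manual look in JOSM to decide whether the nodes should be merged, deleted, or left alone.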

OSM Diary Entry about OSM PostGIS Script Repo

## GitHub code

## YouTube Tutorial

Discussion

Comment from PlaneMad on 9 March 2016 at 17:02

Wonderful analysis! Running this for the whole world will be quite revealing about the state of data on the map.

Comment from baditaflorin on 9 March 2016 at 22:11

Yeah, this would require a powerful server.

For Germany, I think the script would run in 6-8 hours.

Europe, maybe 2-3 days.

The whole world would be done in 5-6 days.

Comment from ViriatoLusitano on 10 March 2016 at 17:29

Nice job!

My JOSM always runs out of memory, even if I dedicate 8192 GB to it, when I want to validate a whole country of a similar size to yours. :(

Comment from baditaflorin on 10 March 2016 at 17:35

@ViriatoLusitano did you try with the script?

I also made a scope.bat file for Windows users, check it out: https://github.com/baditaflorin/osm-postgis-scripts

In PostGIS it should not be a problem.

Comment from Jochen Topf on 18 March 2016 at 09:18

Nice analysis of the different cases of duplicate nodes. But quite expensive to do in the database. I have had a program lying around for a long time that does this analysis directly from the OSM files without any database. It runs in seconds for small extracts and about 12 minutes (on my server) for the planet. I just released it here.

Comment from baditaflorin on 30 August 2016 at 13:12

@jochen nice tool, so low level. I will try to test it on a virtual machine. I also like the other tools from your repo.
