PostGIS question - find out the unnecessary points that exist on the map
Posted by baditaflorin on 16 September 2015 in English.
My hypothesis is that we could reduce the planet file by 10-100 MB just by removing the unnecessary points that exist on the map.
I am trying to figure out the correct PostGIS query to find exactly these points.
I am trying to compare the points of a linestring: if the angle between 3 adjacent points is 0 degrees, meaning the line does not change direction at the middle point, then the point in the middle can be deleted.
The only checks I see as necessary are that the point in the middle is not connected to anything else and that the hstore tag of that node is empty, meaning that the node adds no value (pedestrian crossing, motorway_junction, exit_to, etc.).
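A minimal sketch of the check described above, written in Python rather than as a PostGIS query. It assumes the way is already available as a list of node ids, and it treats longitude/latitude as flat x/y coordinates, which is only an approximation. The function names, the `use_count` mapping and the `tolerance_deg` parameter are hypothetical and not part of any OSM tool.

```python
import math

def deflection_degrees(a, b, c):
    """Change of heading at b when walking a -> b -> c (0 = straight on)."""
    heading_in = math.atan2(b[1] - a[1], b[0] - a[0])
    heading_out = math.atan2(c[1] - b[1], c[0] - b[0])
    turn = math.degrees(heading_out - heading_in)
    # Wrap into [-180, 180) and ignore the turn direction.
    return abs((turn + 180.0) % 360.0 - 180.0)

def removable_nodes(way, coords, tags, use_count, tolerance_deg=0.0):
    """Return ids of middle nodes that are untagged, unshared and on a straight run.

    way        list of node ids in order (assumed input format)
    coords     node id -> (x, y), treated as planar coordinates
    tags       node id -> dict of tags (missing or empty means untagged)
    use_count  node id -> number of way memberships (missing means 1)
    """
    removable = []
    for prev, mid, nxt in zip(way, way[1:], way[2:]):
        if tags.get(mid):               # a tagged node carries information
            continue
        if use_count.get(mid, 1) > 1:   # shared with another way
            continue
        if deflection_degrees(coords[prev], coords[mid], coords[nxt]) <= tolerance_deg:
            removable.append(mid)
    return removable

coords = {1: (0, 0), 2: (1, 0), 3: (2, 0), 4: (3, 0), 5: (4, 0), 6: (4, 1)}
tags = {3: {"highway": "crossing"}}
use_count = {4: 2}
print(removable_nodes([1, 2, 3, 4, 5, 6], coords, tags, use_count))
```

In this toy way only node 2 qualifies: node 3 is tagged, node 4 is shared with another way, and node 5 sits on a corner.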
Discussion
Comment from gileri on 16 September 2015 at 08:53
Good idea, sounds doable !
However, watch out: some people hate automated changes and would revert your edit, even if it were done perfectly. So, to avoid having your work thrown out by such people, maybe you could make a map that displays such nodes and allows their manual deletion, even if this slows down the correction process.
Comment from baditaflorin on 16 September 2015 at 09:21
I don't want to do it, especially not in an automated way and without consulting the community. I have not yet found a way to import the whole planet, only part of it.
But I want to have a proof of concept to show.
For example: 53405432 nodes could be removed, saving us XX MB and reducing the import time of the planet file by XXX seconds …
Comment from SK53 on 16 September 2015 at 09:48
I did actually do a calculation on this for part of the USA (TIGER data suffers particularly from over-noding).
To do so you need a snapshot database. Only nodes without tags that appear only once in the way_nodes table need to be checked. I just computed the length of the shortest line from the node to the straight line between its adjacent nodes. I can’t find the code I ran, but there’s certainly great scope for reducing the size of the data in the US.
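Since the original code is lost, here is a rough Python sketch of the same idea, not a reconstruction of what was actually run. It assumes `(lon, lat)` tuples in degrees and a simple local equirectangular projection to metres, which should be adequate for nodes a few hundred metres apart. The function names and the `(way_id, node_id)` row format are assumptions made for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical model assumed

def nodes_used_once(way_nodes_rows):
    """Node ids that appear exactly once in (way_id, node_id) rows."""
    counts = Counter(node_id for _way_id, node_id in way_nodes_rows)
    return {node_id for node_id, n in counts.items() if n == 1}

def offset_from_chord_m(prev, mid, nxt):
    """Distance in metres from `mid` to the segment prev-nxt.

    Points are (lon, lat) in degrees. They are projected onto a local
    plane centred on `mid`, so `mid` itself becomes the origin.
    """
    lat0 = math.radians(mid[1])

    def to_xy(p):
        x = math.radians(p[0] - mid[0]) * math.cos(lat0) * EARTH_RADIUS_M
        y = math.radians(p[1] - mid[1]) * EARTH_RADIUS_M
        return x, y

    ax, ay = to_xy(prev)
    bx, by = to_xy(nxt)
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(ax, ay)  # prev and nxt coincide
    # Parameter of the point on the segment closest to the origin (= mid).
    t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len_sq))
    return math.hypot(ax + t * dx, ay + t * dy)

print(nodes_used_once([(1, 10), (1, 11), (2, 11), (2, 12)]))
print(offset_from_chord_m((0.0, 0.0), (0.001, 0.00001), (0.002, 0.0)))
```

A node would then be a removal candidate when it is untagged, is in the `nodes_used_once` set, and its offset is below whatever threshold is chosen.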
Comment from Sanderd17 on 16 September 2015 at 10:07
This shouldn’t be done inside the OSM database; instead, OSM editors should make sure this doesn’t happen (for example by showing a warning when there’s such a node).
Also, with regard to your example, it depends on how you measure angles. To take an extreme example, if you have a straight line from London to Beijing, the shortest path goes through Finland, not through central Europe. So if you have a centre point somewhere in Ukraine, it might be a zero-angle point on a Mercator map, but it won’t be a zero-angle point when using exact shortest-distance lines. This effect is also always present on shorter lines, so every point in the database will be a non-zero angle according to some projection and a certain precision.
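The projection effect can be illustrated with a short Python sketch. It takes the midpoint of a line that is straight on a Mercator map and measures how far that point lies from the great circle between the endpoints, using the standard haversine, initial-bearing and cross-track formulas on a sphere. The city coordinates are approximate, the short test segment is arbitrary, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical model assumed

def haversine_m(a, b):
    """Great-circle distance in metres between (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def initial_bearing(a, b):
    """Initial great-circle bearing from a to b, in radians."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.atan2(y, x)

def cross_track_m(start, end, point):
    """Distance in metres from `point` to the great circle through start and end."""
    d13 = haversine_m(start, point) / EARTH_RADIUS_M
    delta = initial_bearing(start, point) - initial_bearing(start, end)
    return abs(math.asin(math.sin(d13) * math.sin(delta))) * EARTH_RADIUS_M

def mercator_midpoint(a, b):
    """Midpoint of the segment a-b as drawn on a Mercator map."""
    def merc_y(lat):
        return math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    y = (merc_y(a[1]) + merc_y(b[1])) / 2
    lat = math.degrees(2 * math.atan(math.exp(y)) - math.pi / 2)
    return ((a[0] + b[0]) / 2, lat)

london = (-0.13, 51.51)    # approximate (lon, lat)
beijing = (116.40, 39.90)  # approximate (lon, lat)
long_dev = cross_track_m(london, beijing, mercator_midpoint(london, beijing))

a, b = (26.10, 44.43), (26.11, 44.44)  # arbitrary segment of roughly 1.4 km
short_dev = cross_track_m(a, b, mercator_midpoint(a, b))
print(long_dev, short_dev)
```

For the long line the deviation is very large, while for the short segment it is small but generally not exactly zero, which is why a strict 0-degree test depends on the projection and precision used.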
By the way, many data consumers already use the Douglas-Peucker algorithm to remove points up to a certain precision when processing the data. For example, OsmAnd applies Douglas-Peucker when compiling its obf files in order to shrink them.
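For readers who have not met the algorithm, here is a minimal Python sketch of Douglas-Peucker on planar coordinates. It is not OsmAnd's implementation. Because the original post wants tagged and connected nodes to survive, the sketch accepts an optional `protected` set of indices that are always kept; that parameter and all names here are assumptions for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def douglas_peucker(points, tolerance, protected=frozenset()):
    """Return the sorted indices of the points that are kept.

    Endpoints and `protected` indices are always kept. Between two kept
    points, the farthest point is kept if it lies more than `tolerance`
    away from the chord, and the two halves are processed in turn.
    """
    n = len(points)
    if n < 3:
        return list(range(n))
    anchors = sorted({0, n - 1} | {i for i in protected if 0 <= i < n})
    keep = set(anchors)
    stack = list(zip(anchors, anchors[1:]))
    while stack:
        first, last = stack.pop()
        worst, worst_dist = None, tolerance
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst is not None:
            keep.add(worst)
            stack.append((first, worst))
            stack.append((worst, last))
    return sorted(keep)

line = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)]
# A tiny tolerance stands in for "exactly straight", since floating-point
# rounding makes a literal 0 fragile.
print(douglas_peucker(line, 1e-9))
print(douglas_peucker(line, 1e-9, protected={1}))
```

With a near-zero tolerance only the nodes on the straight run are dropped, and index 1 survives when it is marked as protected.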
Comment from baditaflorin on 16 September 2015 at 10:24
Thanks, I did not know about the Douglas-Peucker algorithm. I will check it out and see if it can be used for this simple task.
I used this in the past http://pastebin.com/2kH0mAWG to calculate in QGIS the angles that are more than 50 degrees, when I wanted to create a MapRoulette challenge for Romania and
Now it's kind of the same idea, but with a different threshold.
Romanian discussion about the topic
Comment from Vincent de Phily on 16 September 2015 at 10:37
In case you hadn’t found it already: http://postgis.net/docs/ST_Simplify.html uses Douglas-Peucker, and you should be able to do the conversion using a simple SQL UPDATE. Remember to run VACUUM FULL before and after if you want to know exactly what space savings this gets you.
Comment from Vincent de Phily on 16 September 2015 at 10:39
Hmm, thinking more about it, ST_Simplify probably doesn’t pay attention to connected and tagged nodes, so it’s not that straightforward to use.
Comment from SimonPoole on 16 September 2015 at 10:40
Besides the fact that reducing the size of the planet file by such a small amount wouldn’t justify anything, removing the nodes creates a new version of each way, adding to general database bloat.
Furthermore, neither the editors nor the API support downloading elements that have no nodes in the requested bounding box. As a result, it makes sense to keep nodes not too far apart even on completely straight ways.
It should be noted that we do have massively overnoded ways from imports (as in multiple nodes per meter), which naturally can be simplified when detected.
Comment from baditaflorin on 16 September 2015 at 11:56
@SimonPoole that is a valid point of view.
Anyhow, I think I should change the initial hypothesis to the first thing that I am interested in, that is, being able to detect and count these nodes, and then publish the metrics. Then we will have some numbers.
I am trying to develop different metrics that could help us find errors on the map. One of the aims is to be able to detect the overnoded ways from imports.
Comment from baditaflorin on 16 September 2015 at 13:30
I will leave the code here and try to work on it once I find some PostGIS gurus who are able to help me :P
https://gist.github.com/anonymous/3669b78e0898bde4f638
Comment from butrus_butrus on 16 September 2015 at 18:30
Hi!
I’m not sure this is such a good idea. I sometimes leave such points as preparation for future mapping.
Maybe you can add a condition that the point to be deleted is older than (at least) two weeks?
Comment from baditaflorin on 16 September 2015 at 19:28
Butrus, sure, this can be filtered in the query.
Anyhow, this is a hobby project and I don't yet have the PostGIS skills to do it, so no worries.
Also, I just want to see the result; I will not act upon it.
When I am able to, I will just highlight the extreme cases, and maybe do a MapRoulette challenge for roads that have more than 10 nodes that can be deleted, so that users can check them.
Comment from pnorman on 16 September 2015 at 19:57
Doing a way simplification with a threshold of 0 is unlikely to yield any practical speed improvements.
It takes about one to two days to import the planet with osm2pgsql, assuming reasonable hardware. The node parsing stage takes about 15 minutes. It’s unlikely to speed up multipolygon-related computations, the slowest part of the import. It won’t speed up clustering or rendering table index creation.
My guess is that there would be no detectable speed increase.
Comment from AndersAndersson on 17 September 2015 at 10:09
I don’t like this idea I’m afraid.
Sometimes you leave a node for a not yet mapped crossing road.
You also reduce the trustworthiness of the data. On a straight way without intermediate nodes, you don’t know what the real road does between the mapped nodes. But if you have a node in the middle, you know that the road is probably straight, and that the straightness is not just a lack of “resolution”.
Data storage and computer speed will increase in parallel with the growing database, so I can’t see the problem.
Comment from karussell on 19 September 2015 at 20:28
For routing we do the same in the GraphHopper import, as we only need the junctions and end points. But I doubt this will reduce the data by much, as ways are not always this straight in the real world. And even if you want to reduce it, which Douglas-Peucker difference is acceptable: 1m, 0.1m or 0.0000m? Reducing the data with any difference other than 0.00000m will reduce the quality in certain cases.