Tagged and Untagged Nodes

Posted by Zverik on 18 October 2016.

I’ve just gathered some statistics from the planet file of 14 October. Here they are:

[Table: node statistics]

The table shows the number of nodes, tagged and untagged, broken down by how many ways and relations reference them. Nearly 97% of the 3.5 billion nodes are untagged, and most of these (88%) are part of exactly one way or relation. For example, when you trace a building, you add four untagged nodes that belong only to that closed way.
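The buckets in the table (tagged or not; zero, one, or two or more parents) can be computed in one pass over the ways and relations followed by one pass over the nodes. Below is a minimal in-memory sketch, assuming node tags and way/relation member lists are already loaded into Python dicts; the function name `classify_nodes` and the bucket labels are hypothetical, and this is not necessarily how the table above was produced. A full planet file would need a streaming reader such as pyosmium and much more compact storage.

```python
from collections import Counter

def classify_nodes(node_tags, way_refs, relation_refs):
    """Count nodes by (is_tagged, parent_bucket) -- hypothetical helper.

    node_tags:     node id -> dict of tags (empty dict if untagged)
    way_refs:      way id -> list of node ids
    relation_refs: relation id -> list of member node ids
    """
    parents = Counter()
    # Ways and relations are kept in separate dicts, so their ids never clash.
    for refs in list(way_refs.values()) + list(relation_refs.values()):
        # A closed way repeats its first node; count each parent only once.
        for node_id in set(refs):
            parents[node_id] += 1

    table = Counter()
    for node_id, tags in node_tags.items():
        n = parents[node_id]  # Counter returns 0 for unreferenced nodes
        bucket = "0" if n == 0 else "1" if n == 1 else "2+"
        table[(bool(tags), bucket)] += 1
    return table

# Toy data: a four-node building, a tagged crossing shared by two ways,
# a standalone tagged POI, and one untagged orphan (node 7).
nodes = {1: {}, 2: {}, 3: {}, 4: {}, 5: {"highway": "crossing"},
         6: {"amenity": "cafe"}, 7: {}}
ways = {10: [1, 2, 3, 4, 1], 11: [4, 5], 12: [5, 3]}
table = classify_nodes(nodes, ways, {})
```

Here nodes 1 and 2 land in the untagged, one-parent bucket, just like the building nodes in the post, while node 7 is the kind of lonely node mentioned at the end.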

In total, 98.4% of all nodes belong to at least one way or relation, but only 12% (424 million) have two or more parents. This could help when designing storage for nodes.
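One storage layout these numbers suggest (a hypothetical sketch, not something from the post): give every node a single fixed-width slot for its first parent, and keep any additional parents in a side table, which only about 12% of nodes would ever touch. The class name `NodeParents` is made up; the sketch assumes node ids are small dense integers (real OSM ids would need remapping or a sparse array) and that parent ids are nonzero, with relations stored as negative numbers to keep them apart from ways.

```python
from array import array

class NodeParents:
    """Hypothetical parent index: one inline slot per node plus an overflow table."""

    def __init__(self, n_nodes):
        # Signed 64-bit slot per node; 0 means "no parent yet".
        self.first = array("q", [0]) * n_nodes
        # node id -> list of extra parents (the multi-parent minority).
        self.extra = {}

    def add(self, node_id, parent_id):
        if self.first[node_id] == 0:
            self.first[node_id] = parent_id
        else:
            self.extra.setdefault(node_id, []).append(parent_id)

    def parents(self, node_id):
        first = self.first[node_id]
        if first == 0:
            return []
        return [first] + self.extra.get(node_id, [])

idx = NodeParents(4)
idx.add(0, 10)            # node 0: one way
idx.add(1, 10)            # node 1: shared by two ways
idx.add(1, 11)
idx.add(2, -5)            # node 2: member of relation 5 (stored as -5)
```

The Python dict is only for illustration; the point is that the fixed per-node cost stays at one slot, and the variable-length part grows with the 424 million multi-parent nodes rather than with all 3.5 billion.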

There are roughly as many tagged nodes that are not part of anything as there are tagged nodes that belong to a way or relation. Especially interesting are the 9 million tagged nodes that belong to two or more ways. Taginfo reports 2.5 million crossings and 860 thousand traffic signals, which together make up about a third of them.
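A quick check of that "about a third", using the rounded figures above and assuming every crossing and traffic signal sits on two or more ways (some do not, so this is an upper bound on their share):

```latex
\frac{2.5\,\text{M} + 0.86\,\text{M}}{9\,\text{M}}
  = \frac{3.36\,\text{M}}{9\,\text{M}}
  \approx 0.37 \approx \tfrac{1}{3}
```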

Finally, there are a million nodes that have no tags and are not part of anything. I wonder when someone will put on their OSM saviour cape and a programmer’s hat and rid us of these.

Comment from ff5722 on 18 October 2016 at 22:20

I haven’t bothered to learn Overpass syntax yet, but I found these two scripts:

  1. find all nodes without tags:
  2. find all nodes not part of a way

Combining these should give all nodes without tags and not part of a way…
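In set terms, the combination is the "no tags" result minus every node referenced by a way. A minimal sketch, assuming both query results have been exported as lists of node IDs (the function name `orphan_untagged` is hypothetical). Zverik's count of lonely nodes also excludes relation members, so the sketch can optionally subtract those too.

```python
def orphan_untagged(untagged_ids, way_node_ids, relation_node_ids=()):
    """Hypothetical helper: untagged nodes referenced by no way (and no relation)."""
    referenced = set(way_node_ids) | set(relation_node_ids)
    return set(untagged_ids) - referenced

untagged = [1, 2, 3, 7]          # result of "nodes without tags"
in_ways = [1, 2, 3, 4, 5]        # nodes used by some way
in_relations = [7]               # untagged node that is only a relation member

ways_only = orphan_untagged(untagged, in_ways)                  # {7}
ways_and_rels = orphan_untagged(untagged, in_ways, in_relations)  # set()
```

The difference between the two results shows why checking ways alone can flag a node that is still in use.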

Comment from SimonPoole on 18 October 2016 at 22:39

The redaction process created a large number of orphaned untagged nodes. A typical example is a road that was redacted away while its nodes were not, because they had been created or moved by somebody who accepted the CTs. As a result, these nodes may still carry residual geometric information (in how they are arranged) and should only be removed once that aspect has been checked.

The other source of such nodes is, naturally, (broken) imports; unfortunately, there is no penalty for not cleaning up after you have messed up.

Comment from ImreSamu on 18 October 2016 at 22:57

The Taginfo version, from a few days later (2016-10-18 00:58 UTC):

  • Number of nodes in the database: 3 572 929 193
  • Number of nodes with at least one tag: 112 854 493
  • Percentage of nodes with at least one tag: 3.15%
  • Number of tags on nodes: 363 886 126
  • Average number of tags per tagged node: 3.22
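The derived figures can be recomputed from the raw counts. The tagged share comes out at about 3.1586%, which would round to 3.16%; Taginfo's 3.15% looks like truncation rather than rounding (an inference from the numbers, not something documented here). The untagged share of about 96.84% matches the "nearly 97%" in the post.

```python
# Raw counts copied from the Taginfo list above.
nodes_total = 3_572_929_193
nodes_tagged = 112_854_493
tags_on_nodes = 363_886_126

pct_tagged = 100 * nodes_tagged / nodes_total     # about 3.1586
pct_untagged = 100 - pct_tagged                   # about 96.84
avg_tags = tags_on_nodes / nodes_tagged           # about 3.2244

# Truncating (not rounding) to two decimals reproduces Taginfo's 3.15.
pct_tagged_truncated = int(pct_tagged * 100) / 100
```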

Comment from Zverik on 19 October 2016 at 08:23

Thanks, Imre, I didn’t know Taginfo had those statistics. I did this for the reference counts, though.

Simon, thanks for the reminder about the redaction; I had forgotten how many orphaned nodes it left. Of course, my last remark about removing these was sarcasm: I certainly do not want anybody to do mass deletions.

ff5722, nice scripts, thanks for sharing!

Comment from SK53 on 20 October 2016 at 12:47

Only a million lonely nodes seems quite small by older standards. When Cadastre first came out, I was cleaning up a hundred thousand or so at a time. Matt (zere) also used to have a duplicate-node map; duplicate nodes were a big problem, particularly with the TIGER, NHD and landuse imports in the US (more or less until ogr2osm fixed most of those issues).

Comment from ianlopez1115 on 21 October 2016 at 08:00

@ff5722, I did a bit of research and some tweaking based on previous examples, and here’s what I was able to come up with: an Overpass query looking for nodes without tags that do not belong to ways or areas here.
