4 826 424 "addr:country=DE"

Posted by Pieren on 20 June 2014 in English (English)

It seems the Germans like duplicate tags since the country address is easy to deduce from the country polygon. The tag could be justified in edge cases, such as near the borders or outside the polygon, but not as a general rule. I don't know how this community reached such a senseless decision. Probably it came from some data consumers too lazy to run complex queries in a spatial db. It is sad that the majority of contributors now have to put up with this bad idea. I like how this community sometimes pushes for more standardisation and consistency, as with postcode polygons or speed limits, but here I cannot follow. Luckily, in my country we have a general consensus to avoid unnecessary duplicates.

Comment from JohnDoe23 on 20 June 2014 at 12:26

There is no consensus on this topic in Germany.

Btw, there's a validator tool for Berlin which checks whether every address has the addr:country and addr:city tags, lol. Example:

Comment from Pieren on 20 June 2014 at 12:39

Well, if nobody stops the process, then it's a kind of passive consensus. And this is not a small one: it's somewhere around 30 to 50 million tags, I don't know exactly.

Comment from JohnDoe23 on 20 June 2014 at 12:51

Yes, you're right, but how do we stop this process? There are strong supporters of the idea of adding useless redundant information like addr:country, addr:postcode, ... to every node, although we have good borders in Germany.

Maybe it would help to remove the input field from the JOSM address presets? In iD addr:country isn't displayed.

At the moment there are 6.036.685 addresses in Germany (we expect 21.000.000 addresses in Germany).

Comment from brogo on 20 June 2014 at 13:25

Do you know the KISS principle?

By adding ALL the important address information to the address, you can get the full information without needing a spatial database that holds at least the whole country in order to have closed boundaries.

In another blog comment I asked for a query to get the whole address from reduced address info in OSM (addr:country, addr:city and addr:postcode can be obtained from the boundaries, addr:street must be extracted from the associatedStreet relation, and addr:housenumber is the only tag on the object). No one gave me that query.

When talking to (potential) data users, there nearly always comes a point where it is too complex to extract information from OSM data.

Not having a (spatial) database to work with OSM data has nothing to do with laziness; it is often not practicable.

Comment from Andy Street on 20 June 2014 at 13:33

It seems the Germans like duplicate tags since the country address is easy to deduce from the country polygon

I suppose this depends on whether you consider the "addr:*" tags to represent the postal address (what you'd need to write on an envelope) or an address for navigation (what you'd put into a sat-nav). If it is the former, consider the scenario where a business physically located in the UK uses a mail processing company in France to open, scan and email their correspondence (possibly to make things easier for their French customers). In this instance it is absolutely wrong to use the value of the polygon because you'll end up with a value of "Paris, GB".

Of course you could make the argument that "addr:country" only needs to be set in these edge cases, but then you run into the problem all "default" tags suffer from: you can't tell the difference between someone who omitted the tag because they are implying something and someone who omitted it because they didn't consider it while mapping.

Comment from flohoff on 20 June 2014 at 14:13

I have added 30000+ addresses to OSM in the past 7 years, and I am a heavy consumer of that data too. There is no such thing as "duplicate tags". Self-contained addresses are a wonderful thing. The associatedStreet relation has proven that it's too complex for people to handle, and too complex for people to consume. There are tons of tools out there which refuse to work with non-self-contained addresses where part of the tags are missing, or where the exact address resolution needs complex spatial requests to a full planet database.

All these, in your eyes, "duplicate tags" can be compressed away with very little CPU power, so I don't see any point in making it complex for people and consumers just to save a few bits.

Comment from flohoff on 20 June 2014 at 14:24


The string addr:countryDE repeated 32768 times in a file is 950272 bytes. Compressed with bzip it's 280 bytes. That's a compression ratio of 3393:1.

Using your number of 4826424 tags at 29 bytes each with the above compression ratio results in 41251 bytes.

So ~40 KByte of the planet is for addr:country=de. I don't think it's worth the discussion or the hassle for data consumers.
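The arithmetic above can be re-run with a few lines of Python. This is only a sketch: the exact 29-byte line used in the comment did not survive in the post, so `LINE` below is a padded stand-in (an assumption), and the compressed size it produces will not necessarily match the 280 bytes quoted. With the comment's own figures, 950272 / 280 is roughly 3394, and 4826424 × 29 / 3393 is roughly 41251.

```python
import bz2

# Assumption: the original 29-byte line is unknown, so a stand-in is padded to 29 bytes.
LINE = b"addr:country=DE".ljust(28) + b"\n"
REPEATS = 32768
TAG_COUNT = 4826424  # number of addr:country=DE tags quoted in the diary title

# A file that consists of nothing but the repeated line.
raw = LINE * REPEATS
compressed = bz2.compress(raw)
ratio = len(raw) / len(compressed)

# Naive extrapolation used in the comment: total tag bytes divided by that ratio.
estimate = TAG_COUNT * len(LINE) / ratio

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(compressed))
print("ratio:            %.0f:1" % ratio)
print("estimated bytes:  %.0f" % estimate)
```

Note that this measures the best case, a file made only of that one line; tags scattered through real data may compress less well.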

Comment from Pieren on 20 June 2014 at 14:39

The KISS principle should apply to the contributors, and for that, "addr:housenumber" and "addr:street" are enough. Btw, I'm also against the "associatedStreet" relation, more or less for the same reasons (keep it easy for the contributors and in all editors).

Comment from Pieren on 20 June 2014 at 14:42

About the size on disk, I agree that this is not the issue. In France, we had similar discussions about the "cadastre" source tag. But here, we must think of the contributors first. If a newcomer arrives in an already mapped city with 5 or 6 "addr" tags on each address, he will be reluctant to do the same task for the new, missing or faulty addresses where he could help, and where only 2 tags are really necessary.

Comment from Vincent de Phily on 23 June 2014 at 22:27

I'm in the "no redundancy" camp, mainly because it is less time-consuming to create and maintain (from fixing a typo in a street name to the annexation of Crimea). The fact that the data is smaller and tidier is a bonus. For the specific case of the associatedstreet relation, it also fixes the problem of finding the right street to park on in front of a house when the nearest-matching-way algorithm fails.

Normalizing the data does make querying more complicated, but the fact is that relations offer advantages and/or make things possible that wouldn't be otherwise. They are used more and more. So if your tool doesn't handle relations... It doesn't handle OSM data very well at all. Go fix it.

If you "don't want a geodatabase" because querying street and country at runtime is too costly, add those tags to each object at import time.
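One way to read the import-time suggestion, as a rough sketch only: run the point-in-polygon test once while loading the data and store the result as a tag, so nothing spatial is needed at query time. The function names and the rectangle standing in for a country border are hypothetical; a real border is a multipolygon with many rings, and a real import would use a proper geometry library.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test against one closed ring (no holes, no multipolygons)."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def add_country(tags: Dict[str, str], location: Point,
                countries: Dict[str, List[Point]]) -> Dict[str, str]:
    """Return a copy of tags with addr:country filled in from the polygons.

    An explicit addr:country already on the object is kept unchanged.
    """
    enriched = dict(tags)
    if "addr:country" in enriched:
        return enriched
    for code, ring in countries.items():
        if point_in_polygon(location, ring):
            enriched["addr:country"] = code
            break
    return enriched


# Toy rectangle, NOT the real border of Germany.
toy_countries = {"DE": [(6.0, 47.0), (15.0, 47.0), (15.0, 55.0), (6.0, 55.0)]}

house = {"addr:housenumber": "12", "addr:street": "Beispielstraße"}
print(add_country(house, (13.4, 52.5), toy_countries))
```

Because an explicit tag on the object wins, the edge cases mentioned earlier in the thread (a postal address that differs from the physical location) can still be tagged by hand.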

There's a balance to be found between coders/computer resources and data contributors. On that point, while it is probable that geocoding will eventually be computationally "solved and finished", adding and curating data will never be. So I lean towards the data contributor's side in this equation.

Of course there are compromises to be made. "Don't tag to the renderer/geocoder" only goes so far, pragmatism is also needed. But in the addr:* case, I feel that this pragmatism is a bad idea.

Comment from Vincent de Phily on 23 June 2014 at 23:06

Let's run some actual data compression numbers:

  • download extract from geofabrik and decompress
  • use grep to filter out the addr:country tag into a new file
  • recompress both files

$ \ls -Sl brandenburg-latest.*
-rw-r--r-- 1 work work 2019804582 Jun 24 00:34 brandenburg-latest.osm
-rw-r--r-- 1 work work 2007110472 Jun 24 00:38 brandenburg-latest.nocountry.osm
-rw-r--r-- 1 work work 159076860 Jun 24 00:34 brandenburg-latest.osm.bz2
-rw-r--r-- 1 work work 158989054 Jun 24 00:38 brandenburg-latest.nocountry.osm.bz2
-rw-r--r-- 1 work work 119848272 Jun 24 00:34 brandenburg-latest.osm.xz
-rw-r--r-- 1 work work 119720968 Jun 24 00:38 brandenburg-latest.nocountry.osm.xz

The gain is 0.6% for plaintext, 0.05% for bzip2, 0.1% for xz... All in all not very impressive, but not as insignificant as the 40K that flohoff expected. Compressing a string disseminated in a file is not as easy as compressing a file made entirely of that string.

Also, remember that downloading and decompressing data might not be your bottleneck. Parsing the decompressed data has its cost.

Since the numbers are small (I could increase them by using an address-focused extract, but why bother?), I agree that size is not very important in this case. But it's still good to have proper numbers :)
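The three steps above can be approximated in Python on a small synthetic sample. This is a sketch under assumptions: the sample XML is made up for illustration, the helper names are hypothetical, and the line filter mimics what a `grep -v` on `addr:country` would do. The last lines re-derive the percentages from the file sizes listed above.

```python
import bz2
import lzma


def strip_country_tags(osm_xml: str) -> str:
    """Drop every line that mentions addr:country, as `grep -v` would."""
    return "".join(
        line for line in osm_xml.splitlines(keepends=True)
        if "addr:country" not in line
    )


def compressed_sizes(text: str) -> dict:
    """Size in bytes as plain text, bzip2 and xz."""
    data = text.encode("utf-8")
    return {
        "plain": len(data),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),  # default container format is .xz
    }


def relative_gain(before: int, after: int) -> float:
    """Percentage saved by going from `before` bytes to `after` bytes."""
    return 100.0 * (before - after) / before


# Synthetic stand-in for an extract; a real test would read the .osm file.
sample = "".join(
    f'<node id="{i}">\n'
    f'  <tag k="addr:housenumber" v="{i % 200}"/>\n'
    f'  <tag k="addr:country" v="DE"/>\n'
    f"</node>\n"
    for i in range(5000)
)
stripped = strip_country_tags(sample)

sizes_before = compressed_sizes(sample)
sizes_after = compressed_sizes(stripped)
for kind in sizes_before:
    gain = relative_gain(sizes_before[kind], sizes_after[kind])
    print(f"{kind}: {sizes_before[kind]} -> {sizes_after[kind]} ({gain:.2f}%)")

# The same formula applied to the Brandenburg file sizes listed above.
print(f"plain: {relative_gain(2019804582, 2007110472):.2f}%")
print(f"bz2:   {relative_gain(159076860, 158989054):.2f}%")
print(f"xz:    {relative_gain(119848272, 119720968):.2f}%")
```

The synthetic sample is far more address-heavy than a real extract, so its percentages say nothing about real data; only the Brandenburg figures do.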

Comment from AndiG88 on 26 June 2014 at 05:59

For the specific case of the associatedstreet relation, it also fixes the problem of finding the right street to park on in front of a house when the nearest-matching-way algorithm fails.

But nearest-matching-way isn't the equivalent of the relation. It's putting the addr:street tag on the object.

Also the main problem, as pointed out, is not just consumers, but users. addr: tags on an object are easy to understand for a new mapper; relations are not.

Comment from Skippern on 29 June 2014 at 18:10

I have tried adding addr:country, addr:state, addr:city and addr:suburb to the border relations; that way even admin_level=10 boundaries, which hug the objects quite closely, carry the necessary information. Then only addr:housenumber/addr:housename, addr:street and addr:postcode need to be tagged on each object. In Brazil addr:postcode can differ between the two sides of a street, some blocks have their own postcode, and even some buildings have independent postcodes, so deriving this from relations might be complicated. Also, if the local postcode is unknown, a generic addr:postcode for the city can be added to the admin_level=8 relation (which should be overridden by the addr:postcode on each object where it is known). In many cases associatedStreet relations will be too complicated (should we have one for the left-hand side and another for the right-hand side?). It is much easier for data entry and maintenance to have this information on each individual object, and it should not impede data consumption in any significant way.

As far as I know, the entire municipality of Divino de São Lourenço has one postcode, and the rural districts of Guarapari share a generic postcode, while the urban areas mostly have a postcode for each street (with some exceptions).
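The inheritance scheme described above could be consumed roughly as follows. This is a sketch only: `resolve_address` is a hypothetical helper, it assumes the list of boundary relations enclosing the object has already been found by some spatial lookup (not shown), and the postcodes and street name are placeholders, not real values.

```python
from typing import Dict, List, Tuple


def resolve_address(object_tags: Dict[str, str],
                    boundaries: List[Tuple[int, Dict[str, str]]]) -> Dict[str, str]:
    """Merge addr:* tags from enclosing boundaries with those on the object.

    boundaries holds (admin_level, tags) pairs for every relation that
    contains the object. Finer boundaries override coarser ones, and tags
    on the object itself override everything.
    """
    resolved: Dict[str, str] = {}
    # Coarse boundaries (low admin_level) first, so finer ones overwrite them.
    for _, tags in sorted(boundaries, key=lambda b: b[0]):
        resolved.update({k: v for k, v in tags.items() if k.startswith("addr:")})
    # Tags on the object always win, e.g. a street-specific postcode.
    resolved.update({k: v for k, v in object_tags.items() if k.startswith("addr:")})
    return resolved


# Placeholder data for illustration only.
enclosing = [
    (8, {"addr:city": "Guarapari", "addr:postcode": "00000-000"}),  # generic postcode
    (2, {"addr:country": "BR"}),
    (4, {"addr:state": "ES"}),
]
house = {
    "addr:housenumber": "12",
    "addr:street": "Rua Exemplo",
    "addr:postcode": "11111-111",  # known local postcode overrides the generic one
    "building": "yes",
}
print(resolve_address(house, enclosing))
```

A house without its own addr:postcode would simply fall back to the generic value from the admin_level=8 relation.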

Comment from Vincent de Phily on 29 June 2014 at 18:35


I'm not sure I follow the point in your 1st paragraph. My point is that when you want to route to a particular housenumber, the routing algorithm will find the closest matching way and direct you there. Most of the time, doing that matching using just the street name, even if there are multiple corresponding ways, is correct. But sometimes (typically in housing estates) the name-matched OSM way that is closest to the house is the wrong one. If you match using a relation instead, there's only one OSM way to choose from and no mistake is made.

Regarding the ease of contribution (for newbies and veterans), there is no obvious winner. Creating the relation is harder for newbies and the same amount of work for veterans. Maintaining it is easier for both.

It's also worthwhile to add a house to the associatedstreet relation even before you know its housenumber. Firstly because it makes surveying housenumbers easier, secondly because it is useful information even in the absence of housenumbers.
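The difference between the two matching strategies can be shown with a toy example. This is a sketch with made-up data and hypothetical function names; each way is reduced to a single representative point, whereas a real router would measure the distance to the whole line geometry.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def nearest_named_way(house: Point, street_name: str, ways: List[Dict]) -> int:
    """Name-based matching: pick the closest way carrying the street name."""
    candidates = [w for w in ways if w["name"] == street_name]
    return min(candidates, key=lambda w: math.dist(house, w["point"]))["id"]


def way_from_relation(house_id: int, relations: List[Dict]) -> Optional[int]:
    """Relation-based matching: the relation names exactly one street way."""
    for rel in relations:
        if house_id in rel["houses"]:
            return rel["street"]
    return None


# Housing estate: two separate ways share one name. The house is accessed
# from way 1, but a branch of the same name (way 2) happens to run closer.
ways = [
    {"id": 1, "name": "Oak Close", "point": (0.0, 0.0)},
    {"id": 2, "name": "Oak Close", "point": (1.2, 1.0)},
    {"id": 3, "name": "Elm Road", "point": (1.0, 1.1)},
]
relations = [{"street": 1, "houses": {100}}]
house_id, house_location = 100, (1.0, 1.0)

print(nearest_named_way(house_location, "Oak Close", ways))
print(way_from_relation(house_id, relations))
```

In this constructed case the name-based match returns way 2 while the relation points at way 1, which is the failure mode described above.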
