Recent diary entries
Recently I have been experimenting with extracting UK addresses from OSM data. Where a postcode has not been tagged, I have assigned the nearest one using the OS Opendata Codepoint centroids. The nearest postcode will quite often not be the correct one, but this is sufficiently accurate for my intended use of identifying the correct city and suburb.
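A minimal sketch of this kind of nearest-centroid fallback is shown here. It assumes the address location and the postcode centroids have already been converted to the same projected grid (eastings and northings in metres), so plain Euclidean distance is good enough. The centroid values, postcodes and function names are made up for illustration and are not taken from my actual script.

```python
import math

# Hypothetical sample of postcode centroids as (postcode, easting, northing).
# The postcodes and coordinates are invented for illustration only.
CENTROIDS = [
    ("NG9 2AA", 452700.0, 336900.0),
    ("NG9 3AB", 450100.0, 338200.0),
    ("NG10 1CD", 448900.0, 333500.0),
]


def nearest_postcode(easting, northing, centroids=CENTROIDS):
    """Return the postcode whose centroid is closest to the given point."""
    best = min(
        centroids,
        key=lambda c: math.hypot(c[1] - easting, c[2] - northing),
    )
    return best[0]


def postcode_for(tags, easting, northing):
    """Prefer a tagged addr:postcode; otherwise fall back to the nearest centroid."""
    return tags.get("addr:postcode") or nearest_postcode(easting, northing)
```

A brute-force search like this is slow for a whole country's worth of centroids, so a spatial index would probably be needed in practice.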
Having done this, I thought it would be interesting to look at which postcode areas contain the most mapped addresses, so I made a list, which I have put on the following page in case anybody else is interested: http://wiki.openstreetmap.org/wiki/User:Will_P/UK_Addresses
The Birmingham (B) postcode area contains the most addresses, with 80,996 tagged, followed by Colchester (CO) with 52,257 and Nottingham (NG) with 41,313. At the other end, Llandrindod Wells (LD) and, somewhat surprisingly, Sunderland (SR) contain only 9 tagged addresses.
I have also made lists for the postcode districts and sectors within postcode areas with the most mapped addresses. It should be noted that the number of addresses within these areas varies greatly, so those areas at the top of the lists are not necessarily more complete than those further down.
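Counting at the three levels can be sketched roughly as follows. The regular expression is a simplified assumption about the postcode format (area letters, district digits with an optional trailing letter, then sector digit and two unit letters) and will not validate every real postcode. The function names are hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: area letters, district part, optional space,
# sector digit, two unit letters. Not a full postcode validator.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d{1,2}[A-Z]?)\s*(\d)([A-Z]{2})$")


def split_postcode(postcode):
    """Return (area, district, sector) for a postcode, or None if it does not parse."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        return None
    area, district_part, sector_digit, _unit = match.groups()
    district = area + district_part
    sector = f"{district} {sector_digit}"
    return area, district, sector


def count_by_level(postcodes):
    """Count addresses per postcode area, district and sector."""
    areas, districts, sectors = Counter(), Counter(), Counter()
    for postcode in postcodes:
        parts = split_postcode(postcode)
        if parts is None:
            continue  # skip anything that does not look like a postcode
        area, district, sector = parts
        areas[area] += 1
        districts[district] += 1
        sectors[sector] += 1
    return areas, districts, sectors
```

Calling `sectors.most_common(10)` on the result would give a ranked list of sectors of the kind shown further down.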
Having added a lot of addresses myself, it's interesting to identify other places where extensive address mapping has taken place, and to look at the different styles of mapping and tagging used.
Here are the postcode districts and sectors with the most mapped addresses:
WA7 5 - Runcorn (5384 addresses)
NG9 2 - Beeston (part) and Lenton Abbey, Nottingham (5100 addresses)
BA13 3 - Westbury, Wiltshire (4987 addresses)
CV3 5 - Coventry (Cheylesmore) (4919 addresses)
NG9 3 - Bramcote and Trowell, Nottingham (4866 addresses)
B27 7 - Acocks Green, Birmingham (4857 addresses)
B11 3 - Sparkhill, Birmingham (4549 addresses)
B76 1 - Walmley, Sutton Coldfield (4441 addresses)
CO12 4 - Harwich, Essex (4315 addresses)
CB1 3 - Cambridge (south east) (4184 addresses)
CO15 4 - Clacton-on-Sea (4078 addresses)
For the last few months I’ve been focussing on surveying addresses in the NG9 (UK) postcode district. This is now largely complete: there are currently about 29950 NG9 addresses in the database (or 32800 after interpreting addr:interpolation, addr:flats and so on). There are still some areas that need work; these are places that are difficult to survey, such as gated streets. The largest current omission is around 100 residential addresses ‘behind the wire’ within an army base.
I have found tagging addresses quite a frustrating process and I suspect I would have ‘completed’ the NG9 addresses earlier if it had been more straightforward. Adding addresses is a time-consuming process and I’ve been keen that the data I am inputting is in a format that is actually useful. I have regularly thought about whether the tagging I have used can actually be used to reconstruct the full address, and indeed I have written a script to do this, in order to help me identify the pitfalls. For a typical numbered street things are relatively simple, but for anything more complicated it quickly becomes confusing and the wiki provides little guidance.
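For the simple case of a numbered street, reconstructing an address from its tags might look something like the sketch below. This is not my actual script: the function name, the order of the parts and the example values are assumptions chosen for illustration.

```python
def format_address(tags):
    """Join the common addr:* tags into a single comma-separated address line."""
    parts = []

    name = tags.get("addr:housename")
    if name:
        parts.append(name)

    number = tags.get("addr:housenumber")
    street = tags.get("addr:street")
    if number and street:
        parts.append(f"{number} {street}")
    elif street:
        parts.append(street)
    elif number:
        parts.append(number)

    # Locality parts, from most specific to least specific.
    for key in ("addr:suburb", "addr:city", "addr:postcode"):
        value = tags.get(key)
        if value:
            parts.append(value)

    return ", ".join(parts)


# Hypothetical example tags (not a real surveyed address).
example = {
    "addr:housenumber": "12",
    "addr:street": "Example Road",
    "addr:suburb": "Beeston",
    "addr:city": "Nottingham",
}
print(format_address(example))
```

Anything beyond this simple case, such as the sub-streets discussed next, needs extra information that these tags alone do not carry.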
Below are my thoughts on some of the tagging issues I’ve encountered:
‘Sub-streets’ / dependent thoroughfares
The issue I have found most problematic is how to tag ‘sub-streets’. These are typically short streets or terraces that ‘belong’ to a ‘parent’ street but are numbered separately. These are not uncommon around where I live.
For example:
5 Warren Arms Place, Albert Avenue, Stapleford, Nottingham
Unit 2, Westpoint Shopping Centre, Ranson Road, Chilwell, Nottingham
There is no established way of tagging these at the moment. Some users have suggested using the addr:full tag, but I don’t think this creates useful data, and I’d prefer not to bother if that’s really the only option.
There is the possibility of putting both parts in the addr:street tag separated by a comma: addr:street="Warren Arms Place, Albert Avenue". I initially used this method, for want of anything better, but it feels like a hack.
More recently I have started putting both parts in separate street relations and have then added the child relation to the parent with the role ‘subsidiary’. This is non-standard, but to me seems a satisfactory solution.
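To show how a data consumer could use this arrangement, here is a rough sketch that walks from a child street relation up through its parents to build the street part of the address. The in-memory model (a dict of relation names plus a list of parent/child pairs taken from members with the role ‘subsidiary’) and the relation ids are hypothetical, and this is not a standard way of processing OSM relations.

```python
def street_chain(relation_id, names, subsidiary_members):
    """Return street names from the given relation up through its parents.

    names: dict mapping relation id -> street name.
    subsidiary_members: list of (parent_id, child_id) pairs, one for each
    child relation that is a member of a parent with role 'subsidiary'.
    """
    parent_of = {child: parent for parent, child in subsidiary_members}
    chain = []
    seen = set()  # guards against accidental loops in the data
    current = relation_id
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(names[current])
        current = parent_of.get(current)
    return chain


# Hypothetical relation ids for the example address above.
names = {1: "Albert Avenue", 2: "Warren Arms Place"}
subsidiary_members = [(1, 2)]

street_part = ", ".join(street_chain(2, names, subsidiary_members))
print(f"5 {street_part}")
```

The result is the same string as the comma-separated addr:street hack, but the two street names stay separate in the data.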
Tagging localities and suburbs
Around where I live, official addresses often contain a suburb/town/village/locality name in addition to the city name. In the NG9 area several street names occur twice, and a handful three times, so this extra information is useful for telling them apart. Interfaces for inputting addresses in Potlatch and JOSM don’t provide a field for this information, so the vast majority of users don’t record it at all. Until recently the wiki didn’t identify any tag except addr:city for this purpose.
Some people argue this information should be derived from administrative and other boundary relations instead. However, unless special relations are created specifically for postal boundaries, I don’t see how this could work, because there is often a mismatch between administrative boundaries and the localities used in addresses. For example, the NG10 postcode district is entirely in Derbyshire, but the city for postal purposes is Nottingham.
I decided to use the addr:suburb tag to record this information, which has now been documented on the wiki. This is slightly problematic, because if I tag a village, should I still use addr:suburb or something else? Also, near me all addresses in the main University of Nottingham campus contain ‘University Park’, which functions very much like a ‘suburb’ name. It’s hard to decide which tag is most appropriate for this (and another local user has been considering the same question). It would be good to write clearer guidelines, so the tagging can be more consistent between different areas.
When to interpolate house numbers
It is quite common in the UK for a single address to contain a range of numbers where units have been combined together. For example, the street address of a nearby supermarket is ‘41-57 Derby Road’. There is a potential problem here, because the wiki says that numeric ranges should implicitly be treated as containing addr:interpolation="all". So how do you tag it if you just want it left as it is?
My view is that numeric ranges should not be interpolated unless an addr:interpolation tag has actually been added, so I have not attempted to tag these examples in a special way, despite the potential ambiguity. I suppose an alternative would be to use addr:interpolation=none or hacks such as using the addr:housename tag instead.
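The rule I have in mind can be sketched as follows: a range in addr:housenumber is only expanded when an addr:interpolation tag is present on the same object, and is otherwise kept as a single address. The function name is hypothetical, and only the ‘all’, ‘even’ and ‘odd’ values are handled here.

```python
import re

# Matches simple numeric ranges such as "41-57".
RANGE_RE = re.compile(r"^(\d+)\s*-\s*(\d+)$")


def expand_housenumbers(tags):
    """Expand a house number range only if addr:interpolation asks for it."""
    number = tags.get("addr:housenumber", "")
    mode = tags.get("addr:interpolation")
    match = RANGE_RE.match(number)

    # No range, or no explicit interpolation: leave the value exactly as tagged.
    if match is None or mode not in ("all", "even", "odd"):
        return [number] if number else []

    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        start, end = end, start

    numbers = range(start, end + 1)
    if mode == "even":
        numbers = [n for n in numbers if n % 2 == 0]
    elif mode == "odd":
        numbers = [n for n in numbers if n % 2 == 1]
    return [str(n) for n in numbers]


# The supermarket stays as one address because nothing asks for interpolation.
print(expand_housenumbers({"addr:housenumber": "41-57"}))
```

With this approach the ambiguity only arises for consumers that follow the implicit rule on the wiki instead.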
I have grouped most streets together using ‘street’ relations, so that it is easier to manage address tags that apply to a whole street. I notice the similar ‘associatedStreet’ relation is actually more popular. It’s a shame useful tools such as the Postcode Finder only support the latter relation type. I chose ‘street’ relations simply because it seemed a better thought-out proposal when I started adding addresses (2-3 years ago).
The ‘associatedStreet’ relations use the role ‘house’ for addresses, which I’ve always disliked. It leads to misunderstandings where people add roles like ‘shop’, because it’s not clear that ‘house’ means any address. Also, the ‘associatedStreet’ relation used to have the totally crazy restriction that it could only contain one way with the role ‘street’. Thankfully this has long since been changed, and was obviously ignored anyway, but avoiding having JOSM’s overzealous validator repeatedly informing me that my relations had too many street roles seemed enough reason in itself to use the alternative ‘street’ relation.
Neither ‘associatedStreet’ nor ‘street’ relations (as documented on the wiki) acknowledge that the address role can contain a relation. Why not? For example, there is a school nearby that is divided into two by a main road and it is logical to put the address in a relation combining the two parts. I obviously don’t take any notice of this restriction, but JOSM’s validator inevitably pops up again to tell me I’m doing it ‘wrong’, which could be confusing to newer users.