mtmail's Diary

Cleaning up US postcodes

Posted by mtmail on 26 May 2019 in English (English).

Or zip codes as they’re called in the United States.

The Nominatim geocoder has trouble parsing address queries when its OSM database table contains invalid postcodes. For example, when a building in OSM has addr:postcode=TX set, Nominatim will create an entry “TX is a postalcode in the United States”. When a user then searches for an address containing “place1, TX, USA”, Nominatim will search for “place1, USA near postcode TX”, which might return a place by the same name but outside Texas.

The technical solution is to disregard any data which doesn’t conform to the country’s postcode standard, here 5 digits with an optional 4-digit suffix (“12345” or “12345-6789”). Nominatim could skip such values at import, in an intermediary step (calculating postcode center points or boundaries), or at query time. Nominatim issue 2017
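To make the format rule concrete, here is a minimal sketch of such a check in Python. It is not Nominatim's code; the names `US_POSTCODE_RE` and `is_valid_us_postcode` are made up for illustration, and it only tests the format described above, not whether the ZIP code actually exists.

```python
import re

# Format from the text above: 5 digits, optionally "-" plus 4 digits.
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
US_POSTCODE_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_valid_us_postcode(value: str) -> bool:
    """Return True only if the whole value looks like 12345 or 12345-6789."""
    return US_POSTCODE_RE.fullmatch(value) is not None


def keep_valid(values):
    """Drop every value that does not match the US format (hypothetical helper)."""
    return [v for v in values if is_valid_us_postcode(v)]
```

With this, `keep_valid(["12345", "TX", "12345-6789", "123XX"])` keeps only the first and third value. The same test could run at any of the three stages mentioned above.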

In the meantime, why not correct the OSM data? Could be fun.

First I went for obvious garbage: ‘$x1’, ‘null’, ‘0’, non-printable characters and such. Not a lot.

Then any postcodes not starting with a number. That turned up hundreds of city names (with or without postcode), street names or state codes. Splitting those values in the iD editor was easy. For those formatted “TX 12345” I used the level0 editor state-by-state; it’s faster but still very manual.
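For the “TX 12345” pattern, the split can be sketched in a few lines of Python. This is only an illustration and not what I actually ran: the function name is made up, putting the prefix into addr:state is my assumption, and the two letters are not checked against the list of real state codes, so the result still needs a human look.

```python
import re

# Two uppercase letters, a separator (spaces and/or commas), then a US postcode.
STATE_PREFIXED_RE = re.compile(r"([A-Z]{2})[ ,]+([0-9]{5}(?:-[0-9]{4})?)")


def split_state_postcode(value: str):
    """Split 'TX 12345' into separate tags, or return None if it does not fit."""
    match = STATE_PREFIXED_RE.fullmatch(value.strip())
    if match is None:
        return None
    # Assumption: the prefix is a state code and belongs in addr:state.
    return {"addr:state": match.group(1), "addr:postcode": match.group(2)}
```

Values with a city name in front, such as “Austin, TX 12345”, deliberately return `None` here because they need a manual decision.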

Lastly come postcodes which are 3-4 or 6 digits long. That will take the longest and often requires local knowledge. Those formatted “123XX” I think have little value; no search engine can make effective use of partial postcodes. At least in the US, postcodes are not hierarchical enough.
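The three passes described so far can be sketched as a small triage function in Python. The category names and the `PLACEHOLDERS` set are made up for this example; a value like ‘$x1’ ends up in the "does not start with a number" bucket here, which is good enough for building a work list.

```python
import re

VALID_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

# Assumption: a tiny, hand-picked set of placeholder values seen as garbage.
PLACEHOLDERS = {"null", "0"}


def classify_postcode(value: str) -> str:
    """Sort an addr:postcode value into one of the cleanup passes."""
    if VALID_RE.fullmatch(value):
        return "valid"
    stripped = value.strip()
    if not stripped or not stripped.isprintable() or stripped in PLACEHOLDERS:
        return "garbage"  # first pass: empty, non-printable, placeholders
    if not ("0" <= stripped[0] <= "9"):
        return "non_numeric_start"  # second pass: city names, state codes, ...
    if stripped.isascii() and stripped.isdigit():
        return "wrong_length"  # third pass: 3, 4 or 6 digits and similar
    return "other"  # e.g. partial postcodes like 123XX
```

Counting the results per category gives a rough idea of how much manual work each pass will be.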

I found some foreign postcodes, usually from the UK and Canada (5-digit French postcodes look exactly like US postcodes, so I wouldn’t be able to identify them). There must have been a bug in the openwheelmap app last year. For example, a hospital was mapped in the middle of a rural road in Ohio, with postcode, town and phone number all pointing to England. I found 10 of those from different users, all using that app. For most I found the place already mapped in the other country, so I just had to merge some tags and delete the duplicate. For extra head scratching I found some in the ocean.

Some foreign postcodes near the country border are fine, e.g. a border patrol house that straddles the border. Who knows what the story is behind this petrol station.

In my opinion PO box data doesn’t belong in OSM. There were fewer than 10 places with PO box numbers in their addr:postcode values. Some came from an import, others were ecommerce stores with no physical presence. The location was either a post office (makes sense) or nearby (doesn’t make sense). If the website didn’t list a street address I decided to delete the place. Users can’t visit the store in the real world, and an entry inside a post office doesn’t reflect reality.

Of course I found many small things to correct besides postcodes: closed shops, misplaced shops (I left OSM notes), SEO and marketing descriptions.

In total I might be able to edit 2000-3000 places. I hope the edits made with the level0 editor, which span a larger area (still fewer than 50 places each), don’t look too scary or mechanical/scripted to other mappers. Thousands of others will be left open and I might write a validator plugin ( ?). Even if all values matched the standard, a future project could check for postcodes too far out of place; on the Nominatim issue list users have reported bugs caused by such postcodes.
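As a rough idea of what such an out-of-place check could look like, here is a Python sketch. Everything in it is an assumption: the input is a plain list of (postcode, lat, lon) tuples, the center is the median coordinate of all places sharing a postcode, and the 100 km threshold is arbitrary. A real check would more likely compare against postcode boundaries or centroids already computed elsewhere.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def find_outliers(places, max_km=100.0, min_places=3):
    """Return places farther than max_km from the median center of their postcode."""
    by_code = {}
    for code, lat, lon in places:
        by_code.setdefault(code, []).append((lat, lon))

    outliers = []
    for code, lat, lon in places:
        points = by_code[code]
        if len(points) < min_places:
            continue  # too few places to say where the postcode "should" be
        # Median is robust against a few misplaced points; it would need
        # extra care for postcodes crossing the antimeridian.
        center_lat = median(p[0] for p in points)
        center_lon = median(p[1] for p in points)
        if haversine_km(lat, lon, center_lat, center_lon) > max_km:
            outliers.append((code, lat, lon))
    return outliers
```

The output would only be a list of candidates for manual review, not something to fix automatically.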