I woke up one morning, and realized I needed a reusable dataset of all the communities in the world. Not just X-Y coordinates, but administrative areas. Obviously, I started looking on OSM. With a bit of playing around (and a little help from my friends), I had a nice set of admin areas of various levels from OSM in a shapefile. Then I started noticing holes. If a country is mostly made of holes, you know there is no data. But if there are only a few holes, well, something is fishy. What happens is that no extraction tool can make an admin area if there are gaps. A line is not an area, only a closed line is. Borders in OSM are relations. These are collections of lines, joined together in virtual union. Often, someone deletes one of these lines and replaces it with a more detailed version. That's sort of OK, but then you have to add the new line to -all- the relations the old one was part of. This is of course very exotic to new mappers, and even experienced mappers don't always seem to care.
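To make the "add the new line to -all- the relations" point concrete, here is a minimal sketch in Python. It models relations as plain lists of way IDs, which is a simplification of the real OSM data model, and the names `relations_using` and `replace_way` as well as the sample IDs are made up for illustration.

```python
def relations_using(way_id, relations):
    """Return the IDs of all relations that have way_id as a member."""
    return sorted(rid for rid, members in relations.items() if way_id in members)


def replace_way(old_id, new_id, relations):
    """Return a copy of relations where old_id is swapped for new_id everywhere."""
    return {
        rid: [new_id if way == old_id else way for way in members]
        for rid, members in relations.items()
    }


# Hypothetical example: way 3 is the shared border between two states.
relations = {
    "state_a": [1, 2, 3],
    "state_b": [3, 4, 5],
    "country": [1, 2, 4, 5],
}

# Both states depend on way 3, so both need the replacement way.
affected = relations_using(3, relations)
fixed = replace_way(3, 30, relations)
```

Deleting way 3 and adding way 30 to only one of the two states would leave the other with a gap, which is exactly the kind of hole described above.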
Data use = data cleaning
Data only get fixed if they are used. And even on the forum, I saw people referring data users to other sources to get their admin areas. It is complicated to extract a dataset with borders from OSM. So why care about this data? It shows up okay on OSM even if it's broken. BUT, user Wambacher made this great tool to download shapefiles by country with all available admin areas. So now that it's easy to use the data, please help maintain it.
Fixing things up
I tried several tools to help fix things. I have a global focus, so I've mostly been doing fixups on the admin level right below the country. Often states, sometimes departments, etc. If you're going to fix a certain area, you're going to need other tools (see below).
- Choose a level to fix using layers.openstreetmap.fr.
- Check which areas are missing.
- Find the relation which is broken (or, rarely, missing). Search doesn't always work. Zoom to a border that is OK on one side and broken on the other. Click on the border and find the relations it is part of. Copy the ID.
- Visualize the relation with a URL like this: http://www.openstreetmap.org/relation/3624100 This shows obvious defects. If the defects are more subtle (little holes, almost-junctions), go to http://analyser.openstreetmap.fr/cgi-bin/index.py and paste the ID.
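If you prefer to look for gaps in a script, the idea can be sketched as follows. This is not how the tools above work internally, just a minimal illustration: it assumes you have downloaded the relation from the OSM API's `relation/<id>/full` call, and it reports the coordinates of way endpoints that are used an odd number of times, since in a closed border every endpoint connects to another way. The function names are my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def full_relation_url(relation_id):
    """URL of the OSM API 0.6 call returning a relation with its ways and nodes."""
    return f"https://api.openstreetmap.org/api/0.6/relation/{relation_id}/full"


def gap_locations(osm_xml, relation_id):
    """Return (lat, lon) of way endpoints in the relation that connect to nothing."""
    root = ET.fromstring(osm_xml)
    coords = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    # First and last node of every way in the download.
    way_ends = {}
    for way in root.iter("way"):
        refs = [nd.get("ref") for nd in way.iter("nd")]
        if refs:
            way_ends[way.get("id")] = (refs[0], refs[-1])
    relation = next(
        r for r in root.iter("relation") if r.get("id") == str(relation_id)
    )
    # Count how often each endpoint is used by the relation's member ways.
    uses = Counter()
    for member in relation.iter("member"):
        if member.get("type") == "way" and member.get("ref") in way_ends:
            first, last = way_ends[member.get("ref")]
            uses[first] += 1
            uses[last] += 1
    # An odd count means the border stops there: that is one side of a gap.
    return sorted(coords[n] for n, c in uses.items() if c % 2 == 1 and n in coords)
```

In practice you would pass in the bytes downloaded from `full_relation_url(3624100)` and zoom to the coordinates it returns. The sketch ignores member roles and nested relations, so treat the output as a hint, not a verdict.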
Causes of trouble
During the fixups I found many different types of errors. Borders are basically always the result of imports. Duh. Messing around with borders is a good way to understand why imports are controversial. It's easy to do it wrong, and hard to do it right. Sometimes there is leftover data from the original shapefiles that were used, like area=123. In some cases, the original polygons are still there. And where it gets really messy is when you add detailed borders from source X on top of general borders from source Y. It takes a lot of time and effort to clean that up, and importers don't always get around to finishing it all. Once the data are there, information may be redacted because the original data wasn't compatible with our licence. Most often: no commercial reuse, the menace of all open data users. And redaction leaves a mess. Another source of trouble is including both the seafront and a maritime border. These should be separated. Apart from that, most errors come from beginners and experienced users alike who delete a line and replace it with something else. Simple solution: use shift-click to improve the geometry instead. That makes understanding the history of an area much easier too.
In depth error checking
If you want to go in depth in a certain area, these tools will come in handy: http://keepright.at/report_map.php > Click "none" (bottom left), then only activate the "Boundaries" checks
This even visualizes all the broken relations, but it's just plain depressing to use: http://tools.geofabrik.de/osmi/?view=multipolygon
Using Overpass turbo you can quickly get the IDs for the admin areas in the area you're fixing. That makes it a lot easier to start fixing things. For example, using this query you get a map with all the relations defining level 4 admin areas. Don't zoom out when running it, as you will be downloading too much data.
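The query itself is not reproduced here, so the following is only a sketch of what such a query could look like, wrapped in a small Python helper. It assumes the usual `boundary=administrative` and `admin_level` tags; the helper name, the timeout and the example bounding box are my own choices.

```python
def admin_level_query(south, west, north, east, level=4):
    """Build an Overpass QL query for admin boundary relations in a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:60];\n"
        f'relation["boundary"="administrative"]["admin_level"="{level}"]({bbox});\n'
        "out geom;"
    )


# Example bounding box (south, west, north, east); keep it small.
query = admin_level_query(50.7, 2.5, 51.5, 6.4)
print(query)
```

You can paste the printed query into Overpass turbo. Asking for geometry with `out geom;` is what makes the download heavy, so the warning about zooming out applies here as well.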