After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.
People who map their business on OSM usually have a single changeset in which they put their business on the map. They often foul up the opening hours syntax and the international formatting of phone numbers, and there is usually a lot of info still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements by different mappers over time.
An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.
I started looking for a way to detect the unwanted spam. The result is this Overpass query for buzzwords in the description tag. Think about words like “award winning”, “reliable service” and “conveniently located”. This is a dynamic process, because I regularly add new buzzwords that I encounter alongside the ones that I find through the query and I remove words that result in false positives.
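For readers who want to experiment, here is a minimal sketch of how such a buzzword query could be assembled. It is not the actual query linked above: the function name `build_query`, the timeout value and the character whitelist are assumptions made for illustration, and only the three example phrases come from this post.

```python
import re

# The three example phrases from the text; extend this list as needed.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]

# Assumption: only plain letters, digits, spaces and hyphens are allowed, so
# no regex or quote escaping is needed inside the Overpass QL string.
_SAFE = re.compile(r"[A-Za-z0-9 -]+")


def build_query(buzzwords, timeout=180):
    """Return an Overpass QL query matching any buzzword in `description`."""
    if not buzzwords:
        raise ValueError("at least one buzzword is required")
    for word in buzzwords:
        if not _SAFE.fullmatch(word):
            raise ValueError(f"unsupported characters in buzzword: {word!r}")
    pattern = "|".join(buzzwords)
    return (
        f"[out:json][timeout:{timeout}];\n"
        # nwr = nodes, ways and relations; ",i" makes the regex case-insensitive.
        f'nwr["description"~"{pattern}",i];\n'
        "out tags center;\n"
    )


if __name__ == "__main__":
    print(build_query(BUZZWORDS))
```

The printed text can be pasted into Overpass Turbo. Adding or removing a buzzword is then a one-line change to the list.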
Image 1: Distribution of the results in Overpass Turbo
Many businesses are properly tagged apart from the questionable description, and for those it’s a quick and easy process to delete the description and move on. Example
The more complicated cases can be categorised as follows:
Places that are missing a main feature tag
Some businesses are not tagged as anything, but instead they just have a name, address, description and with some luck a website. A close inspection is needed to figure out how to tag these businesses. Example
Descriptions with additional information to tag
Some descriptions contain both the unwanted spam and some useful information. It might be a hotel that offers free wi-fi, a pet-friendly café or an insurance company that mentions its phone number in the description tag. With enough understanding of OSM’s tagging practices it’s possible to turn this SEO spam into proper tags like
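To illustrate the idea, here is a rough sketch of a helper that suggests tags for the three cases mentioned above. Everything in it is an assumption for illustration: the rules, the phone pattern and the function name `suggest_tags` are hypothetical, and mapping “pet-friendly” to `dog=yes` is a guess that a human should confirm. It only proposes tags for review and is no substitute for inspecting the object.

```python
import re

# Hypothetical phrase-to-tag rules; the tag choices are illustrative guesses.
RULES = [
    (re.compile(r"free wi-?fi", re.I),
     {"internet_access": "wlan", "internet_access:fee": "no"}),
    # Assumption: "pet-friendly" is read as dogs being allowed.
    (re.compile(r"pet[- ]friendly", re.I), {"dog": "yes"}),
]

# Very loose pattern for a phone number already in international notation.
PHONE = re.compile(r"\+\d[\d ]{6,}\d")


def suggest_tags(description):
    """Return candidate tags found in a free-text description."""
    suggestions = {}
    for pattern, tags in RULES:
        if pattern.search(description):
            suggestions.update(tags)
    match = PHONE.search(description)
    if match:
        suggestions["phone"] = match.group(0)
    return suggestions


if __name__ == "__main__":
    text = "Award winning pet-friendly café with free Wi-Fi, call +31 20 123 4567"
    print(suggest_tags(text))
```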
Sometimes a person manages to fit too much SEO spam into an OSM object. There might be emojis, inviting messages in bold text, names fully capitalised, 15+ payment options, cuisine tags packed with all the drinks that are offered, an image that’s just the logo, website tags that lead to a review page and address tags that link to Google Maps, all on top of the usual shenanigans. There is no way to speedrun a cleanup of these objects; they need to be inspected one tag at a time. I have seen too many of these and am now contemplating a position as monk at the nearest monastery. Example
Sometimes buzzwords like “famous for” and “the best” are not intended to allure potential customers, but are somehow part of neutral descriptions of places. I saved the IDs of the false positives I found to exclude them from the query. Example
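One possible way to apply such an exclusion list is to filter the query results afterwards instead of inside the query itself. The sketch below assumes the results are in Overpass JSON form, where each element carries a `type` and an `id`. The function name `drop_false_positives` and the IDs in the usage example are made up.

```python
def drop_false_positives(elements, reviewed):
    """Remove elements whose (type, id) pair was reviewed and found neutral.

    `elements` is the "elements" list of an Overpass JSON response;
    `reviewed` is a set of (type, id) tuples saved from earlier checks.
    """
    return [el for el in elements if (el["type"], el["id"]) not in reviewed]


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    reviewed = {("node", 101)}
    elements = [
        {"type": "node", "id": 101, "tags": {"description": "Famous for its tower"}},
        {"type": "way", "id": 101, "tags": {"description": "The best deals in town"}},
    ]
    print(drop_false_positives(elements, reviewed))
```

The pair of type and ID matters because nodes, ways and relations are numbered separately, so the same number can refer to different objects.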
To edit the tags of these objects I mainly use Overpass Turbo in conjunction with the OSM tags editor, which is an extension for Google Chrome that lets you edit tags with a minimalistic UI directly on osm.org. My main considerations are speed and simplicity, but for more versatility, like the ability to remove duplicate POIs or to have a validator tool, it pays off to choose JOSM or iD/Rapid instead.
I have created MapRoulette challenges to ask for help with reviewing and removing business descriptions. So far, some helpful mappers have removed roughly 700 unwanted descriptions globally. These challenges only feature nodes for now. I just uploaded a new version of the challenge here.
OpenStreetMap is becoming an increasingly interesting medium for firms to make their presence known to the world. We generally welcome their contributions to the map, but since these people usually don’t return to OSM after their initial effort to map their businesses, we need to have a good look at their work to ensure that it meets community standards. I am taking a deep dive into the descriptions they add, and after working my way through hundreds of them I can conclude that there is a lot of room for improvement, either through removing spam or through converting it to other useful tags. As with everything else in OSM, it is an effort to which anyone can contribute.
Congratulations on making it to the end of my essay. Thank you for reading this.
P.S. I created a forum thread in which we can discuss this topic.