OpenStreetMap

Finding SEO spam in OSM

Posted by Friendly_Ghost on 18 June 2023 in English. Last updated on 23 June 2023.

After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.

Introduction

People who map their business on OSM usually do it in a single changeset in which they put their business on the map. They often foul up the opening hours syntax and the international formatting of phone numbers, and there is usually a lot of information still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements from different mappers over time.

An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.

I started looking for a way to detect the unwanted spam. The result is this Overpass query, which searches the description tag for buzzwords. Think of words like “award winning”, “reliable service” and “conveniently located”. This is an ongoing process: I regularly add new buzzwords, both ones I encounter elsewhere and ones I find through the query, and I remove words that produce false positives.
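For readers who want to experiment outside Overpass Turbo, the matching idea can be sketched in Python as a case-insensitive regular expression over the `description` tag. This is only an illustration under assumptions: the buzzword list holds just the three examples above, and the sample elements imitate the `type`/`id`/`tags` shape of Overpass JSON output rather than coming from a real download.

```python
import re

# Buzzwords from the examples above; a real list keeps growing and shrinking.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]


def build_pattern(phrases):
    """Compile one case-insensitive regex that matches any buzzword phrase.

    Words may be separated by whitespace or hyphens, and \\b anchors stop
    matches inside longer words (e.g. "unreliable service").
    """
    alternatives = [
        r"[\s-]+".join(re.escape(word) for word in phrase.split())
        for phrase in phrases
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


BUZZ_RE = build_pattern(BUZZWORDS)


def find_spam(elements):
    """Return (type, id, matched text) for elements whose description matches."""
    hits = []
    for el in elements:
        description = el.get("tags", {}).get("description", "")
        match = BUZZ_RE.search(description)
        if match:
            hits.append((el["type"], el["id"], match.group(0)))
    return hits


sample = [
    {"type": "node", "id": 1,
     "tags": {"shop": "bakery", "description": "Award-winning bread, conveniently located!"}},
    {"type": "node", "id": 2, "tags": {"amenity": "bench", "description": "Wooden bench"}},
    {"type": "way", "id": 3, "tags": {"name": "Acme"}},
]
print(find_spam(sample))  # [('node', 1, 'Award-winning')]
```

Allowing any whitespace or a hyphen between words means “award-winning” and “award winning” are caught by the same entry; that is a design choice of this sketch, not necessarily how the actual query works.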

Image 1: Distribution of the results in Overpass Turbo

Results

Many businesses are properly tagged apart from the questionable description, and for those it’s a quick and easy process to delete the description and move on. Example

The more complicated cases can be categorised as follows:

Places that are missing a main feature tag

Some businesses are not tagged as anything; instead they just have a name, an address, a description and, with some luck, a website. Close inspection is needed to figure out how to tag these businesses. Example
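Spotting these objects can also be scripted. The Python sketch below flags objects that have a `name` but none of a hand-picked set of main feature keys. The `MAIN_KEYS` set is a hypothetical, deliberately incomplete selection; OSM has many more top-level feature keys, so a real check would need a longer list.

```python
# Hypothetical, deliberately incomplete set of keys that define what a POI is.
MAIN_KEYS = {"amenity", "shop", "office", "craft", "tourism", "leisure", "healthcare"}


def lacks_main_tag(tags):
    """True if the object has a name but none of the MAIN_KEYS.

    set.isdisjoint() accepts any iterable, and iterating a dict yields its keys.
    """
    return "name" in tags and MAIN_KEYS.isdisjoint(tags)


print(lacks_main_tag({"name": "Acme Plumbing", "description": "Reliable service"}))  # True
print(lacks_main_tag({"name": "Corner Bakery", "shop": "bakery"}))  # False
```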

Descriptions with additional information to tag

Some descriptions contain both the unwanted spam and some useful information. It might be a hotel that offers free wi-fi, a pet-friendly café or an insurance company that mentions its phone number in the description tag. With enough understanding of OSM’s tagging practices it’s possible to turn this SEO spam into proper tags like internet_access=wlan, dog=yes or (contact:)phone=*. Example
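Part of this conversion can be pre-screened with a few patterns. The Python sketch below only produces suggestions: the phrase-to-tag table and the loose phone pattern are hypothetical examples, and every suggestion still needs a human check against OSM's tagging practices before it is applied.

```python
import re

# Hypothetical phrase-to-tag table; extend it with care.
HINTS = [
    (re.compile(r"\bfree\s+wi-?fi\b", re.IGNORECASE),
     {"internet_access": "wlan", "internet_access:fee": "no"}),
    (re.compile(r"\b(?:pet|dog)[\s-]+friendly\b", re.IGNORECASE),
     {"dog": "yes"}),
]
# Loose match for an international number such as "+31 20 123 4567".
PHONE_RE = re.compile(r"\+\d[\d\s-]{6,}\d")


def suggest_tags(description, existing_tags):
    """Suggest tags hinted at by a description; never overwrites existing tags."""
    suggestions = {}
    for pattern, tags in HINTS:
        if pattern.search(description):
            suggestions.update(tags)
    has_phone = "phone" in existing_tags or "contact:phone" in existing_tags
    match = PHONE_RE.search(description)
    if match and not has_phone:
        suggestions["phone"] = match.group(0)
    # Drop anything the object already has, so a mapper's value always wins.
    return {k: v for k, v in suggestions.items() if k not in existing_tags}


print(suggest_tags(
    "Cosy hotel with free Wi-Fi. Call +31 20 123 4567 today!",
    {"tourism": "hotel"},
))
# {'internet_access': 'wlan', 'internet_access:fee': 'no', 'phone': '+31 20 123 4567'}
```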

Chaos

Sometimes a person manages to fit too much SEO spam into an OSM object. There might be emojis, inviting messages in bold text, names fully capitalised, 15+ payment options, cuisine tags packed with all the drinks that are offered, an image that’s just the logo, website tags that lead to a review page and address tags that link to Google Maps, all on top of the usual shenanigans. There is no way to speedrun a cleanup of these objects; they need to be inspected one tag at a time. I have seen too many of these and am now contemplating a position as a monk at the nearest monastery. Example
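Although these objects still need to be inspected tag by tag, a script can at least point at the worst offenders. This Python sketch checks some of the symptoms listed above: fully capitalised names, emoji and other symbol characters (Unicode category `So`), address tags containing a URL, and the number of `payment:*=yes` tags. The threshold of 15 payment options is taken from the text and is a judgement call, not a community rule.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://", re.IGNORECASE)
PAYMENT_LIMIT = 15  # "15+ payment options" from the text; a judgement call


def chaos_flags(tags):
    """Return human-readable warnings for one object's tags (dict of str -> str)."""
    flags = []
    name = tags.get("name", "")
    # str.isupper() ignores digits and punctuation and is False for text without
    # cased letters (e.g. Chinese), so only fully capitalised names trigger it.
    if name.isupper() and sum(c.isalpha() for c in name) >= 4:
        flags.append("name is fully capitalised")
    for key, value in tags.items():
        # Emoji such as U+1F355 (slice of pizza) are in category "So" (Symbol, other).
        if any(unicodedata.category(c) == "So" for c in value):
            flags.append(f"{key} contains emoji or other symbols")
        if key.startswith("addr:") and URL_RE.search(value):
            flags.append(f"{key} contains a URL")
    payments = sum(
        1 for key, value in tags.items()
        if key.startswith("payment:") and value == "yes"
    )
    if payments >= PAYMENT_LIMIT:
        flags.append(f"{payments} payment options")
    return flags


example = {
    "name": "BEST PIZZA TOWN",
    "description": "Hot pizza \U0001F355 every day",
    "addr:street": "https://maps.google.com/?q=pizza",
}
print(chaos_flags(example))
# ['name is fully capitalised', 'description contains emoji or other symbols',
#  'addr:street contains a URL']
```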

False positives

Sometimes buzzwords like “famous for” and “the best” are not intended to allure potential customers, but are somehow part of neutral descriptions of places. I saved the IDs of the false positives I found to exclude them from the query. Example
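Excluding known false positives comes down to a filter on element identity. One detail matters here: OSM IDs are only unique within an element type, so node 111 and way 111 are different objects, and an exclusion list should store the type together with the ID. The Python sketch below assumes that representation; the IDs are made up, and the `amenity=telephone` rule is only an example of excluding a whole category of objects, in the spirit of the exclusion discussed in the comments.

```python
# Hypothetical IDs of checked false positives, stored as (type, id) pairs
# because node 111 and way 111 are two different OSM objects.
EXCLUDED_IDS = {("node", 111), ("way", 222)}
# Hypothetical tag-based rule that excludes a whole category of objects.
EXCLUDED_TAGS = {("amenity", "telephone")}


def drop_false_positives(elements, excluded_ids, excluded_tags):
    """Keep only elements that are neither listed by ID nor carry an excluded tag."""
    kept = []
    for el in elements:
        if (el["type"], el["id"]) in excluded_ids:
            continue
        tags = el.get("tags", {})
        if any(tags.get(key) == value for key, value in excluded_tags):
            continue
        kept.append(el)
    return kept


results = [
    {"type": "node", "id": 111, "tags": {"description": "Famous for its tower"}},
    {"type": "way", "id": 111, "tags": {"description": "The best bakery in town"}},
    {"type": "node", "id": 5, "tags": {"amenity": "telephone", "description": "Payphone"}},
    {"type": "node", "id": 6, "tags": {"shop": "bakery", "description": "Award winning"}},
]
kept = drop_false_positives(results, EXCLUDED_IDS, EXCLUDED_TAGS)
print([(el["type"], el["id"]) for el in kept])  # [('way', 111), ('node', 6)]
```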

Editing

To edit the tags of these objects I mainly use Overpass Turbo in conjunction with the OSM tags editor, which is an extension for Google Chrome that lets you edit tags with a minimalistic UI directly on osm.org. My main considerations are speed and simplicity, but for more versatility, like the ability to remove duplicate POIs or to have a validator tool, it pays off to choose JOSM or iD/Rapid instead.

MapRoulette

I have created MapRoulette challenges to ask for help with reviewing and removing business descriptions. So far, some helpful mappers have removed roughly 700 unwanted descriptions globally. These challenges only feature nodes for now. I just uploaded a new version of the challenge here.
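A challenge can be fed from a file of point features. The Python sketch below converts node results in the Overpass JSON shape into a GeoJSON FeatureCollection and skips ways and relations, whose geometry would first have to be resolved. It assumes MapRoulette accepts such a FeatureCollection; the `@id` property name is a hypothetical choice, so the expected format should be checked in the MapRoulette documentation.

```python
import json


def nodes_to_geojson(elements):
    """Turn Overpass-style node elements into a GeoJSON FeatureCollection."""
    features = []
    for el in elements:
        if el.get("type") != "node":
            continue  # ways/relations need their geometry resolved first
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude before latitude (RFC 7946).
            "geometry": {"type": "Point", "coordinates": [el["lon"], el["lat"]]},
            # "@id" is a hypothetical property name for the OSM reference.
            "properties": {"@id": f"node/{el['id']}", **el.get("tags", {})},
        })
    return {"type": "FeatureCollection", "features": features}


sample = [
    {"type": "node", "id": 42, "lat": 52.37, "lon": 4.89,
     "tags": {"description": "Award winning!"}},
    {"type": "way", "id": 7, "tags": {"name": "Acme"}},
]
print(json.dumps(nodes_to_geojson(sample), indent=2))
```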

Conclusion

OpenStreetMap is becoming an increasingly interesting medium for firms to make their presence known to the world. We generally welcome their contributions to the map, but since these people usually don’t return to OSM after their initial effort to map their businesses, we need to have a good look at their work to ensure that it meets community standards. I am taking a deep dive into the descriptions they add, and having worked my way through hundreds of them, I can conclude that there is a lot of room for improvement, either by removing spam or by converting it into other useful tags. As with everything else in OSM, it is an effort to which anyone can contribute.

Congratulations on making it to the end of my essay. Thank you for reading.

P.S. I created a forum thread in which we can discuss this topic.

Discussion

Comment from Endres Pelka on 18 June 2023 at 21:11

Businesses that stuff their advertising rubbish-text anywhere they can, even on OpenStreetMap, are not trustworthy in any respect. Their object on OSM might be misplaced by many kilometers, or the business might already be bankrupt or have moved somewhere else long before we notice the dubious edit.

I’d just delete such objects right away (preserving the address or building outline, if it looks plausible). If the business cares, they would respond then. If not, why bother?

Comment from ivanbranco on 18 June 2023 at 22:49

Cool query! Some of it could be a cool Osmose check imho

Comment from b-unicycling on 18 June 2023 at 23:28

Thanks for taking the time to put so much effort into it.

Comment from adreamy on 19 June 2023 at 03:56

What a wonderful article.
Please share it with Weekly OSM.

Comment from 快乐的老鼠宝宝 on 19 June 2023 at 08:48

I usually remove the promotional information from such elements, because it is shameful to advertise your business through a community that anyone can edit. They should only retain the most basic, neutral information, like name=* and phone=* (etc.).

Comment from Glassman on 19 June 2023 at 15:33

One easy method of finding SEO spam is to review new users’ contributions. Since the main culprits seem to limit themselves to one or two changesets, they show up as new users. Around me, they show up with the changeset comment “updated” and have a username much like the name of the business being added. I would encourage everyone to start welcoming new users in their area using the Welcome tool.

I would also like to thank user_53959 for their world wide work of cleaning up SEO Spam.

When the changeset has many errors, I usually just revert the edit. Otherwise I try to fix minor issues.

Over the years I’ve tried contacting businesses to find out who is adding their business to OSM. So far no luck.

Comment from Kai Johnson on 21 June 2023 at 16:57

Nice work! I don’t see much of this spam where I’m mapping, but I’ll keep an eye out for it!

Comment from Friendly_Ghost on 21 June 2023 at 22:05

Thanks for all the nice comments and for your perspectives on the subject.

Comment from Msiipola on 22 June 2023 at 06:24

I tried to open the Overpass query but got an error. Maybe you could copy the query text here? I assume it must be adjusted for each country’s language.

One problem with these POIs is how to verify whether the business is still there. If you can’t verify it yourself by visiting the address, you can try Google. But if Google doesn’t return anything useful, should you delete the POI or not?

Comment from Friendly_Ghost on 22 June 2023 at 15:24

I have no idea what happened to that link, but here is my latest version: https://overpass-turbo.eu/s/1woJ

Comment from Friendly_Ghost on 22 June 2023 at 15:25

Also, you’re not allowed to use Google as a source for OSM mapping because of copyright.

Comment from Msiipola on 22 June 2023 at 17:17

About Google: you can see whether the business is still going by looking at its pages. Are they updated, or are they several years old? Is the address given there the same as in OSM, etc.?

Comment from hfs on 23 June 2023 at 12:26

Great initiative!

I think you can remove amenity=telephone as false positives. Someone copied and pasted the same description, which contains “comfort” and therefore matches the query, onto 1500 (!) public telephones in Germany.

Comment from Friendly_Ghost on 23 June 2023 at 12:38

Thank you! I updated the query to exclude these results.

Comment from Nearby0051 on 25 June 2023 at 15:22

Interesting. Would it also be possible to output, and then check, POIs that were created by a user with the same name as the POI?

It seems that many businesses sign up using the same name as the POI they end up creating, at least here where I live.
