Discussion

Comment from chris_debian on 5 April 2026 at 09:07

Nice script, Marcos! Would you be able to share an example of a ‘before’ and ‘after’ tag?

Thanks,

Chris

Comment from Marcos Dione on 5 April 2026 at 13:53

The edits are done by hand; there is no automation there, the script just finds “bad” values.

Comment from chris_debian on 5 April 2026 at 14:23

Thanks for the clarification, Marcos. It makes sense to keep the edits manual for accuracy.

I had a go at tidying up the script a little, in case it’s useful. The main changes:

  • Swapped os.system() for subprocess.run() — a bit safer and more Pythonic
  • Added an input() pause between objects so they open one at a time rather than all at once
  • Used a context manager for the DB connection
  • Added a HAVING count(*) < N filter to focus on the long tail and skip common valid values
  • Added a KNOWN_GOOD set to skip values you already know are fine
  • Added comments explaining the negative-ID-means-relation behaviour (I didn’t know that, good to learn!)
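
To illustrate the first and last points, here is a minimal sketch (not part of either script). os.system() hands one string to a shell, whereas subprocess.run() accepts a list of arguments and, by default, starts the program without a shell, so the URL needs no quoting. The helper name `browser_argv` is made up for this example; the browser name and profile are the ones used in the original script.

```python
def browser_argv(osm_id: int, browser: str = "librewolf", profile: str = "default") -> list[str]:
    """Build the argument list that subprocess.run() would receive.

    browser_argv is a hypothetical helper; it only builds the list and
    does not launch anything, which keeps it easy to check.
    """
    if osm_id < 0:
        # In osm2pgsql rendering databases a negative ID is a relation.
        kind, ident = "relation", -osm_id
    else:
        # A positive ID is a way.
        kind, ident = "way", osm_id
    url = f"https://www.openstreetmap.org/edit?{kind}={ident}"
    # Each list element reaches the browser as one argument; no shell parses it.
    return [browser, "-P", profile, url]


if __name__ == "__main__":
    print(browser_argv(-42))
    print(browser_argv(1234))
```

Passing the returned list to subprocess.run(argv, check=False) would then open the editor for that object.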

Revised script below. Sorry about the formatting, the diary interprets the hash/pound sign as bold font; those lines should be comments in the script. Happy to be ignored if you prefer your original, it clearly does the job! :)

Chris


Revised script

```python
#! /usr/bin/env python3
"""
fix_osm_tags.py - Find and manually correct wrong highway tag values in OSM.

Requires a local osm2pgsql rendering database (e.g. 'europe').
Workflow:
  1. Queries planet_osm_line for highway values, rarest first (long tail first).
  2. For each rare value, opens the OSM editor in your browser one object at a time.
  3. You review, correct or leave a note, then press Enter to continue.
"""

import subprocess
import sys

import psycopg2

# --- Configuration ---

DB_NAME = "europe"
BROWSER = "librewolf"
BROWSER_PROFILE = "default"

# Highway values that are known-good and should be skipped.
# Expand this list to avoid being prompted for valid rare values.
KNOWN_GOOD = {
    "residential", "track", "path", "footway", "cycleway", "service",
    "unclassified", "tertiary", "secondary", "primary", "trunk", "motorway",
    "living_street", "pedestrian", "steps", "motorway_link", "trunk_link",
    "primary_link", "secondary_link", "tertiary_link",
}

# Only show groups with fewer than this many occurrences.
# Keeps the focus on the long tail of rare/likely-wrong values.
MAX_COUNT = 50


def open_in_editor(osm_id: int) -> None:
    """Open the OSM web editor for a given osm2pgsql osm_id.

    In osm2pgsql rendering databases, negative IDs represent relations;
    positive IDs represent ways.
    """
    if osm_id < 0:
        # Negative ID => relation
        url = f"https://www.openstreetmap.org/edit?relation={-osm_id}"
    else:
        # Positive ID => way
        url = f"https://www.openstreetmap.org/edit?way={osm_id}"

    subprocess.run([BROWSER, "-P", BROWSER_PROFILE, url], check=False)


def main() -> None:
    with psycopg2.connect(dbname=DB_NAME) as db:
        cursor = db.cursor()

        # Fetch highway values ordered by frequency ascending (rarest first).
        # HAVING filters out common values that are unlikely to be errors,
        # keeping the focus on the suspicious long tail.
        cursor.execute(
            """
            SELECT
                count(*) AS count,
                highway
            FROM planet_osm_line
            WHERE
                highway IS NOT NULL
            GROUP BY highway
            HAVING count(*) < %s
            ORDER BY count ASC
            """,
            (MAX_COUNT,),
        )
        groups = cursor.fetchall()

        for count, highway in groups:
            # Skip values we already know are valid.
            if highway in KNOWN_GOOD:
                continue

            print(f"\n{'='*50}")
            print(f"Value: '{highway}'  ({count} occurrence{'s' if count != 1 else ''})")
            print("Press Enter to open each object, or type 's' to skip this group.")

            choice = input("> ").strip().lower()
            if choice == "s":
                continue

            # Fetch all OSM IDs with this highway value.
            cursor.execute(
                """
                SELECT osm_id
                FROM planet_osm_line
                WHERE highway = %s
                """,
                (highway,),
            )
            osm_ids = [row[0] for row in cursor.fetchall()]

            for i, osm_id in enumerate(osm_ids, start=1):
                print(f"  Opening {i}/{len(osm_ids)} (osm_id={osm_id}) ...")
                open_in_editor(osm_id)

                # Note: if the object no longer exists in OSM, the editor
                # will open a view of the whole planet; just close that tab.
                if i < len(osm_ids):
                    next_action = input("  Press Enter for next, or 's' to skip rest of group: ").strip().lower()
                    if next_action == "s":
                        break

        print("\nAll done!")


if __name__ == "__main__":
    sys.exit(main())
```
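
For anyone without a rendering database to hand, the two filters (the HAVING count(*) < N cut-off and the KNOWN_GOOD set) can be tried on toy data. This is only a sketch under assumptions: it uses an in-memory SQLite table in place of PostgreSQL, the sample rows are invented, `rare_values` is a made-up helper name, and the secondary sort on the value is added just to make the output order deterministic.

```python
import sqlite3


def rare_values(rows, max_count):
    """Return (count, value) pairs for values seen fewer than max_count times, rarest first."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE planet_osm_line (osm_id INTEGER, highway TEXT)")
    db.executemany("INSERT INTO planet_osm_line VALUES (?, ?)", rows)
    result = db.execute(
        """
        SELECT count(*) AS n, highway
        FROM planet_osm_line
        WHERE highway IS NOT NULL
        GROUP BY highway
        HAVING count(*) < ?
        ORDER BY n ASC, highway ASC
        """,
        (max_count,),
    ).fetchall()
    db.close()
    return result


# Invented sample data: two misspellings, one rare but valid value, one NULL.
SAMPLE = [
    (1, "residential"), (2, "residential"), (3, "residential"),
    (4, "residental"),
    (5, "footwya"), (6, "footwya"),
    (-7, "track"),
    (8, None),
]
KNOWN_GOOD = {"residential", "track"}

if __name__ == "__main__":
    rare = rare_values(SAMPLE, max_count=3)
    # HAVING drops 'residential' (3 occurrences); KNOWN_GOOD then drops 'track'.
    suspicious = [(n, value) for n, value in rare if value not in KNOWN_GOOD]
    print(rare)
    print(suspicious)
```

The point of keeping both filters is that HAVING removes the common values in the database, while KNOWN_GOOD removes values that are rare in a given extract but still valid.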

Comment from Marcos Dione on 7 April 2026 at 08:28

I made more modifications on my side: I collapsed both queries into a single one, and now there’s no wait between objects :) It also takes any key for any table. I will try to merge these tonight.

```
#! /usr/bin/env python3

import os
import sys

import psycopg2


def main():
    db = psycopg2.connect(dbname='europe')
    cursor = db.cursor()

    cursor.execute(f'''
        SELECT
            count(*) AS count,
            {sys.argv[2]},
            array_agg(osm_id)
        FROM planet_osm_{sys.argv[1]}
        WHERE
            {sys.argv[2]} IS NOT NULL
        GROUP BY {sys.argv[2]}
        ORDER BY count ASC
    ''')

    while (data := cursor.fetchone()) is not None:
        # print(data)
        count, tag, osm_ids = data
        print(f"next {count}: {tag}")

        for osm_id in osm_ids:
            ans = input(f'Ready for {osm_id}? ')
            if ans != 'n':
                if osm_id < 0:
                    # in rendering DBs, this is a relation
                    os.system(f"librewolf -P default 'https://www.openstreetmap.org/edit?relation={-osm_id}'")
                else:
                    os.system(f"librewolf -P default 'https://www.openstreetmap.org/edit?way={osm_id}'")


if __name__ == '__main__':
    main()
```
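
Since the table and key now come straight from the command line and end up in the SQL text, a typo or an odd key (for example one containing a colon, such as addr:housenumber) can break the query. A minimal sketch of one way to guard against that, assuming the default osm2pgsql table names; `ALLOWED_TABLES`, `quote_ident` and `build_query` are names invented for this example, not part of the script above.

```python
# Table suffixes osm2pgsql creates by default in a rendering database
# (planet_osm_point, planet_osm_line, planet_osm_polygon, planet_osm_roads).
ALLOWED_TABLES = {"point", "line", "polygon", "roads"}


def quote_ident(name: str) -> str:
    """Quote a SQL identifier: wrap it in double quotes and double any inner quote."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def build_query(table_suffix: str, column: str) -> str:
    """Build the grouped query for one table and one key (hypothetical helper)."""
    if table_suffix not in ALLOWED_TABLES:
        raise ValueError(f"unknown table suffix: {table_suffix!r}")
    col = quote_ident(column)
    return (
        f"SELECT count(*) AS count, {col}, array_agg(osm_id) "
        f"FROM planet_osm_{table_suffix} "
        f"WHERE {col} IS NOT NULL "
        f"GROUP BY {col} "
        f"ORDER BY count ASC"
    )


if __name__ == "__main__":
    print(build_query("line", "highway"))
    print(build_query("point", "addr:housenumber"))
```

The resulting string could be passed to cursor.execute() in place of the f-string. psycopg2 also ships a `psycopg2.sql` module (`sql.SQL` and `sql.Identifier`) that handles identifier quoting, which is probably the more robust choice in a real script.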

Comment from chris_debian on 7 April 2026 at 16:24

Nice work.
