
Fixing Chinese Place Name Display Issues in OpenStreetMap with a C++ Batch Script

Posted by lpf452 on 14 September 2025 in English. Last updated on 1 October 2025.

As an OpenStreetMap contributor, I’ve always been dedicated to enhancing the detail and usability of local data. Recently, however, I ran into a frustrating problem: in areas I’ve mapped, the Chinese names for many places fail to display correctly in certain applications and services (like OsmAPP, JawgMaps, and MapTiler). Instead, they either fall back to Pinyin or default to the English name (name:en), which looks odd—especially when a primary name tag clearly exists but is simply ignored.

The root of the problem lies in how these applications choose a label: they often prioritize the name:[lang] tag that matches the user’s language. Even when an element has a name tag, the absence of an explicit name:zh or name:zh-Hans tag means the renderer finds no match for Chinese, so it falls back to name:en or displays a Pinyin transliteration.
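
To make that fallback concrete, here is a minimal sketch of the kind of lookup such a renderer might perform. It is not the actual code of OsmAPP, JawgMaps, or MapTiler, whose rules I have not inspected; the function name pick_label, the Tags alias, and the language preference list are all hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory view of one element's tags (key -> value).
using Tags = std::map<std::string, std::string>;

// Hypothetical label picker: try name:<lang> for each preferred language,
// in order, and only fall back to the plain name tag if none of them exist.
std::string pick_label(const Tags& tags,
                       const std::vector<std::string>& preferred_langs) {
    for (const std::string& lang : preferred_langs) {
        auto it = tags.find("name:" + lang);
        if (it != tags.end()) {
            return it->second;
        }
    }
    auto it = tags.find("name");
    return it != tags.end() ? it->second : std::string();
}
```

Under these assumptions, an element tagged name=大竹县 and name:en=Dazhu County with the preference list {"zh", "en"} is labelled "Dazhu County", because name:zh is missing and name:en is the first match. Once name:zh is added, the same call returns the Chinese name.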

Manually adding these tags to thousands of elements is obviously out of the question. You can’t just copy and paste your way through it; the sheer monotony would be mind-numbing. So, I decided to automate the process by writing a script.

Tech Stack and Script Logic

When high performance is a priority, C++ is the natural choice. I also leveraged two powerful open-source libraries:

  1. pugixml: A lightweight, high-performance C++ XML parser, perfect for rapidly reading and writing large .osm files.
  2. OpenCC: The community’s go-to library for Simplified and Traditional Chinese conversion, which I used to generate name:zh-Hant tags.

The core logic of my script is as follows:

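  1. Load and Parse: Use pugixml to load the local .osm data file exported from the Overpass API. The query must output metadata (for example with out meta;) so that every element carries the version attribute an osmChange upload needs.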
  2. Iterate Through Elements: Loop through every node, way, and relation in the file.
  3. Identify Targets: Check if an element has a tag with k="name".
  4. Generate New Tags: If a name tag is found, perform the following actions:
    • Create a new <tag k="name:zh" .../> by copying the value from the name tag.
    • Do the same to create a <tag k="name:zh-Hans" .../>.
    • Use the OpenCC library (with the s2twp.json configuration) to convert the name value from Simplified Chinese to the Traditional Chinese variant used in Taiwan, creating a <tag k="name:zh-Hant" .../>.
  5. Create a Change File: Write all modified elements—preserving their original version numbers—to a new .osc (osmChange) file, ready for upload.

I’ll drop in an AI-generated code snippet here (I was too lazy to write it out myself) for your reference:

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
#include <memory>
#include <vector>
#include "pugixml.hpp"
#include <opencc/SimpleConverter.hpp>
#include <opencc/Exception.hpp>

int main(int argc, char* argv[]) {
    
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " <input.osm> <output.osc>" << std::endl;
        return 1;
    }

    const char* input_file = argv[1];
    const char* output_file = argv[2];

    
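    // Load OpenCC's Simplified -> Traditional (Taiwan, with phrase conversion)
    // profile. The constructor throws if s2twp.json cannot be loaded.
    opencc::SimpleConverter converter("s2twp.json");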
    
    
    pugi::xml_document doc_in;
    pugi::xml_parse_result result = doc_in.load_file(input_file);

    if (!result) {
        std::cerr << "Error parsing input file: " << result.description() << " at offset " << result.offset << std::endl;
        return 1;
    }

    std::cout << "Successfully parsed input file: " << input_file << std::endl;

    pugi::xml_node osm_node = doc_in.child("osm");
    if (!osm_node) {
        std::cerr << "Error: <osm> root tag not found." << std::endl;
        return 1;
    }

    
    std::vector<pugi::xml_node> modified_elements;

    
    for (pugi::xml_node element : osm_node.children()) {
        if (strcmp(element.name(), "node") != 0 &&
            strcmp(element.name(), "way") != 0 &&
            strcmp(element.name(), "relation") != 0) {
            continue; 
        }

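        bool has_name_tag = false;
        bool has_zh = false, has_zh_hans = false, has_zh_hant = false;
        std::string name_value;

        // Scan every tag once: remember the primary name and note which
        // name:zh* variants already exist, so they are never duplicated.
        for (const auto& tag : element.children("tag")) {
            const char* key = tag.attribute("k").as_string();
            if (strcmp(key, "name") == 0) {
                has_name_tag = true;
                name_value = tag.attribute("v").as_string();
            } else if (strcmp(key, "name:zh") == 0) {
                has_zh = true;
            } else if (strcmp(key, "name:zh-Hans") == 0) {
                has_zh_hans = true;
            } else if (strcmp(key, "name:zh-Hant") == 0) {
                has_zh_hant = true;
            }
        }

        if (!has_name_tag) {
            continue;
        }

        bool changed = false;

        // Copy the primary name into name:zh and name:zh-Hans when missing.
        if (!has_zh) {
            pugi::xml_node tag_zh = element.append_child("tag");
            tag_zh.append_attribute("k") = "name:zh";
            tag_zh.append_attribute("v") = name_value.c_str();
            changed = true;
        }

        if (!has_zh_hans) {
            pugi::xml_node tag_zh_hans = element.append_child("tag");
            tag_zh_hans.append_attribute("k") = "name:zh-Hans";
            tag_zh_hans.append_attribute("v") = name_value.c_str();
            changed = true;
        }

        // Convert to Traditional Chinese for name:zh-Hant when missing.
        // Catch std::exception: SimpleConverter wraps OpenCC errors in
        // std::runtime_error.
        if (!has_zh_hant) {
            try {
                std::string name_hant = converter.Convert(name_value);
                pugi::xml_node tag_zh_hant = element.append_child("tag");
                tag_zh_hant.append_attribute("k") = "name:zh-Hant";
                tag_zh_hant.append_attribute("v") = name_hant.c_str();
                changed = true;
            } catch (const std::exception& e) {
                std::cerr << "Warning: OpenCC conversion failed for value '" << name_value << "'. Error: " << e.what() << std::endl;
            }
        }

        // Only elements that actually gained a tag belong in the change file.
        if (changed) {
            modified_elements.push_back(element);
        }
    }

    std::cout << "Added missing name:zh* tags to " << modified_elements.size() << " elements." << std::endl;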

    
    if (!modified_elements.empty()) {
        pugi::xml_document doc_out;
        auto declarationNode = doc_out.append_child(pugi::node_declaration);
        declarationNode.append_attribute("version") = "1.0";
        declarationNode.append_attribute("encoding") = "UTF-8";

        pugi::xml_node osm_change_node = doc_out.append_child("osmChange");
        osm_change_node.append_attribute("version") = "0.6";
        osm_change_node.append_attribute("generator") = "osm_name_tool_cpp";

        pugi::xml_node modify_node = osm_change_node.append_child("modify");

        for (const auto& el : modified_elements) {
            modify_node.append_copy(el);
        }

        if (doc_out.save_file(output_file, "  ")) {
            std::cout << "Successfully generated osmChange file: " << output_file << std::endl;
        } else {
            std::cerr << "Error writing to output file: " << output_file << std::endl;
            return 1;
        }
    } else {
        std::cout << "No elements needed new tags. Output file was not created." << std::endl;
    }

    return 0;
}

Compiling

(on Linux):

g++ -std=c++17 -O2 -o osm_name_tool {replace_filename}.cpp -lopencc -lpugixml

Usage

(on Linux):

$ chmod +x ./osm_name_tool
$ ./osm_name_tool <input_osm_filename> <output_osc_filename>

Results

The script ran efficiently, finishing its job in under a second. For the Dazhu County area (the .osm file from Overpass was over 30MB), it successfully added the missing Chinese language tags to 1,790 elements (comprising 599 nodes, 1,110 ways, and 81 relations).

I’ve uploaded the resulting .osc file using Vespucci. You can find the changes in this changeset.

Screenshot_2025_0914_135956.png

Comparison

Here is a before-and-after comparison of the script’s edits for your reference (shown in OsmAPP, a general-purpose OpenStreetMap application):

Before: Screenshot_2025_1001_172636.png
After: Screenshot_2025_1001_172604.png

Potential Issues and Risks

  • Non-Chinese Names: The script is still quite basic. It doesn’t check if the value in the name tag is actually in Chinese. For example, it would incorrectly process a tag like name="KFC". While this is rare in the areas I map, I can’t rule out the possibility of such errors.
  • Conversion Errors: While OpenCC is a powerful tool, it’s not infallible. There’s a chance it could incorrectly convert certain specific place names.
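
One possible guard against the first risk is to skip any name that contains no Han characters at all. The sketch below is not part of the script I ran. The helper name contains_han is hypothetical, and it only checks the two most common ideograph blocks (CJK Unified Ideographs and Extension A), which I assume is enough for typical place names. It also assumes the input is UTF-8, as .osm files are.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if the UTF-8 string contains at least one
// code point in U+4E00..U+9FFF (CJK Unified Ideographs) or U+3400..U+4DBF
// (Extension A). Invalid lead bytes are skipped rather than reported.
bool contains_han(const std::string& utf8) {
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 1;
        if (c < 0x80) {
            cp = c;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F;
            len = 2;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F;
            len = 3;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07;
            len = 4;
        } else {
            ++i;  // stray continuation byte or invalid lead byte
            continue;
        }
        if (i + len > utf8.size()) {
            break;  // truncated sequence at the end of the string
        }
        // Fold the continuation bytes into the code point.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        if ((cp >= 0x4E00 && cp <= 0x9FFF) || (cp >= 0x3400 && cp <= 0x4DBF)) {
            return true;
        }
        i += len;
    }
    return false;
}
```

Calling this on name_value before adding any tags would leave name="KFC" untouched while still processing a name such as 大竹县. A mixed name like "KFC 肯德基" would still pass the check, so it reduces the risk rather than removing it.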

Request for Review and Feedback

My goal with this edit was to enhance the value of OSM data across a wider range of applications and improve the map experience for end-users.

I sincerely welcome and encourage fellow mappers in the community to review this changeset. If you spot any incorrect modifications, data errors, or have suggestions for a better approach, please don’t hesitate to leave a comment on the changeset or send me a message on OSM.

I am committed to actively addressing any issues raised, whether that means making corrections or performing a full rollback with osm-revert. Thank you.


Discussion

Comment from SimonPoole on 14 September 2025 at 17:51

You should follow osm.wiki/Automated_Edits_code_of_conduct if you intend to run a script on any larger number of objects.

Comment from lpf452 on 15 September 2025 at 16:11

@SimonPoole This is my first time using a script for this kind of name refinement. I see on the OSM Wiki page you referenced that this falls under “other scripted changes made to the database,” but I’m not sure how to proceed. Could you give me some advice?
