OpenStreetMap

Minh Nguyen's Diary Comments

Diary Comments added by Minh Nguyen

Post When Comment
'Tower of Hanoi' technique for mapping buildings

The second screenshot shows the result of moving the building areas to align them with the base of each building rather than its roofline. For tall buildings, the subject of this diary post, the areas will appear misaligned unless you pay close attention to the shadows. This is relevant when using an imagery layer that isn’t perfectly orthorectified – which is essentially every available layer.

Sometimes if I know that most of the buildings are approximately the same height, I’ll temporarily offset the imagery layer so I don’t have to move the buildings after drawing them. But in this example, the buildings appear to have varying heights and even uneven roofs. 😣

Dismistifying Wikidata and standards compliant semantic approach on tags on OpenStreetMap to make tooling smarter on medium to long term

I don’t want to distract from the main topic, but since you seemed to be troubled by Mapbox’s flag in this sidenote:

While I do have prior advanced experience in other areas, as you can see from my account, I’m so new to the project that as a newbie user of iD left after the tutorial in India I got scared that if someone touches something, after that validators will assume that person is responsible for errors in that something. In my case it was “Mapbox: Fictional mapping” from OSMCha.

So assume that this text is written by someone who one day ignored iD warnings for something I touched, still not sure how to fix the changeset 127073124 😐

Sorry you found this intimidating. OSMCha has an API that allows individual features in a changeset to be flagged as “suspicious” for a particular reason. Not every flagged feature is rejected; OSMCha itself sometimes applies suspicious reasons like “New mapper” and “Possible import” that only serve as a heads-up to someone doing a manual review at Mapbox or elsewhere.

Mapbox’s data team flagged this way as appearing to be fictional, as it looks like someone doodling a road through a populated place without any resemblance to aerial imagery. (You can find the specific feature by clicking the ⚠️ tab.) Perhaps you had meant to draw something else but accidentally tagged it as a road? You’re welcome to use these flags to detect and fix errors too. In any event, Mapbox accepted the rest of your changeset; for example, you can already see this road in Mapbox maps. If you don’t have a Mapbox account, you can check using this example page or a map by one of Mapbox’s customers.

OSMCha doesn’t track how many flagged features you’ve accrued, so even a false positive shouldn’t be an ongoing problem for you. OSMCha does track how many changesets its users rate as good or bad. Review teams at Mapbox or elsewhere could theoretically consider this statistic when judging whether to scrutinize a changeset more closely.

Hope this addresses your concern. (For full disclosure, I work at Mapbox but not on the teams involved with this software or process.)

Dismistifying Wikidata and standards compliant semantic approach on tags on OpenStreetMap to make tooling smarter on medium to long term

Thanks for taking the time to explain more about the purpose and reasoning behind Wikidata and Wikibase. It’s entirely possible that some of the misunderstanding and reticence that persists today can be traced to some early missed opportunities to explain these unfamiliar concepts patiently and effectively. At least that was my takeaway when compiling a bibliography of OSM discussions about Wikidata.

As many of us discovered at this weekend’s joint conference, “WikiConference North America + Mapping USA”, an increasing number of people in both the OSM and Wikimedia communities are interested in exploring what our projects can accomplish together. Personally, I think the best way to overcome the skepticism about Wikidata is to demonstrate the value of the currently limited integration in the form of creative visualizations, analyses, and tools.

Adding buildings with RapiD - what to do with existing address nodes?

In general, having the address on the house is a great thing IMO (good to review, map, query, audit). However, that only works for rural areas (AKA not cities), and mostly for residential buildings (AKA “one family houses”, not shopping centers, commercial buildings, apartments).

I think a more general way of putting it is that sometimes an address applies to the entire building (every room on every floor) or even the entire property encompassing the building.1 In this case, tagging the building itself with address tags explicitly tells other mappers that the building is complete in OSM and there are no more addresses to add. It also makes it easier for data consumers to determine the extent of an address (similar to how a boundary provides information that a lone place node does not).

The prevalence of this case depends on the local addressing practices. In parts of the U.S. that I’m familiar with, it’s quite common for a retail or commercial building in an urban or suburban area to have just one street address (often but not always with individually numbered units within). Some postal codes in New York City consist of just one tall building with one street address. On the other hand, if a single building occupies an entire city block, the whole building may have as many as four overlapping street addresses, one for each bounding street. The addresses may be used interchangeably, or each address may be for a specific purpose (e.g., mail versus wayfinding versus taxation).

One way to account for both of these cases while preserving the benefits above is to draw an address area coincident with the building. This already happens sometimes due to 3D and indoor mapping. However, some data consumers (such as openstreetmap-carto) are currently unable to handle address areas unless tagged with another primary feature tag. So tagging the building itself would be a more compatible approach.

Either an addressed building or a coincident area would make it easier for geocoders to associate entrances with an address for routing purposes. By comparison, if a geocoder encounters an address node floating within a building, it can’t be sure that all the building’s entrances can reach the unit with that address, versus another address that may or may not have been mapped yet.

  1. Authorities in the U.S. typically assign an address to an area, such as a plot or building, but a delivery point can refer to the mailbox or entrance in particular. 
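
To illustrate the geocoder reasoning above, here is a minimal sketch, assuming features are plain dictionaries of tags. The function name `candidate_addresses` and the sample values are hypothetical and don’t come from any real geocoder.

```python
def candidate_addresses(building_tags, interior_address_nodes):
    """Return (addresses, certain) for an entrance of a building.

    If the building itself carries addr:* tags, every entrance can be
    associated with that address. Otherwise the only hints are address
    nodes floating inside the building, and the entrance may or may not
    reach the units they describe, so the result is marked uncertain.
    """
    own = {k: v for k, v in building_tags.items() if k.startswith("addr:")}
    if own:
        return [own], True
    return list(interior_address_nodes), False


# Hypothetical sample data for illustration only.
addressed_building = {
    "building": "retail",
    "addr:housenumber": "100",
    "addr:street": "Example Street",
}
plain_building = {"building": "retail"}
floating_nodes = [{"addr:housenumber": "100", "addr:unit": "B"}]

print(candidate_addresses(addressed_building, []))
print(candidate_addresses(plain_building, floating_nodes))
```

A coincident address area could be handled the same way as the addressed building, provided the data consumer recognizes it.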

New York minor civil subdivisions - status and progress

population=* on a place node is asserting “there is an enumeration region of this name containing this point which has the given population.” […] It’s not ideal, it’s a starting point.

Last year, there was a proposal to use the Census Bureau’s urbanized areas as the basis for population tags on place nodes. Urbanized areas ignore jurisdictional boundaries in favor of population distribution, which theoretically would line up better with what the place nodes represent. But the messy reality is that populated places are also a function of commerce and industry, which the urbanized area definitions don’t consider, and sometimes downright arbitrariness.

Ultimately, there’s no purely data-driven method for correctly sizing every place label on a map without some degree of human judgment. As you say, the population tags are just a starting point. It may be good enough for the “long tail” of places that a data consumer wouldn’t know how to classify manually.

It’s an easy enough operation to push operator and operator:wikidata down the admin_centre link. One use case I had in mind for going the other way is, “I just moved to town; where do I go for voter registration, dog licensing, property tax information, etc?” In all cases in New York, the town or city clerk’s office is a starting point. The clerk (an elected position) is the official custodian of records (and the boss of pretty much all the local bureaucrats). I don’t know how easy it would be to get that from the OSM data if we were to turn the relationship upside down the way you suggest.

To me, this use case doesn’t sound fundamentally different than searching for your state legislator’s constituent service office, police precinct, school board office, or power utility office. In general, we aren’t mapping service areas as boundaries, but some government offices happen to have service areas that conform to an administrative boundary. Even so, it’s up to the user to do their homework about which local office can help them.

In some states, things get too complicated to express in tags. For example, San José’s water utility – a bona fide part of city government – serves only 12% of the city, not including where I live. For most purposes, the county sheriff’s office serves unincorporated areas but not cities and towns. In a neighboring county, the county’s public health department doesn’t serve one city that has their own public health department. There’s a contract city nearby that contracts with other governments to provide basic services and generally doesn’t provide services “in house”.

As long as there’s a distinct item for the government as opposed to the place, then both the boundary and office could be tagged with the same operator:wikidata, making that a little easier. But I don’t think there’s very much a data consumer should infer based on that relationship.

New York minor civil subdivisions - status and progress

The thought you’re putting into this boundary mapping and cleanup effort is setting a great example for us to follow in other states that have their own vagaries.

This was a somewhat arbitrary cutoff. I wanted it to include Saranac Lake (pop. 4887) because that community has the only hospital for many miles around, and has an airport with scheduled, albeit infrequent, service. The threshold could be set higher if the manual work of identifying the sites of such facilities as hospitals, universities, airports, major markets, and so on were to be attempted, but I’d consider that to be Out of Scope.

You’ve just justified a one-off exception for Saranac Lake, which would allow you to set a rounder overall threshold that doesn’t sound so arbitrary. Some mappers may be inclined to second-guess or ignore arbitrary-sounding rules.

Somewhat controversially, I’ve left boundaries of most CDP’s as boundary=administrative. I know for certain that the ones in Nassau County, at the very least, actually are administrative subdivisions without home rule - the towns of Hempstead, North Hempstead, and Oyster Bay all designate hamlets, and often promulgate things like parking regulations and zoning ordinances by calling out the hamlets by name rather than repeating the boundaries in each piece of legislation. I figured that in doubtful cases, it’s better to show the boundaries than to hide them.

It sounds like these particular imported CDPs are coincidentally coincident to real places that should have been mapped as administrative areas but, like minor civil divisions, were omitted from the TIGER boundary import. You may want to add border_type=* so that someone doesn’t come along, see “CDP” inside tiger:NAMELSAD, and think it only represents a CDP and therefore should be retagged as boundary=census.

This is a total abuse of the tag - it’s supposed to identify the capitAl, not the capitOl. Nevertheless, it provides useful information, and I believe that instead of deleting the relation members wholesale, it would probably be better to rename the role.

I’ve been using operator and operator:wikidata to associate a government’s headquarters with the boundary relation representing the government’s jurisdictional area. The operator:wikidata tag of the amenity=townhall or office=government would match the wikidata tag of the boundary.1 I find this approach to be more flexible in cases where a government’s offices aren’t centralized in a single building, typical of county governments in some states. It’s also consistent with tagging for company headquarters, park offices, university administrative offices, etc.

If the office must be a member of the boundary relation, then a seat role would be an improvement. But this comes uncomfortably close to site relation semantics, for something that isn’t as compact as a site.

  1. More precisely, there would be separate Wikidata items for the place versus its government, linked by the authority and applies to jurisdiction properties. Data consumers would need to consult the Wikidata API or a database extract or query the Wikidata Query Service to determine the relationship between the office and the boundary. But so far I’ve yet to come across a compelling articulation of why a data consumer would need to automatically associate these things anyways. 
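
As a rough illustration of the simple case described in this comment, where an office’s operator:wikidata equals the boundary’s wikidata tag, a data consumer could match the two as sketched below. The function name and the QID-like strings are placeholders invented for this example. The footnote’s more precise case, with separate Wikidata items for the place and its government, would need a Wikidata lookup that this sketch doesn’t attempt.

```python
def offices_for_boundary(boundary_tags, features):
    """Return the features whose operator:wikidata equals the boundary's wikidata tag."""
    qid = boundary_tags.get("wikidata")
    if not qid:
        return []
    return [f for f in features if f.get("operator:wikidata") == qid]


# Placeholder identifiers, not real Wikidata items.
boundary = {"boundary": "administrative", "name": "Exampleville", "wikidata": "Q-EXAMPLE-1"}
features = [
    {"amenity": "townhall", "operator:wikidata": "Q-EXAMPLE-1"},
    {"office": "government", "operator:wikidata": "Q-EXAMPLE-1"},
    {"office": "government", "operator:wikidata": "Q-EXAMPLE-2"},
]

for office in offices_for_boundary(boundary, features):
    print(office)
```

Because the match is by tag value rather than relation membership, it works the same whether a government has one office or several.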

Adding buildings with RapiD - what to do with existing address nodes?

Cool challenge! In both iD and RapiD, you can select the building area and node and use the Combine operation. (In RapiD, you have to accept the candidate building area first.) As long as the area is being added in your current changeset, the node becomes the area’s northwesternmost vertex, but the tags get transferred from the node to the area. This preserves the node’s history while keeping you from having to manually transfer any tags.

If you ever need to transfer tags without combining features, there’s a button above the table of raw tags that switches to a key=value textbox, similar to the Level0 syntax. You can copy-paste tags freely between features using this syntax. Alternatively, you can use the Copy operation to duplicate the original feature, then Combine it with the target feature.
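
If you’d rather script the copy-paste step, the key=value text is easy to handle outside the editor. This is a minimal sketch that assumes one tag per line, split at the first equals sign. It isn’t the editor’s own parser, and the function names are made up for this example.

```python
def parse_tag_text(text):
    """Parse lines of key=value text into a dictionary of tags."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)  # values may themselves contain "="
        tags[key.strip()] = value.strip()
    return tags


def format_tag_text(tags):
    """Turn a dictionary of tags back into key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in tags.items())


node_text = "addr:housenumber=123\naddr:street=Example Street"
area_tags = {"building": "house"}
area_tags.update(parse_tag_text(node_text))
print(format_tag_text(area_tags))
```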

Highway shields, state by state

If you’re looking for inspiration, here are some existing SVG route shield sets that you could adapt in Inkscape or Adobe Illustrator

By way of an update, the OpenStreetMap Americana project has developed a much better collection of SVG images you can use as shield backgrounds. These images are in the public domain, so you can use them freely. Another resource is Rebusurance, which is designed for user interfaces rather than maps but may be suitable if you need something at a larger size.

What does "privacy" mean for OpenStreetMap?

The Streisand effect is essentially what the original post is about. But I was referring to a question posed in OSMUS Slack about the longtime residence of a politician. The residence’s location had been well-known to residents of the city for many years, but now the politician is important enough that there are security considerations. The question was whether to map the house as anything special or even have it on the map at all. The on-the-ground rule rules out special tagging for the house, and mapping all the houses in the neighborhood skirts the question of whether mapping this particular house will cause any problems.

How I classify urban roads

Thank you for thoroughly documenting your thought process for classifying roads in some of the largest metropolitan areas in the country. As a community, we need more writeups along these lines in order to help us come to a shared understanding about how to classify, not just what classifications have been applied.

Traffic control devices are a crucial tool for understanding the intended accessibility and mobility of a given road. I’ve especially found two-way stops to be a bright line between secondary and tertiary, whereas other criteria tend to get bogged down in exceptions.

If two ways, often gets enough traffic to warrant a centerline dividing traffic directions, resulting in one marked lane per direction.

I agree with distinguishing tertiary from residential on the basis of a centerline stripe in urban areas of California. However, this criterion is specific to urban road design standards in California. In other metropolitan areas, inner-city and inner suburban residential roads may well have centerline stripes.

Traffic counts: In general, I think traffic counts are useful: more important roads probably have more traffic, because they serve more important destinations. In practice, I think it’s hard to assign absolute traffic counts to specific classifications, as the context varies so much from city to city and even region to region within a city. I think the best use of them is for additional information when deciding between otherwise similar, parallel routes: if one has significantly more traffic than others, then it may be better to classify it as the higher class, and others as the lower class. At least in my area, this data is difficult to access on a broad scale, and since not all the data is taken using the same method at the same time, it can be hard to interpret too.

While I agree that traffic counts shouldn’t influence road classifications as a matter of first impression, I have found them useful for objectively breaking ties and for mitigating classifications that don’t pass the sniff test, subjectively speaking.

Caltrans collects traffic volume data statewide across California and publishes it as Excel spreadsheets and PDFs and as a FeatureServer. Despite ostensibly being licensed under the Creative Commons Attribution license, the dataset is actually in the public domain as a work of a California government agency, and the traffic counts themselves aren’t being copied into OSM anyways.

Other tags, such as bus route relations, cycleway tagging, and sidewalk mapping, sufficiently depict the importance of a road in these other modes, and they therefore need not be considered when classifying roads in urban areas.

The needs of various modes of transportation are often at odds with each other. Genuinely accounting for public transportation, cycling, and pedestrian network connectivity would effectively average out and flatten the classification system. For example, the arterial roads across Tucson aren’t equivalent to the city’s bicycle boulevards by any means.

Non-car modes of transportation can also benefit from tagging that suggests some kind of road hierarchy, but it’s better to keep these concerns separate and explicit rather than baking some kind of compromise into the primary feature tag.

Towards unified tagging of schools

I do understand the concerns with ISCED-tagging, but they seem to be mostly US-centric issues and ISCED seems to be the most workable right now. I’ve written my thoughts on the wiki page.

How do we know that ISCED is a fitting school classification system in most countries besides the U.S.? Its authors are pretty clear that it isn’t designed for this purpose. The official mappings only go in one direction, from a national classification to ISCED, so mappers have come up with unofficial mappings in the other direction. But even the official mappings use national terminology very loosely, because the goal isn’t really to preserve local distinctions.

If ISCED happens to line up well to national classification systems for the countries of interest to you, have you considered replacing the isced:2011:level key with school:XY using similar non-numeric values? That would pair more naturally with the other keys you’ve mentioned, like school:for and school:language. I don’t think this scheme is undermined by the presence of less rigorous school values.

About language: ‘language:xy’ on a language_school does become pretty confusing. If someone speaks “ab” and wants to learn “xy”, you’ll probably want to go to a school where the teachers and the administration speak “ab” and give the grammar rules in “ab”. To be consistent, we’d need to move the subject of teaching to some other key

That’s a fair point: as far as I can tell, language on a school is understood to be the medium of instruction, but language on a language school is understood to be the language taught. A couple years ago, there was a proposal to distinguish language purposes. But a simpler solution would be to use values other than yes, like language:en=spoken plus language:fr=taught for a French-language school serving English speakers. This approach isn’t possible with school:language, which is a single key taking language codes as values.
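
To show how a data consumer might read such values, here is a minimal sketch. The values spoken and taught are only the suggestion made in this comment, not established tagging, and the function name is hypothetical.

```python
def languages_by_role(tags):
    """Group language:* subkeys by their value.

    A plain "yes" carries no role, so it is grouped under "unspecified".
    Keys set to "no" are ignored.
    """
    roles = {}
    for key, value in tags.items():
        if not key.startswith("language:") or value == "no":
            continue
        code = key[len("language:"):]
        role = "unspecified" if value == "yes" else value
        roles.setdefault(role, []).append(code)
    return roles


french_school = {
    "amenity": "language_school",
    "language:en": "spoken",
    "language:fr": "taught",
}
print(languages_by_role(french_school))
```

A feature tagged only with language:xy=yes would still be readable, but the consumer couldn’t tell the medium of instruction from the subject.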

Towards unified tagging of schools

That one is mainly about the ISCED:2011-proposal and the main concern were the numerical codes instead of human-readable values, which has been fixed in the updated proposal which is more or less followed in the above text.

Some of the other criticisms of the previous ISCED proposal could be applied just as well to the current one, if not more so. To be clear, these criticisms arise because of the use of ISCED levels in classifying school facilities, but there are possibly other niche uses of the scheme.

Schools might be operated in different languages - especially important in areas where multiple languages are spoken. As it turns out, school:language is already in use for this, but wasn’t documented in the wiki. Now it is!

The medium of instruction is more commonly tagged as language:xy=yes. This key is most common among language schools but can be applied to any school and, indeed, any point of interest.

textual/ortographic fixes to names

It’s officially San José in English, based on the Spanish name, so the debate is about which English name is the main one. As the wiki page suggests, it’s pretty complicated, but currently the unaccented name is the name in OSM. Fortunately, the names you’re looking at would be uncontroversial, so your tips will probably come in handy for me. Thanks again!

textual/ortographic fixes to names

(There’s also a related discussion about typographical errors in the names of churches in Spanish.)

Thank you for thoroughly documenting your process here.

I’ve also encountered a lot of similar spelling mistakes in Spanish-speaking neighborhoods of San José, California. The signs of taquerías, panaderías, and carnicerías are usually posted in ALL CAPS, so the diacritics are omitted for convenience.1 Non–Spanish speakers either don’t know that there should be diacritics or don’t know which ones to use. Sometimes people even remove the diacritics, thinking that’s more faithful to the on-the-ground principle.2

The same problem affects the city’s Vietnamese-speaking neighborhoods so much that I added a short tip to the wiki about how to tag name:vi. Maybe there needs to be a page about name:es to raise awareness of this issue and, eventually, facilitate the development of QA validation rules to catch these problems earlier.

  1. Even though language authorities like the RAE and academic institutions like the Library of Congress have ended this practice, it persists in signmaking. 

  2. Incidentally, the name of this city has been the subject of a slow-moving edit war for years, reflecting a real-world dispute about whether it should include the acute mark. 

Thoughts on the shared bus stop dilemma

Thanks for this detailed description of the problem. My metropolitan area has some 40 different public transportation agencies that all overlap in exciting ways. For example, this train station is shared by two regional commuter railroads and an Amtrak-branded service run by a consortium of local governments. They share the same platforms, more or less, but all have different names, codes, and websites for the same station. One station is especially confusing because one of the railroads calls it by one name, but the railroad is part of the Amtrak network that calls it by a different name.

For the most part, we’ve been handling this situation using ad-hoc subkeys like name:Caltrain and railway:ref:ACE. However, this is unsatisfying because such subkeys stand no chance of ever being consumed by data consumers, especially if there’s any difference in how the network is spelled as part of a network value versus as a subkey.

I don’t think we should duplicate nodes to handle these situations. It’s still one bus stop, just with multiple signs and multiple services. The one feature principle comes to mind: duplicating the bus stops would throw off any statistics about the distribution of bus stops, and duplicating stop positions would require fudging some positions at heavily shared stops. (I would favor mapping multiple coincident traffic sign nodes if you’re getting into that level of micromapping, but that’s because there are multiple physical signs.)

Instead, I think it would be elegant to add the single bus stop node and single stop position node to multiple public_transport=stop_area relations, each corresponding to a different network, which in turn can be part of a single public_transport=stop_area_group relation. The network tag on the bus stop itself can establish the order in which the information should be listed when labeling the stop.

Redundant stop areas don’t seem like a big problem to me, because stop areas are abstractions anyways. By analogy, the multiple bus routes that serve this stop can have different networks and route numbers, but there’s no ambiguity as to which network corresponds to which route number, because each route has its own relation. That said, it would be nice to hear the opinion of someone more familiar with public transportation renderers, routers, or QA tools.
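
The arrangement proposed above can be sketched roughly as follows, assuming features are plain dictionaries and that the stop’s network tag lists networks separated by semicolons. The agency names, refs, and the function `labels_for_stop` are invented for illustration. I don’t know of a renderer or router that works this way today.

```python
# One physical bus stop and one stop position, shared by two networks.
bus_stop = {"id": 1, "tags": {"highway": "bus_stop", "network": "Agency A;Agency B"}}
stop_position = {"id": 2, "tags": {"public_transport": "stop_position"}}

# One stop_area relation per network, each with that network's name and ref.
stop_areas = [
    {"tags": {"type": "public_transport", "public_transport": "stop_area",
              "network": "Agency B", "name": "Main & 1st", "ref": "B-42"},
     "members": [1, 2]},
    {"tags": {"type": "public_transport", "public_transport": "stop_area",
              "network": "Agency A", "name": "Main Street", "ref": "1001"},
     "members": [1, 2]},
]

# A single stop_area_group relation ties the per-network stop areas together.
stop_area_group = {
    "tags": {"type": "public_transport", "public_transport": "stop_area_group"},
    "members": stop_areas,
}


def labels_for_stop(stop, areas):
    """List (network, name, ref) in the order given by the stop's network tag."""
    order = [n.strip() for n in stop["tags"].get("network", "").split(";") if n.strip()]
    by_network = {
        area["tags"]["network"]: area
        for area in areas
        if stop["id"] in area["members"] and "network" in area["tags"]
    }
    return [
        (n, by_network[n]["tags"].get("name"), by_network[n]["tags"].get("ref"))
        for n in order
        if n in by_network
    ]


for label in labels_for_stop(bus_stop, stop_area_group["members"]):
    print(label)
```

Each name and ref stays unambiguous because it lives on the relation for its own network, the same way each route number lives on its own route relation.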

What does "privacy" mean for OpenStreetMap?

In the past, when mappers were unsure of whether a feature violates an expectation of privacy or not, a useful rule of thumb has been to consider whether the owner would perceive their property to have been singled out. This has even been a relevant consideration for the otherwise ordinary residences of very famous people.

If this episode had played out in a different order, with the woods and other nearby buildings and driveways being mapped alongside the one in question, en masse, perhaps the owner would not have felt threatened by the inclusion of their property. I myself have always ensured that my various places of residence were only ever mapped as part of a large addition of residences and other features. I can see others wanting at least the same level of obfuscation.

Unfortunately, in this case, things kept escalating. It’s impossible to say with certainty what would’ve headed off the back-and-forth. But sometimes just waiting for the “wrong” edit to persist for a little while can allow cooler heads to prevail with a more durable solution. They were persistent, but maybe they felt compelled to be extra persistent because of the involvement of multiple mappers, a siege mentality of sorts.

Thank you for documenting this case so we can learn from it as a community. Hopefully the time and effort you spent on it won’t be in vain.

Getting to know you

Voting has been suspended, but let’s not lose sight of the problems that this proposal identified and attempted to address. I hope everyone who voted will stick around to help us improve the state of the wiki’s translations.

The unfixable state of township boundaries

The fourth topmost level is the entire focus of this post. 😉 In the U.S., addresses generally follow the format “City, State”, so the municipal boundaries are arguably the most important. But addresses don’t respect boundaries at all anyways.

The unfixable state of township boundaries

The state already thought of that by providing for maritime township boundaries.

Evaluating school classification tagging schemes for the United States

I think what this means for us is that we can only recommend tagging isced:level=* in conjunction with school=* and grades=*, but never on its own, at least not for amenity=school. Any presets about school types would be based on school=*; the mapper would have to fill out isced:level=* manually.
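
A QA check along these lines might look like the following minimal sketch. The function name and warning text are hypothetical and not taken from any existing validator.

```python
def isced_level_warnings(tags):
    """Warn when amenity=school has isced:level=* without school=* or grades=*."""
    warnings = []
    if tags.get("amenity") == "school" and "isced:level" in tags:
        for key in ("school", "grades"):
            if key not in tags:
                warnings.append(f"isced:level=* is present without {key}=*")
    return warnings


complete = {"amenity": "school", "school": "elementary", "grades": "K-5", "isced:level": "1"}
bare = {"amenity": "school", "isced:level": "1"}

print(isced_level_warnings(complete))
print(isced_level_warnings(bare))
```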