OpenStreetMap

Minh Nguyen's diary


Position statement for spring 2019 OpenStreetMap U.S. board election

Posted by Minh Nguyen on 1 April 2019 in English

I’m running in the upcoming election to fill an open seat on the OpenStreetMap U.S. board. I’ve been contributing to the OpenStreetMap project as a volunteer since April 2008, mostly by armchair mapping, shoe-leather mapping, and fiddling with the wiki, where I recently became an administrator. I’ve also done a bit of importing and as much evangelization as the folks around me will bear. (My day job, writing iOS software at Mapbox, also intersects with OSM to some extent.)

OSM has the potential to be profoundly more than a tech project. When we see someone lavishly micromapping their neighborhood, we can rejoice that we’ve given them and their community a voice, that OSM has broken new ground, that we’re chipping away at an ancient, one-size-fits-all, top-down approach to mapping the world. We’re incredibly lucky that many people have discovered our project and figured out how to contribute. Just imagine how many more people will discover us once we figure out how to speak to their interests.

As a board member for 2019, I’d like to work with other board members and the newly appointed executive director to:

  • Stay in touch with OSMUS members. The recent town hall and virtual mappy hour felt like long-awaited reunions. I’m glad for that, but we should hold more of these meetings and get more people to show up beyond the mailing list and Slack regulars.
  • Develop resources to help community members plan and execute imports of public-sector data. Besides navigating the import process, mappers need help identifying datasets and approaching data owners with confidence. Lowering these barriers can help ensure that what we end up importing is the best data available.
  • Promote the use of OSMUS server resources for tools that benefit U.S. mappers and outreach efforts. What would a U.S.-centric renderer look like? What tools would be more useful in your day-to-day mapping if we had U.S.-specific versions of them?
  • Forge closer relationships with other free-culture and civic groups, including Wikimedia and Code for America. These communities already have the kind of distributed infrastructure that we imagine when we talk about OSMUS doing local outreach. Local wiki and civic hacking groups are already aware of OSM but may not be aware of the ways OSM can complement their message and power their projects.

This is the first time I’ve run for a leadership position in the OSM community. I’m jumping into this election knowing that I have a lot to learn, but hopefully also some unique perspective to share with OSMUS at this special time in the organization’s history.

We’re lucky that we have too many well-qualified candidates to choose from for this one open seat. 😄 I encourage you to read all of our position statements and cast your vote by Sunday, April 7. See you on the other side!

Location: Farmington Acres, Oakland County, Michigan, 48335, USA

Oldenburg

Posted by Minh Nguyen on 2 January 2019 in English

Oldenburg is a small town in southeastern Indiana, not far from where I grew up, where German is in some respects the official language. The town is proud of its German heritage. If you rely on a proprietary map for directions from Batesville to Metamora, you’ll probably miss a right turn at Oldenburg, which posts street names in German and only begrudgingly in English.

I don’t speak a lick of German, but when I found out that most of Oldenburg is included in a federally designated historic district, the town promised to be a good time sink. What better way to poke at a moat than to make a detailed 3D map of this Old World town, replete with German street names?

OSM versus Google Maps versus Apple Maps

To be included in the National Register of Historic Places, a building needs to be described in a publicly accessible nomination form. In the case of Oldenburg, that means nearly every freestanding structure in town, plus a patch or two of sidewalk, is given a full historical and architectural treatment. The document includes building names (typically after the original owner), floor counts, roof shapes, construction materials, and plenty of details we don’t have established tags for yet.

(Like many NRHP submissions, the one for Oldenburg is in the public domain because it was first published before 1989 without a copyright notice or registration.)

There’s still some work left to do: at least one 3D renderer, F4 Map, randomizes attributes like building heights and tree leaf type when you leave those details out, and there are still a few streets whose German names I couldn’t ascertain (while trying to avoid copying out of Google Street View). Hopefully this post will inspire a mapper in Cincinnati, Indianapolis, or Louisville to make a pilgrimage to this former monastery town for some proper field surveying.

There are thousands of historic districts on the NRHP all across the U.S., and not all of them require translation into English. What other historically significant places await your micromapping skills?

(Photo of Oldenburg skyline © 2016 Chris Flook, CC BY-SA)

Location: Oldenburg Historic District, Oldenburg, Franklin County, Indiana, USA

duration=P10Y1D

Posted by Minh Nguyen on 9 April 2018 in English

This OpenStreetMap account celebrated its tenth birthday yesterday. (Happy birthday, OSM Account!)

Over the last decade, contributing to OSM has:

  1. helped me cope with homesickness
  2. given me too many good reasons to procrastinate on schoolwork
  3. led me to old friends
  4. led me to new friends
  5. taken my inner roadgeek to a new level
  6. taught me more about geography and GIS than I’d ever learn from a book
  7. made me a worse driver (but a better notetaker)
  8. helped me land a job
  9. given my open source contributions a purpose beyond attribution
  10. kept me from forgetting where I come from

Thank you, OSM community, for welcoming my contributions in the first place and for making my stay here as rewarding and meaningful as it has been. Here’s to countless more changesets, mailing list posts, wiki edits, and pull requests!

A cow made of corn

Posted by Minh Nguyen on 20 September 2017 in English

It’s corn maze season in North America: for a couple months, farms all over are inviting folks to explore mazes they’ve cut out of corn fields.

CVNP
A corn maze in Northeast Ohio by David Fulmer, CC BY 2.0.

In OpenStreetMap, several corn mazes have been micromapped across Southwest Ohio. The designs change each fall, so the mazes have to be micromapped all over again.

Wendel Farms pumpkins
Look sideways: these pumpkins at Wendel Farms were previously visible in Bing and other aerial imagery, but the ways clearly need to be deleted now that the pumpkin design has been cut down (and replaced).

None of the aerial imagery providers have this fall’s maze designs yet – after all, these doodles are only weeks old in some cases. So I turned to the farms’ websites and Facebook pages, where farmers have posted aerial photos of their own mazes. I wrote to them, briefly describing OpenStreetMap and asking permission to update the map based on their photos. A few got back to me, happily giving permission. It’s free publicity for them, after all.

Wendel Farms cow
This year, Wendel Farms’s corn maze depicts a cow surrounded by various dairy products.

Since my preferred editor, iD, currently lacks support for overlaying arbitrary images, I built a copy locally and modified it to display the photo above the normal imagery and below the data. Using my browser’s Web inspector, I added a pattern to the SVG document’s <defs> element:

<pattern patternUnits="userSpaceOnUse" id="maze" width="956" height="1174">
    <image x="0" width="956" height="1174" href="dist/img/pattern/maze.jpg" y="200" />
</pattern>

and overrode the CSS style of the <path> element representing the edges of the field:

.area-fill > .w107158117 {
    fill: url("#maze") !important;
}

I probably could’ve done something more sophisticated to keep the image anchored while panning and zooming, but this was good enough for a quick, informal micromapping project.
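For the curious, here is one way the anchoring could work, sketched under assumptions: if you know the longitude and latitude of the photo’s top-left corner and the zoom level at which the photo is at native resolution, you can recompute the pattern’s SVG patternTransform attribute after every pan or zoom. The function names, the georeference, and the screen-offset convention are hypothetical and don’t correspond to iD’s internals; only the Web Mercator math is standard.

```python
import math

TILE_SIZE = 256  # pixels per tile edge in standard Web Mercator (slippy map) tiling


def lonlat_to_world_px(lon, lat, zoom):
    """Project WGS84 lon/lat to Web Mercator world pixel coordinates at `zoom`."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    sin_lat = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * scale
    return x, y


def pattern_transform(photo_nw, photo_zoom, map_zoom, map_offset):
    """Build a patternTransform value that pins the photo's top-left corner to the map.

    photo_nw   -- (lon, lat) of the photo's top-left corner (hypothetical georeference)
    photo_zoom -- zoom at which one photo pixel covers one screen pixel (assumed known)
    map_zoom   -- the editor's current zoom level
    map_offset -- (dx, dy) screen position of the world-pixel origin (hypothetical)
    """
    wx, wy = lonlat_to_world_px(photo_nw[0], photo_nw[1], map_zoom)
    k = 2 ** (map_zoom - photo_zoom)  # the photo doubles in size with each zoom level
    tx, ty = wx + map_offset[0], wy + map_offset[1]
    return f"translate({tx:.1f} {ty:.1f}) scale({k:.4f})"
```

In a real editor, the zoom and offset would come from the editor’s own projection object, and the returned string would be written to the pattern element on every map redraw.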

Wendel Farms cow

Not too long from now, these corn mazes will’ve been cut down and plowed under, and the paths will also be deleted from OpenStreetMap in anticipation of next year’s corn maze season. But a snapshot of this fall’s designs will eventually appear in the standard aerial imagery and remain there for years to come. As a result, mappers can compare a corn maze’s appearance and the date on which the maze was micromapped to determine the vintage of the surrounding area’s imagery.

In the meantime, if you poke around Southwest Ohio, be on the lookout for cows and other scenes hidden among the cornstalks.

Location: Newkirk, Reily Township, Butler County, Ohio, USA

Finding Wilson Boulevard

Posted by Minh Nguyen on 22 May 2017 in English

(This post is cross-posted from a recent post on my blog and adapted for an audience already familiar with OpenStreetMap.)

An overflowing bánh mì, a tray of tender bánh da lợn, a can of soybean milk: my treat after every monthly trip to the little Vietnamese grocery across town. Mekong Market was my Sunday Bible school of Vietnamese culture in a childhood as distant from Asia as one could imagine, in Cincinnati. Snacks, sauces, and canned foods defying translation lined the shelves; in the refrigerator, a variety of mystery meats wrapped in aluminum foil each bore the same place of origin: Chicago.

One Labor Day, my family made a trip up to Chicago to finally see the bustling Vietnamese community whose clearance we had happily bought for years. We made a lot of road trips back then, often just spur-of-the-moment driving through the peaceful countryside. But since we were headed five hours away to an unfamiliar city, we needed to plan ahead. As the resident map enthusiast, I was to find directions to the Vietnamese supermarket in Chicago using our new Internet connection. We’d enjoy some phở for lunch and bring back enough fresh ingredients to avoid Mekong Market for a little while.

A search for “Vietnamese markets in Chicago” on AltaVista turned up an article from The Washingtonian describing a cluster of supermarkets, phở restaurants, and bakeries on Wilson Boulevard. I pasted the street address into MapQuest, specified “Chicago” and “Illinois” to make sure I got the right “Wilson”, and printed out the directions.

Bánh mì thịt nguội
A Vietnamese cold cuts sandwich (bánh mì thịt nguội).

Five hours later, we arrived in Chicago and crawled up and down Wilson Avenue. If a Vietnamese supermarket or two were to be found along this street, it couldn’t have fit very easily inside any of the modest townhouses that lined the street from end to end without interruption. I noticed, too, that the entire length of the street was numbered in the 8000 range, as opposed to the 6700 block on which this supermarket supposedly stood. My father pulled the car aside and called the supermarket’s phone number on his cell phone. I could understand just enough Vietnamese to make out the voice on the other end: “I’m in Northern Virginia – what in the world do you want me to do for you?”

As my father held his tongue – Grandma was in the back seat – we wandered aimlessly around that part of town until we happened to spot some Vietnamese signage. There, just a few minutes away from Wilson Avenue, were the supermarket, phở restaurant, and bakery we had been hoping for, by sheer luck.


In the years since, I moved to San José, California, home to one of the largest populations of Vietnamese Americans in the country. Bánh mì shops here are as commonplace as cafés. In fact, the only reason I ever notice them is that I also became immersed in OSM. I found a niche mapping “flyover country” and made it my mission to improve coverage of communities underserved by commercial map vendors, among them ethnic enclaves in San José, Orange County, and elsewhere.

Last month, I happened to be in Washington, D.C., visiting my employer Mapbox at the new office there. On a lark, I decided to spend Sunday afternoon visiting Wilson Boulevard for real. It had been almost eighteen years since my last attempt, but despite having since moved to a city with a large Vietnamese population and plenty of Vietnamese food, I figured seeing this street in person would give me some closure. Fortunately, the same Metro line that took me almost to the airport also took me almost to Eden Center, the Vietnamese shopping center that had teased me back in grade school.

Parking aisles
No Vietnamese shopping center would be complete without a kitschy gate.

I had always imagined Eden Center to be more of a bazaar than a strip mall. Nonetheless, it has almost everything you’d expect from a center of Vietnamese social life: a dearth of parking, a man singing karaoke to an impromptu crowd out front, a father treating his daughters to the kumquats that hang from a decorative tree nearby. On the other hand, there are no elderly men playing cờ tướng in front of the shops, as one often finds in California. (One wall bears an enormous warning against gambling and suggests area casinos as alternatives.)

Like similar centers in Orange County, Eden Center is steeped in war history. Each aisle in the parking lot bears the name of a South Vietnamese general.

Parking aisles
At the intersection of Nguyễn Khoa Nam and Trần Văn Bá “Avenues”.

The South Vietnamese flag flies proudly beside the American flag. As it was the week before the anniversary of the Fall of Saigon, a banner spanning the two flagpoles honored South Vietnamese war heroes.

South Vietnamese heroes banner
The banner reads, “With Gratitude We Revere the Martyred National Heroes of the Republic of Vietnam”.

I thoroughly field-surveyed Eden Center, noting the restaurants, jewelers, beauty salons, travel agencies, and karaoke bars tucked away in the center’s “mini-malls”. Before leaving, I bought a bánh mì, a piping hot tray of bánh da lợn, and a can of soybean milk for the road.


The whole reason I got involved with “citizen mapping” is that proprietary map sources fall so short when it comes to places beyond San Francisco, beyond the central business districts, beyond the tourist traps.

Eden Center on Apple Maps
Apple Maps includes only a few shops, but they’re all in the wrong places and some are no longer open.

Eden Center on Google Maps
With the same indoor mapping style it applies to every mall, Google Maps makes it look like it has spectacular coverage of Eden Center. But it’s just walls: most of the shops are still in the wrong location and some have closed.

Eden Center on Baidu Maps
I found it surprising that Baidu Maps has coverage of this area on par with Apple Maps, but it too has misplaced and outdated points of interest.

OSM didn’t have a lot of detail about Eden Center until I ventured there last month, but now it’s complete, accurate, and up-to-date. Even the parking aisles are named. It’s looking a lot better than the competition.

Eden Center on OSM, before and after
After my visit to Eden Center, OSM gained so much detail in the area that there isn’t enough room to display most of the points of interest with proper icons and labels. (Left: before; right: after)

Eden mini-malls on OSM, before and after
(Top: before; bottom: after)

Eden Mini Mall on OSM, full detail
One of the advantages of a human-curated map database is an at-times quirky attention to detail. The diacritical marks that abound in Vietnamese are essential to comprehension, so this Vietnamese-American community will find it helpful that OSM includes the diacritics, even though this shopping center is located in a predominantly English-speaking city. Maybe someday the highway=corridor ways will be useful for pedestrian routing, too.

OSM may have a long way to go before it can even dream of breaking people’s Google habits. I’m under no illusions about how poorly it scales to visit each site in person via public transportation. But for now, I’m just happy to have finally made it to Wilson Boulevard and made it easier for others to do the same – minus the detour.

Location: 38.874, -77.154

Highway shields, state by state

Posted by Minh Nguyen on 25 July 2016 in English

With State of the Map U.S. still fresh on everyone’s mind, let’s revisit a favorite topic among many U.S. mappers: highway shields. We’ve been talking about ways to improve the sorry state of route shield support across the OSM ecosystem since at least 2011. We haven’t yet reached the vision outlined by Richard Weait in that SotM talk, but things aren’t as bleak as the osm.org renderers may let on.

In America, things are complicated

The national standard for U.S. state route markers is black numerals in a white oval. But almost every state eschews this oval in favor of its own design. (Some states have several, depending on the type of road.)


State highway shields by state (Chris-T)

In most states, the marker consists of a number in a distinctive shape, possibly with color:

K-10
Kansas state highways, such as K-10, are indicated on signage by a yellow sunflower, the state’s official flower. (Steve Alpert)

By adopting these various designs, maps can optimize usability for motorists. While driving, one should be able to compare the route shield the navigation application is displaying with guide signage up ahead, without having to know the ins and outs of the local road network. Getting the iconography correct is important because the same route number on a different shape may lead you in a different direction.

SR 562
Ohio state highways, such as SR 562, are indicated by the state’s simplified shape. (allen, CC BY-SA 4.0)

Regular expressions to the rescue?

Unlike most OSM-based maps, the recently departed MapQuest Open layer included different route shield designs for each state. But it relied on a fragile assumption that the way’s ref tag had to begin with the state’s postal abbreviation, e.g. ref=KS 10 in the example above. Parsing a way’s ref tag is suboptimal for various reasons:

  • There are known conflicts with other countries and other countries’ political subdivisions. For example, ref=CA… occurs in both California and the Cantabria region in Spain. ref=NH… occurs both in New Hampshire and throughout India.
  • Some states have multiple statewide route networks. Texas famously uses 12 distinct shield designs for state-level routes, and several of these networks overlap numerically. The postal abbreviation isn’t enough to distinguish one network from another.
  • Many states have county, township, or even city routes with distinct markers. In Ohio, each county has its own design, and within many of those counties, there are a variety of township route marker designs. A generic prefix like CR or TR is insufficient for selecting a suitable marker image, yet fully qualifying the jurisdiction on every way (ref=US:OH:MRW:SouthBloomfield 190) would also be tedious and error-prone.
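To make these pitfalls concrete, here is a minimal Python sketch of prefix-based shield selection. The lookup table, the regular expression, and the shield image names are all hypothetical illustrations of the approach, not MapQuest’s actual code.

```python
import re

# Hypothetical prefix table mimicking ref-based shield selection:
# the leading postal abbreviation in a way's ref tag picks the shield image.
PREFIX_TO_SHIELD = {
    "KS": "us-kansas",
    "OH": "us-ohio",
    "CA": "us-california",
    "NH": "us-new-hampshire",
}
REF_PATTERN = re.compile(r"^([A-Z]{2}) (\d+)$")


def shield_from_ref(ref):
    """Return (shield, label) for a way's ref, falling back to the generic oval."""
    match = REF_PATTERN.match(ref)
    if match and match.group(1) in PREFIX_TO_SHIELD:
        return PREFIX_TO_SHIELD[match.group(1)], match.group(2)
    return "us-state-generic", ref
```

With this table, shield_from_ref("KS 10") yields the sunflower-style Kansas shield as intended, but shield_from_ref("SR 562") falls through to the generic oval because Ohio’s “SR” prefix says nothing about the state, and a way tagged “NH 48” in India would receive a New Hampshire shield because nothing in the ref identifies the country.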

In some states, the highway department doesn’t use the state’s postal abbreviation when abbreviating route numbers, nor do most residents in writing:


Ohio uses “SR” on variable message boards where a pictograph would be infeasible. (ODOT)

In Ohio, the consensus has been to tag ways with the “SR” prefix and rely on route relations for disambiguation. Some other states have similar practices. Unfortunately, this approach caused the state’s routes to be marked with the generic oval instead of the state’s shape. MapQuest Open is no longer available on osm.org, but the need to choose state-specific shields remains common among renderers.

For years, members of the U.S. OSM community have promoted an alternative, more flexible tagging scheme for highway routes: the ref and network tags on route relations. In 2013, Phil Gold developed an experimental “shield renderer” to demonstrate how a renderer might make use of this data:

Shields
The OSMUS shield renderer supports a variety of shield designs as well as route concurrencies.
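As a rough sketch of why route relations help, the following Python picks shields from each relation’s network tag instead of the way’s ref. The network values follow the U.S. conventions documented on the OSM wiki as I understand them (US:I, US:US, US:OH, US:TX:FM), but the table and the shield image names are hypothetical.

```python
# Hypothetical shield table keyed by route relations' network tags.
NETWORK_TO_SHIELD = {
    "US:I": "us-interstate",
    "US:US": "us-highway",
    "US:OH": "us-ohio",
    "US:TX": "us-texas",
    "US:TX:FM": "us-texas-farm-to-market",
}


def shields_for_way(route_relations):
    """Return one (shield, ref) pair per route relation that a way belongs to.

    Each relation is a dict of its tags. A way shared by several routes
    (a concurrency) gets several shields, in relation order.
    """
    shields = []
    for tags in route_relations:
        # Exact match only: an unknown county or township network gets the
        # generic design rather than a misleading state shield.
        shield = NETWORK_TO_SHIELD.get(tags.get("network", ""), "generic")
        shields.append((shield, tags.get("ref", "")))
    return shields
```

Because the ref carries only the number, Texas’s overlapping route numbers stay distinct (network=US:TX with ref=1 versus network=US:TX:FM with ref=1), and a concurrency naturally yields multiple shields.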

But making use of route relations is a difficult problem for production-level consumers of OSM data, so parsing ways’ ref tags remains the most common approach to selecting route shields, despite the disadvantages above.


“Perl Problems”, xkcd (Randall Munroe)

A baby step using spatial queries

When I’m not mapping speed bumps and backyard swimming pools for fun, I work at Mapbox on the open-source Mapbox iOS SDK and the Mapbox GL renderer that powers it. Mapbox GL renders Mapbox Streets vector tiles, applying a stylesheet designed in Mapbox Studio.

The style language used by these tools doesn’t yet support regular expressions, so the vector tiles can’t include the raw way refs for the renderer to parse. Instead, when an OSM way is baked into a vector tile, a spatial query determines the relevant ISO 3166-2 code (the country code plus the postal abbreviation), which goes into an iso_3166_2 field.
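I can’t speak to how Mapbox’s tile pipeline actually performs this query, but the basic idea is a point-in-polygon lookup. Here is a minimal Python sketch using the even-odd (ray casting) test. The state “polygons” are deliberately crude bounding boxes, and choosing the way’s middle vertex as its representative point is an assumption; a real pipeline would use actual boundary geometry and handle ways that cross state lines.

```python
# Crude, hypothetical bounding boxes standing in for real state boundary polygons.
# Rings are lists of (lon, lat) vertices.
STATE_POLYGONS = {
    "US-OH": [(-84.82, 38.40), (-80.52, 38.40), (-80.52, 41.98), (-84.82, 41.98)],
    "US-IN": [(-88.10, 37.77), (-84.82, 37.77), (-84.82, 41.76), (-88.10, 41.76)],
}


def point_in_ring(lon, lat, ring):
    """Even-odd (ray casting) test: is the point inside the polygon ring?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does edge j->i straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def iso_3166_2_for_way(coords):
    """Tag a way with the ISO 3166-2 code of the region containing its middle vertex."""
    lon, lat = coords[len(coords) // 2]
    for code, ring in STATE_POLYGONS.items():
        if point_in_ring(lon, lat, ring):
            return code
    return None
```

A way through Cincinnati comes out as US-OH and one through Indianapolis as US-IN, even though both might be tagged with an identical “SR” ref.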

This field makes it possible for anyone to create a map that includes state-specific route shields. To prove it, I used Studio to create a custom style, Interstate, based on the default Streets style:


With a regular expression–based filter, a conventional renderer would be unable to distinguish between Ohio and Indiana state route refs.

I wanted to get back to armchair mapping, so I only customized the shields for Ohio, Kentucky, and Indiana, and rather crudely at that. But with a bit of effort and more graphic design chops than I possess, the other states could receive similar treatment.

Note that this is only a stopgap solution to the problem of choosing state-specific shields: Mapbox GL doesn’t support grouped icons for route concurrencies yet, and ISO 3166-2 codes don’t identify counties, townships, or the myriad route networks in Texas. But it’s still better than seeing homogeneous white ovals everywhere.

Build your own Interstate

It takes a certain amount of roadgeekery to care about state-specific route shields on a map, but it doesn’t take any programming skills to design and publish a style like Interstate:

  1. Sign up for a free Mapbox account and open Studio.
  2. Click New Style and choose a template. (Other than Satellite and Empty, each of the templates is based on the Mapbox Streets source and thus OSM data.) For this style, I chose the Streets template.
  3. Zoom the map in to level 10 or beyond and center it somewhere in the U.S., so that the generic oval state route shields are visible.
  4. The left sidebar lists the style’s layers, which are akin to layers in a vector graphics tool like Inkscape or Adobe Illustrator. Expand the “Highway shields” folder.
  5. Two layers of interest are “road-shields-white” and “road-shields-black”. The former is for shields that have white text, such as for Interstate highways and California state highways, while the latter is for shields that have black text, including most states’ highway shields. For this style, select “road-shields-black”. (You can also click on the roads on the map to select these layers.)
  6. In the flyout for this layer, switch to the Icon tab and note that the Image property is set to {shield}-{reflen}. Mapbox GL fills in the shield and reflen tokens so that a three-digit state route gets the us-state-3 shield. Now switch to the “Select data” tab. The lengthy filter ensures that the layer doesn’t rope in any roads whose shields should have white text. The map highlights the ways that remain. These are the state routes you want to style.
  7. At the top of the layer list, click the Duplicate button and name the new layer “road-shields-state-black”.
  8. In the new layer’s “Select data” tab, delete the existing shield filter. Replace it with one that includes a shield that “is any of” us-state. Then add another filter for an iso_3166_2 that “is any of” US-OH, US-KY, or US-IN.
  9. Back in the Style tab, in the Icon tab, change {shield}-{reflen} to {shield}-{reflen}-{iso_3166_2}. You’ll be adding icons with names like us-state-3-US-OH. (Alternatively, you could create a separate layer for each state, but more layers means more maintenance overhead and possibly worse performance.)
  10. Uh-oh: where previously there were generic state route shields on the map, now there are only numbers. Click the “{}” button on this text field to open the icon manager flyout. The style already has a lot of icons, including highway shields for many countries. Click the “Add SVG Images” button and upload a roughly 20×20 SVG image for each state-reflen combination you want to support. If you’re looking for inspiration, here are some existing SVG route shield sets that you could adapt in Inkscape or Adobe Illustrator (MUTCD-compliant iconography is in the public domain, but note that some designs may be trademarked):
      • Shield templates from the 2013 shield renderer – Remove the text span from each image before use. Mapbox GL will superimpose the route number onto the shield. You can customize the number’s styling in Studio.
      • Shield blanks from Wikimedia Commons – You may want to remove the black background from many of these images. The black background improves visibility for reassurance markers, but it’s unnecessary when the shield is merely a “sticker” on a map.
      • Some crude images I made based on a few of the Wikimedia Commons images above
  11. After uploading the images, you should see them in the flyout, and the map should now show state-specific shields. Finally, click the Publish button at the top of the left sidebar to make your changes public.
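To make the filter and token logic from steps 6 through 9 concrete, here is a rough Python emulation. The dict approximates what the duplicated layer might look like in Mapbox GL style JSON, using the legacy “in” filter syntax; the helper functions are hypothetical and implement only the “all”/“in” filters and {token} substitution used here, not the renderer’s actual code. Studio generates the real JSON for you.

```python
import re

# Hypothetical approximation of the duplicated layer as style JSON.
ROAD_SHIELDS_STATE_BLACK = {
    "id": "road-shields-state-black",
    "type": "symbol",
    "filter": ["all",
               ["in", "shield", "us-state"],
               ["in", "iso_3166_2", "US-OH", "US-KY", "US-IN"]],
    "layout": {"icon-image": "{shield}-{reflen}-{iso_3166_2}"},
}

TOKEN = re.compile(r"\{(\w+)\}")


def matches(filter_expr, props):
    """Evaluate the small subset of legacy filters used here: "all" and "in"."""
    op = filter_expr[0]
    if op == "all":
        return all(matches(sub, props) for sub in filter_expr[1:])
    if op == "in":
        key, values = filter_expr[1], filter_expr[2:]
        return props.get(key) in values
    raise ValueError(f"unsupported filter operator: {op}")


def icon_for(layer, props):
    """Return the icon name a feature would get, or None if the filter rejects it."""
    if not matches(layer["filter"], props):
        return None
    # Replace each {field} token with the feature's property value.
    return TOKEN.sub(lambda m: str(props.get(m.group(1), "")), layer["layout"]["icon-image"])
```

A three-digit Ohio state route resolves to the icon name us-state-3-US-OH, matching the images uploaded in step 10, while a California state route or an Interstate is rejected by the filter and left to the other shield layers.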

As seen on TV

I encourage you to check out Interstate. The map starts out at the Ohio-Kentucky-Indiana tripoint, so you can see the special style rules in action. Unlike the 2013 shield renderer, Interstate is the real deal: it’s served from production servers, ready to be embedded in a Web, desktop, or mobile application. (But please create your own copy using the instructions above instead of hotlinking this demonstration style.)

iOS
An example of what an iOS navigation application would look like using the Interstate style, the Mapbox iOS SDK, and the OSRM-powered Mapbox Directions API.

It’s worth noting that the ref tag isn’t just for renderers: OSRM includes the ref of each way along the route, so that a turn-by-turn navigation application can announce “Turn left onto SR 4.” If locals don’t refer to the highway as “OH 4”, neither should the voice announcement.
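A minimal sketch of how a navigation app might use that field: OSRM’s v5 route response includes, for each step, a name, an optional ref, and a maneuver object with a type and modifier. The function below handles only plain left and right turns and is a hypothetical illustration; real applications use much richer instruction logic.

```python
def announce(step):
    """Compose a spoken instruction from a dict shaped like an OSRM v5 RouteStep."""
    maneuver = step.get("maneuver", {})
    # Prefer the ref exactly as tagged ("SR 4"), then the street name, then a placeholder.
    road = step.get("ref") or step.get("name") or "the road"
    if maneuver.get("type") == "turn" and maneuver.get("modifier") in ("left", "right"):
        return f"Turn {maneuver['modifier']} onto {road}."
    return f"Continue onto {road}."
```

Because the announcement echoes the ref verbatim, whatever notation mappers choose for the ref tag is exactly what drivers will hear.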

The ISO 3166-2 codes exposed by Mapbox Streets partly decouple the ref tag’s format from the visual output. This frees up the ref tag to reflect the notation that’s used by humans and verifiable “on the ground”, rather than some arbitrary standard enforced for the benefit of renderers. The sooner we wean renderers off their dependence on specific ref tag patterns, the sooner we can expect renderers to support route relations. I can’t wait for that day to arrive.

The map is a fractal

Posted by Minh Nguyen on 24 July 2016 in English

I spent this morning watching live online transcripts of State of the Map U.S. roll in. (What a time to be alive!) Each year, there’s a talk or lightning talk that looks to the future. Alan McConchie’s talk today imagines the project’s possible trajectories, both good and bad. The eventual outcome may end up being some combination of Alan’s scenarios: a ghost town in some respects, a garden in other respects, a Borgesian map in Germany even.

In most of my eight years armchair-mapping for OpenStreetMap, I’ve stayed pretty close to where I started: in my hometown of Cincinnati, Ohio, in the United States. At some point, especially after moving across the country to Silicon Valley, I must’ve imagined that I’d eventually map Cincinnati to completion and move on to other, less well-tended areas. But that never happened. Instead, I found myself mapping the same places over and over again, even as my interests expanded to neighboring counties and states.

To me, OpenStreetMap behaves like a fractal: the beautiful structure in mathematics that gets more intricate the longer you stare at it.

Koch snowflake
Koch snowflake, António Miguel de Campos, public domain
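The Koch construction is easy to quantify. Each iteration replaces every straight segment with four segments one-third as long. Starting from an equilateral triangle with side length s and area A₀, iteration n has:

```latex
\begin{aligned}
N_n &= 3 \cdot 4^{n} && \text{(number of segments)} \\
\ell_n &= s \cdot 3^{-n} && \text{(length of each segment)} \\
P_n &= N_n \,\ell_n = 3s\left(\tfrac{4}{3}\right)^{n} \longrightarrow \infty \\
A_n &= A_0\left[1 + \tfrac{3}{5}\left(1 - \left(\tfrac{4}{9}\right)^{n}\right)\right] \longrightarrow \tfrac{8}{5}A_0
\end{aligned}
```

The perimeter grows without bound while the area never exceeds 8/5 of the starting triangle: a shape of fixed size whose edge keeps gaining detail.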

Iteration 0

Koch snowflake iteration 0
Koch snowflake iteration 0, Wrtlprnft, public domain

The zeroth iteration, back in 2008, was tracing Yahoo! Aerial Imagery in Potlatch 1. Despite having tons of free time back then, the best I could do with the tools I had was:

  • Realigning TIGER-imported roads to Yahoo!’s blurry imagery
  • Dividing TIGER-imported freeways into dual carriageways, reclassifying them above highway=residential, and adding them to relations
  • Removing “(historical)” GNIS-imported schools and post offices or retagging them as historical=yes
  • Adding misshapen buildings (because Potlatch didn’t have a right-angle tool) that looked OK at z17
  • Tracing river centerlines
  • Delineating golf courses and cemeteries

Iteration 1

Koch snowflake iteration 1
Koch snowflake iteration 1, Wrtlprnft, public domain

When the first iteration began sometime in 2010, with Bing aerial imagery available for tracing in Potlatch 2, any area I mapped had to meet a higher standard. But I also couldn’t help but go back and retrofit the old areas – areas I cared about most – to meet this higher standard too:

  • Realigning hand-drawn buildings to have right angles and look OK at z18
  • Adding cul-de-sacs and crosswalks
  • Adding soccer fields and tennis courts
  • Tracing riverbanks and wooded creeks
  • Tracing the paths in golf courses and cemeteries
  • Expanding the network of high-voltage power lines
  • Aligning county boundaries to better follow road and river centerlines

Iteration 2

Koch snowflake iteration 2
Koch snowflake iteration 2, Wrtlprnft, public domain

Roughly each year since then, Bing and Mapbox have provided ever sharper imagery for the area, and that imagery has revealed just how much the world changes on a regular basis. Again I had to retrofit my original areas of interest:

  • Adding traffic lights and alleys
  • Adding new wings to industrial buildings as they expand
  • Deleting houses as they’re torn down in urban renewal programs
  • Replacing farmland with residential subdivisions
  • Delineating and naming residential and retail developments
  • Aligning city limits to changes in road pavement quality

Meanwhile, the community grew alongside the map. People with interests very different from mine built up bicycle infrastructure, identified countless coffee shops, filled in the map’s background with land use polygons, and even cleaned up the buildings I’d previously drawn.

Iteration 3

Koch snowflake iteration 3
Koch snowflake iteration 3, Wrtlprnft, public domain

These days, newer tools like iD and Mapillary mean so much more source material to map and so many more tempting presets to make use of:

  • Realigning hand-drawn buildings to look OK at z19 and counting their floors
  • Naming businesses that never would’ve caught my attention while field surveying
  • Entering lane counts, speed limits, and turn restrictions
  • Adding sidewalks, crosswalks, and driveways
  • Adding backyard swimming pools
  • Tracing drainage ditches

Bots, crowdsourcing tools like MapRoulette, and paid mappers are present, too, taking care of basic upkeep so fractal mappers like me can focus on the next level of detail.

The next iterations

Koch snowflake iteration 7
Koch snowflake iteration 7, Wrtlprnft, public domain

Today at State of the Map U.S., there’s talk of mapping curbs and verges, of tracing “road banks” the way we started tracing riverbanks in iteration 1, of using 3D building data in ways that’ll probably require ever more granular measurements. Machine learning is taking the gruntwork out of armchair mapping, freeing us to map even more creatively.

Each time I embark on a new iteration of the fractal, I get much the same overwhelming feeling that I got when I first opened Potlatch 1 back in 2008. It’s like we’re starting from scratch, filling in data we didn’t expect to ever care about, except the tools are right here to support us.

Whatever the future appearance of OpenStreetMap and its community, the full potential is, fittingly, a version of the coastline paradox. Unlike Borges’ map, OpenStreetMap’s fractal remains the same size, but its complexity, accuracy, and precision keep increasing. As new organizations and technologies push ordinary field and armchair mappers like me out of manual tasks we used to perform, we continue to push the boundaries of what can be mapped, what can be articulated about the world.

Great Britain’s coastline, measured at decreasing scale but increasing complexity, Alexandre Van de Sande, GFDL 1.2 or CC BY-SA 3.0
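The Koch snowflake in the illustrations above makes the paradox concrete. Using the standard construction (not something worked out in the original post), start from an equilateral triangle with side s and area A_0, and at each iteration replace every edge with four edges one-third as long. After n iterations the perimeter and area are:

```latex
P_n = 3s\left(\tfrac{4}{3}\right)^{n} \xrightarrow[n\to\infty]{} \infty,
\qquad
A_n = A_0\left(1 + \tfrac{3}{5}\left(1 - \left(\tfrac{4}{9}\right)^{n}\right)\right) \xrightarrow[n\to\infty]{} \tfrac{8}{5}A_0
```

So by iteration 7 the outline is (4/3)^7 ≈ 7.5 times as long as the original triangle’s, while the enclosed area has grown by less than 60 percent and can never exceed 8/5 of it.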

A complete map

Posted by Minh Nguyen on 24 August 2015 in English (English)

I saved my 10,000th changeset yesterday, as part of a months-long surveying and mapping spree in San José, California, where I currently live.

I never intended to map the Bay Area. Instead, I typically spend my free time helping to map my hometown of Cincinnati and tame TIGER deserts elsewhere in Ohio from the comfort of my (armless) chair. I always assumed that the middle of Silicon Valley would be full of tech enthusiasts who occupy their time by micromapping every last bench and bush. The map sure looked complete, with lots of highway=primarys and highway=secondarys, landuse areas covering every square inch, and plenty of rail and bike infrastructure.

But then, in April, I zoomed in. I had recently joined Mapbox to work on iOS map software, and the Show My Location function went right to my unmapped doorstep. Around me was an endless parade of outdated street configurations, missing landmarks, test edits, proposed BART stations tagged as the real thing, and GNIS-imported hospitals that had been closed for years. Most of the map hadn’t been touched in six years. In terms of POIs like shops and restaurants, central San José in 2015 was as blank as Cincinnati was in 2008. (San José is the country’s tenth-largest city, with a population 3½ times that of Cincinnati.)

Zoom in all the way to the spot marked San José, and this is what you would’ve found earlier this year.

As I added in pent-up local knowledge, I couldn’t help but notice some unfortunate tagging practices. The Bay Area is (ahem) liberal in its use of highway=primary and highway=secondary. It wasn’t difficult to find quiet residential roads with speed bumps, Child at Play signs, or unsignalized crosswalks being tagged as secondary, the same tag often used for heavily used roads in other cities or 55 mph state highways in rural areas.

Most of the giant landuse areas that blanket the city need to be redrawn. Many landuse=residential areas conflate distinct neighborhoods or include tree-lined business districts (which look like residential areas from the air). Meanwhile, many industrial areas are being converted into residential areas due to a local housing boom. As much as possible, I’m replacing these generic landuse areas with more specific ones that correspond to individually named subdivisions, office parks, and retail complexes.

A typical landuse=residential area in San José spans multiple highway=primary roads. Either these aren’t really primary roads or this isn’t really one coherent residential area.

I suspect that the highway classifications and generic landuse areas, combined with decent rail data, made the map look a lot more complete than it really was. To a newcomer, the total absence of restaurants, buildings, and non-armchair-mappable information might’ve looked like a limitation of the project rather than a blank slate waiting to be edited. And again, there should be no shortage of visitors from San José, because this is Silicon Valley, where people talk about things like OSM. I’m sure the original mappers were doing their best at the time; unfortunately, six years ago, none of us knew as much about mapping as we do now.

San José is looking a lot better after an intense few months of surveying. There are plenty of POIs downtown – too many to fit onto the map at z19, in fact – as well as invisible attributes like speed, weight, height, and turn restrictions. I’m having particular fun mapping the many ethnic enclaves around town, which are very poorly represented on commercial map services.

This popular ethnic strip mall is now fully mapped in OSM (seen here in iD at z20). Apple and Google make a mess of things.

The San José place=city POI incorrectly sat 12 blocks away from where it should’ve been, at the site of this church, which incidentally is missing from Apple, Google, and HERE.

Meanwhile, OSM now includes that church, as well as its full Amharic Ethiopian name. (Deciphering the Amharic signage was a challenge in itself.)

Still, that’s only one city. We’ve always known TIGER deserts are a problem, but are other cities similarly languishing after an initial burst of detail, flying under the radar because we all think they’re being taken care of? Maybe we can prevent that from happening in the future by making the map look only as complete as it really is.

Location: Downtown Historic District, Japantown, San José, Santa Clara County, California, 95113, USA

Globalizing the name translation debate

Posted by Minh Nguyen on 5 June 2015 in English (English)

The world is messy and human languages more so. Recently the talk@ mailing list erupted in discussion over a proposal to shunt the vast majority of name:* tags over to Wikidata. But most of the discussion has centered around rather eurocentric examples and concerns. I worry that the discussion will lead to a policy change based on overgeneralizations. Having done a fair amount of multilingual name-tagging in the past, I want to point out just a few of the complications that monolingual mappers may be unaware of.

Translation versus transliteration

The top 20 languages are each natively spoken by roughly one percent or more of the world’s population. Twelve of them are in scripts other than Latin, and at least three are in non-alphabetic scripts, requiring transliteration just to produce a name that monolingual English speakers can recognize as text, let alone type.

Some have argued that translations are preferable to transliterations. Others have argued that transliterations should be omitted entirely from OSM, as an exercise for the reader or a job for third-party services. But what’s the difference between translation and transliteration? The wiki offers this simplistic explanation:

Transliteration is the process of taking a name in one language, and simply changing letters from one script to another.

This definition is a gross oversimplification, downplaying what it takes to adapt a foreign word to something you can use in your own language. There are three ways to go about it:

  • Transcription from another language gets you the original word’s pronunciation respelled in a very literal phonetic alphabet (or a language-neutral alphabet like the IPA), without regard for etymology. Except for cases involving ideographic scripts, as we’ll see below, pure transcription is almost never the right answer for a name:* tag.
  • Transliteration from another script to a Roman alphabet gets you the original word, but respelled as if English had borrowed the word, often taking liberties with the pronunciation in order to look “native” or respect the original etymology. Transliteration is the most reliable method for producing a usable name in your language.
  • Translation from another language to English gets you a word that refers to the same thing in English but may have a completely different pronunciation and etymology. Translation is only appropriate in a limited number of cases for historical reasons. Words like “north” and “city” are often translated while the rest of the name is transliterated.

I don’t speak Russian; perhaps one can get Абергавенни from “Abergavenny” by performing a simple one-to-one mapping between Latin and Cyrillic letters. But Russian has varying transliteration schemes, each with its own exceptions, and that’s a relatively easy task considering that the Roman and Cyrillic scripts share a common ancestor.
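Here is the kind of naive letter-for-letter table that paragraph imagines, as a minimal sketch. The table is hypothetical, not any published scheme, and covers only the letters in “Abergavenny”. It happens to produce Абергавенни, but real schemes need context: an initial “e” often becomes “э” (Edinburgh is Эдинбург) and an initial “y” often becomes “й” (York is Йорк), which a one-to-one table can’t express.

```python
# A deliberately naive letter-for-letter table (hypothetical, not a published
# scheme). It covers only the letters needed for "Abergavenny".
NAIVE_LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "e": "е", "g": "г",
    "n": "н", "r": "р", "v": "в", "y": "и",
}


def naive_transliterate(word: str) -> str:
    """Replace each Latin letter with exactly one Cyrillic letter, preserving case.

    Raises KeyError for letters the table doesn't cover. There is no context
    handling, so it has no way to choose "э" for an initial "e" or "й" for an
    initial "y".
    """
    letters = []
    for ch in word:
        mapped = NAIVE_LATIN_TO_CYRILLIC[ch.lower()]  # KeyError if unmapped
        letters.append(mapped.upper() if ch.isupper() else mapped)
    return "".join(letters)


print(naive_transliterate("Abergavenny"))  # Абергавенни
```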

A counterexample: transliterating Chinese to Vietnamese

Shanghai Railway Station façade
Shanghai has a Vietnamese name. You’ll never see it on signage in Shanghai, but no Vietnamese speaker refers to the city by its Chinese name. (Photo: Immanuel Giel / CC BY-SA 4.0)

Over the last seven years, I’ve added tens of thousands of name:vi tags by hand, the vast majority of them to place POIs and relations in mainland China. One of these POIs is Shanghai, called 上海 in Chinese. English-language literature calls it “Shanghai”, after the Pinyin transcription Shànghǎi. Shanghai is just a name to English speakers; it retains the pronunciation, more or less, but not the meaning. A literal translation would be “High Sea” or, more poetically, “Upon the Sea”. You’d never put “Upon the Sea” into OpenStreetMap because no one has ever called it that. You’d set name:en=Shanghai because English has no special name for the city.

Vietnamese is very different when it comes to Chinese names. Vietnam has had millennia of intense contact with China (much of it adversarial). As a result, every Chinese character has a Sino-Vietnamese reading: a word that was borrowed from Middle Chinese into Old Vietnamese, retaining the meaning but not the pronunciation (owing to changes in both spoken Vietnamese and spoken Chinese over the centuries). For Shanghai, I set name:vi=Thượng Hải, using Sino-Vietnamese for 上海. It literally means “high sea”, but in words that are only used for terms and names borrowed from Chinese.

As it happens, 上 has multiple readings corresponding to different meanings: thưởng (award), thượng (high), thướng (rise). Choosing between them is the task of a translator, not a SQL transform. So how does a translator like me know which Sino-Vietnamese words to choose? Sometimes the answer is obvious: I simply learned long ago that Shanghai is called Thượng Hải in the course of learning Vietnamese, and most Vietnamese learn that just by living in Vietnam for a time. For more obscure names, there are plenty of places to look up individual characters. My sources have included an out-of-copyright dictionary and a Sino-Vietnamese database that comes with no restrictions according to its author. (For the record, Unihan is TIGER bad when it comes to Vietnamese.) When I’m on the fence about a transliteration, I double-check it against sites like the Vietnamese Wikipedia. And when a character really has me stumped, I leave the POI alone.
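A minimal sketch of what a purely mechanical lookup can and can’t do, assuming a hypothetical character-to-readings table. The three readings of 上 are the ones listed above; treating 海 as having a single reading is an illustrative assumption, and the table is not one of the sources I mentioned. Whenever a character has more than one candidate reading, the function gives up and returns None, leaving the decision to a person.

```python
# Hypothetical excerpt of a character-to-readings table: character -> list of
# (Sino-Vietnamese reading, gloss). The 上 entries come from the text; the
# single 海 entry is an illustrative assumption.
SINO_VIETNAMESE_READINGS = {
    "上": [("thưởng", "award"), ("thượng", "high"), ("thướng", "rise")],
    "海": [("hải", "sea")],
}


def mechanical_reading(name):
    """Look up each character and return a reading only if no choice is needed.

    Returns None when any character is missing from the table or has more than
    one candidate reading; a human translator then has to pick one (or leave
    the POI alone).
    """
    words = []
    for ch in name:
        options = SINO_VIETNAMESE_READINGS.get(ch, [])
        if len(options) != 1:
            return None
        words.append(options[0][0])
    return " ".join(words)


print(mechanical_reading("海"))    # hải
print(mechanical_reading("上海"))  # None, because 上 has three candidate readings
```

The lookup is the easy part; picking thượng for 上海 requires knowing that the name means “high sea”, which is exactly the context the table lacks.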

If I were to actually translate “Shanghai” into “plain” Vietnamese, the result would be either Trên Biển if I transliterate at the same time or something like 𨕭𣛟 if I don’t. (The Vietnamese language also used ideographic characters until the 20th century, just a different set of characters than Chinese.) No one would ever use the “plain” Vietnamese name, though; Thượng Hải is the only correct way to render this particular city’s name in Vietnamese.

This is just one language out of many that have rich histories of dealing with multiple writing systems. You can imagine that other languages have their own unique considerations.

Machine transliteration is impractical


If we rely on software to localize place names for us, some languages can hope for no better than hack jobs, akin to this humorous map in “English”. (Illustration: imkharn)

There has been plenty of handwaving about renderers and geocoders that are smart enough to transliterate between different writing systems. But consider that Google Translate, with all its NLP might and a corpus the size of the Internet, fares poorly at interpreting Chinese place names. It doesn’t know that 红寺堡 is Hồng Tự Bảo in Vietnamese or “Hongsibao” in English. Your average mapmaker can’t afford that kind of technology anyways.

Software developers have much more experience converting between metric and imperial units than between human languages. Even though Sino-Vietnamese words aren’t “plain”, modern Vietnamese words, their meanings are often not lost on Vietnamese speakers today. Any schoolchild could tell you that thượng hải means trên biển (upon the sea), an apt name for a major port city. But a multilingual software client, burdened with the knowledge that thượng could also mean 㐀 = “hill”, or 㠪 = “five”, or 尙 = “yet”, would need a lot of resources to make a decision:

  1. Natural language processing (NLP), a form of artificial intelligence
  2. Context about the city and common naming practices
  3. A decent, machine-readable, suitably licensed dictionary for that particular language pair
  4. Possibly even dedicated logic for each character, multiplied by the number of transliteration schemes

Then there are suggestions that IPA transcriptions could be tagged as an intermediate step. But IPA comes with its own headaches, like whether to transcribe broadly or narrowly. Consider the number of valid English pronunciations of “north”, then consider that the same Chinese script is used by a host of mutually unintelligible language varieties.

It wouldn’t be possible to derive the Sino-Vietnamese name from an IPA or Pinyin transcription, anyways, because they have different many-to-many mappings between characters and words. Shàng (Pinyin) doesn’t just correspond to 上; it also corresponds to the following characters, as would an IPA transcription based on Mandarin: 上姠尙尚蠰銄鑜. On the other hand, thượng (Sino-Vietnamese) corresponds to a very different set of Chinese characters: 㐀㠪丄仩上鞜妴尙尚鞝躺𠄞. Spoken Mandarin and Vietnamese have evolved so much over the centuries that, if a system like Sino-Vietnamese were invented today based on modern Mandarin pronunciation instead of Middle Chinese, it would employ a completely different set of words for each character.
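Set arithmetic on the two character lists in that paragraph shows the mismatch. This snippet uses only the characters quoted above, which may not be exhaustive lists for either reading: even if a transcription tells you the syllable is shàng, it can’t tell you which character is behind it, and only some of those candidates share the reading thượng.

```python
# Character sets copied from the paragraph above.
READ_SHANG_IN_MANDARIN = set("上姠尙尚蠰銄鑜")
READ_THUONG_IN_SINO_VIETNAMESE = set("㐀㠪丄仩上鞜妴尙尚鞝躺𠄞")

# Characters on both lists: a "shàng" here could correspond to "thượng".
both = READ_SHANG_IN_MANDARIN & READ_THUONG_IN_SINO_VIETNAMESE
# Characters read "shàng" that are absent from the "thượng" list.
shang_only = READ_SHANG_IN_MANDARIN - READ_THUONG_IN_SINO_VIETNAMESE
# Characters read "thượng" that are absent from the "shàng" list.
thuong_only = READ_THUONG_IN_SINO_VIETNAMESE - READ_SHANG_IN_MANDARIN

print(len(both), len(shang_only), len(thuong_only))  # 3 4 9
```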

There is a consensus at least that automatic transliteration does not belong in OSM, because it cannot be verified for accuracy. But excluding handcrafted transliterations from OSM forces data consumers to foist those same automatic, unverified algorithms upon their users. The result is the worst of both worlds: poor support due to the effort required and poor quality due to a lack of context.

Routesheds

Posted by Minh Nguyen on 13 October 2014 in English (English)

Warning: This post makes absolutely no sense to anyone outside the United States, or to anyone who relies on a mode of transportation that uses a sane numbering scheme.

Development on the OSM U.S. shield renderer seems to have stalled a bit, and my request to render pictorial route shields on the Standard stylesheet is effectively tabled for now. There doesn’t seem to be a whole lot to get excited about on the shield rendering front.

Just to bide my time, I decided to approach route shields from the other direction, using OpenStreetMap’s coverage of the Cincinnati Tri-State area as a starting point. Slapping shields in random locations all over a road map is so… functional. So let’s ditch the map, fire up TileMill, and let the shields do the talking:

The first thing you see

A bit of a mess, isn’t it? It’s even worse when you zoom out:

The next thing you see

But pan around, and you might start to notice some patterns if you’re from the area. You can make out the most prominent highways. Here’s a map Nate made from OSM data, for comparison:

Back to reality

You can make out the Ohio–Kentucky state line where the state route shields in the shape of Ohio turn into plain old circles:

These states have a lot in common, really

So what’s going on? Each shield on the grid indicates the nearest Interstate, U.S. route, or state route, with an understandable bias towards highways over surface streets. A shield isn’t necessarily positioned along the route it indicates; rather, it just happens that no other route is closer. That property gives rise to an interesting phenomenon that I’ll take the liberty of calling a routeshed. Much like a watershed, a routeshed encompasses the area in which cars naturally flow toward a single route. OK, that sounds so ridiculous it belongs in a Wikipedia article. But this is what the sparse terrain of western Hamilton County looks like:
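The grid is essentially a nearest-route lookup. The post doesn’t include the TileMill code behind the map, so what follows is only a sketch of the idea, assuming planar coordinates and made-up route geometry: for each grid cell, measure the distance to every segment of every route and keep the closest route. The function names are hypothetical, and the sketch ignores the bias toward highways mentioned above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamped to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(p: Point, route: List[Point]) -> float:
    """Distance from p to the closest segment of a route's polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(route, route[1:]))


def nearest_route(p: Point, routes: Dict[str, List[Point]]) -> str:
    """The route whose shield this grid cell would show (ties go to the first route)."""
    return min(routes, key=lambda name: distance_to_route(p, routes[name]))


# Hypothetical geometry in made-up planar units, not real OSM data.
ROUTES = {
    "I-74": [(0.0, 0.0), (10.0, 0.0)],
    "US 52": [(0.0, 6.0), (5.0, 6.0), (10.0, 9.0)],
}

# Print a coarse routeshed map, one label per grid cell, north at the top.
for y in range(9, -1, -3):
    print(" ".join(f"{nearest_route((float(x), float(y)), ROUTES):>5}"
                   for x in range(0, 11, 5)))
```

Each routeshed is then just the set of cells that share a label, much like the regions of a Voronoi diagram built from lines instead of points.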

No choice but the Interstate

If you’re as sleep-deprived as I am, it does kinda-sorta look like a watershed.

Here’s the full slippy map.

Credits: Road data from OpenStreetMap contributors, mostly me, Nate, and NE2. Shield blanks from Wikimedia Commons users SPUI, Ltljltlj, Fredddie, and Scott5114. Shield labels set in Roadgeek 2005 Series C and D. Code under the MIT license; tiles under Creative Commons Attribution 4.0.

Location: Betts-Longworth Historic District, West End, Cincinnati, Hamilton County, Ohio, 45203, USA

U.S. highway coordination

Posted by Minh Nguyen on 13 April 2009 in English (English)

Just a heads-up to mappers in the United States: we’re starting to tag highways with relation:route (see the tagging guide). Entire highways will be joined with one (two?) relations. Because Interstate highways and U.S. Routes cross state lines, we’re trying to coordinate our efforts with two wiki pages:

One of the driving forces for using relations is that we can cleanly handle concurrencies. Eventually, we can also look forward to custom highway shields like we’re used to seeing on virtually all other maps. Looks like it hasn’t been mentioned on Planet OSM yet, but here’s a nice demo of the shields being rendered.
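A sketch of why relations make concurrencies easy, assuming each route relation is reduced to an ordered list of member way IDs (the relations and way IDs below are hypothetical): a concurrency is any way that belongs to more than one route relation, so it falls out of a simple inverted index instead of having to be spelled out in each way’s own tags.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical route relations: route ref -> ordered member way IDs.
# The way IDs are invented for illustration.
ROUTE_RELATIONS: Dict[str, List[int]] = {
    "I 71": [101, 102, 103, 104],
    "I 75": [201, 103, 104, 202],
    "US 42": [301, 302],
}


def find_concurrencies(relations: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map each way that belongs to two or more route relations to those routes."""
    routes_by_way = defaultdict(set)
    for route, way_ids in relations.items():
        for way_id in way_ids:
            routes_by_way[way_id].add(route)
    return {way_id: sorted(routes)
            for way_id, routes in routes_by_way.items()
            if len(routes) > 1}


print(find_concurrencies(ROUTE_RELATIONS))
# {103: ['I 71', 'I 75'], 104: ['I 71', 'I 75']}
```

A renderer could use the same index to stack one shield per route on ways 103 and 104.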

Dorm OSM tutorial

Posted by Minh Nguyen on 4 February 2009 in English (English)

Tonight I organized a brief tutorial on contributing to OpenStreetMap at my dorm. My dormmates raised some good questions that I didn’t have the answers to off the top of my head. One of the questions was whether there was a way to tag historical features that are now gone. Besides railroad rights-of-way and the old_name key, I couldn’t think of a general way to map features that are entirely gone. I also fielded the standard questions about vandalism.

Location: 37.424, -122.166

Nothing to see here

Posted by Minh Nguyen on 5 December 2008 in English (English)

A traveling salesman plans to attend a conference of traveling salesmen and wants to drive from point A to point B and back using the shortest, quickest route possible. He first tries the obvious tool for the job, Google Maps, which times out unexpectedly. Yahoo! Maps, Live Maps, and MSN Maps all do likewise when given the same query. MapQuest returns a more helpful 500 Internal Server Error after a few minutes.

In a fit of desperation, he consults OpenStreetMap, which routes him through null and undefined. The traveling salesman is now enlightened about the NP-hard class of problems.

Edited 24 January 2017: Replaced OSM Gazetteer links with Nominatim links.

Hundred-hour flood

Posted by Minh Nguyen on 4 October 2008 in English (English)

For the past few months, I’ve been turning the Little Miami River in Southwestern Ohio into a space-filling river. Every time I touch it, though, the area “floods”, because I keep forgetting to make the river’s ways run clockwise and each island or sandbar run counterclockwise. Between Mapnik and Osmarender, I can’t ever keep the place dry. :^) There’s gotta be a better way.
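The winding rule in that entry can at least be checked mechanically. A minimal sketch, assuming planar (x, y) coordinates with y increasing northward (lon/lat over an area this small is close enough): the shoelace formula gives a ring’s signed area, whose sign tells you which way it winds, and a ring can be reversed when it points the wrong way. The function names and geometry are hypothetical, not part of Mapnik, Osmarender, or any editor.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (x, y) with y increasing northward, e.g. (lon, lat)


def signed_area(ring: List[Coord]) -> float:
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise.

    Works whether or not the ring repeats its first node at the end, since the
    closing pair then contributes zero.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orient(ring: List[Coord], clockwise: bool) -> List[Coord]:
    """Return the ring, reversed if needed, so it winds in the requested direction."""
    is_clockwise = signed_area(ring) < 0
    return ring if is_clockwise == clockwise else ring[::-1]


# Hypothetical geometry: a rectangular stretch of river with one island.
river = [(0.0, 0.0), (4.0, 0.0), (4.0, 10.0), (0.0, 10.0)]  # drawn counterclockwise
island = [(1.0, 2.0), (1.0, 4.0), (3.0, 4.0), (3.0, 2.0)]   # drawn clockwise

river = orient(river, clockwise=True)      # outer ring: clockwise
island = orient(island, clockwise=False)   # island or sandbar: counterclockwise
```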

Location: Miami Bluffs, Hopkinsville, Warren County, Ohio, USA

Flood

Posted by Minh Nguyen on 20 April 2008 in English (English)

Unsure how to map a space-filling river, I inadvertently flooded Branch Hill and much of Miami Township. Oops.

Location: Miami Grove, Symmes Township, Hamilton County, Ohio, 45147, USA

Loveland neighborhoods

Posted by Minh Nguyen on 14 April 2008 in English (English)

Loveland, Ohio now features a bunch of subdivision points, thanks to this neighborhood map.