This is not going very fast

Posted by n76 on 13 August 2020 in English.

I haven’t had much experience performing imports, and the Orange County, California buildings and addresses import is the first (and at this rate only) one I’ve instigated.

Other Imports

I assisted with a building import for Cupertino when I lived in Silicon Valley. And I added a couple of buildings in support of the Los Angeles County import a few years ago. But in both cases my contributions were very small.

Most of my experience with imports has been in attempting to clean them up.

TIGER and NHD

Anyone who has edited in the United States will have run into “TIGER deserts” and I’ve spent my time in purgatory in those deserts. And if you’ve edited in the rural areas of California you may have run into some imports from the National Hydrography Dataset (NHD), which doesn’t seem to be much better for water than the old TIGER was for roads. The two can interact in annoying ways. At least annoying to someone who has a desire to keep the number of suspect issues reported by Osmose down. For example:

  • You map a track or trail out of the mountains to where it connects to a road in the Central Valley.
  • That road is from the TIGER import and has never been updated.
  • So you take a moment to correct the alignment where the new trail or track connects to the road and maybe add a surface tag, etc.
  • You are now the last editor that touched that road.
  • But that road extends a fair distance and it crosses a number of waterways from a NHD import.
  • Osmose (and possibly other tools) now report that you are the most recent editor of a road that is crossing waterways.
  • So you go back and try to correct those issues. In at least a few cases you will find the waterway goes under the road through a culvert.
  • So you split the waterways and add the appropriate tunnel=culvert and layer=-1 tags.
  • You are now the last editor of those waterways.
  • Which happen to cross several more TIGER imported roads, so your Osmose error count just went up.
  • Wash, rinse and repeat. I don’t think I’ll ever get done with this.
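
The culvert step in the list above can be sketched in a few lines. This is only an illustration of the split-and-tag idea, not how any editor implements it: the function name split_for_culvert, the plain dictionaries standing in for ways, and the node ids are all hypothetical. Only the tunnel=culvert and layer=-1 tags come from the steps above.

```python
def split_for_culvert(way_nodes, start, end, tags):
    """Split a waterway (a list of node ids) so the stretch between
    positions start and end becomes its own way tagged as a culvert.

    Hypothetical helper: real editors such as JOSM do this interactively.
    Adjacent pieces share their boundary node, as split ways do in OSM.
    """
    if not (0 <= start < end <= len(way_nodes) - 1):
        raise ValueError("start and end must be ordered positions in the way")

    pieces = []
    if start > 0:
        # Upstream part keeps the original tags.
        pieces.append({"nodes": way_nodes[:start + 1], "tags": dict(tags)})

    # The stretch under the road gets the culvert tags from the list above.
    culvert_tags = dict(tags)
    culvert_tags.update({"tunnel": "culvert", "layer": "-1"})
    pieces.append({"nodes": way_nodes[start:end + 1], "tags": culvert_tags})

    if end < len(way_nodes) - 1:
        # Downstream part keeps the original tags.
        pieces.append({"nodes": way_nodes[end:], "tags": dict(tags)})
    return pieces


# Made-up node ids: the road crosses between the 2nd and 3rd nodes.
parts = split_for_culvert([1, 2, 3, 4, 5], 1, 2, {"waterway": "stream"})
```

Every piece produced this way is a freshly edited way, which is exactly why the QA tools then attribute its other crossings to whoever made the split.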

San Diego Address Imports

When I moved to Southern California several years back I became aware of problems with an address import that had been done for San Diego County. In attempting to make things better I made a conscious effort to touch as few tags as possible. So there were a lot of is_in:*=* tags I left even though they were deprecated at the time. Well, the QA tools have gotten more verbose over time and Osmose started nagging me about those, in addition to a bunch of duplicate addresses and/or address streets not matching street names that I can’t remotely resolve. My biggest regret on my attempt to clean up the San Diego addresses is that I used my normal OSM ID rather than creating a new “fix it” ID.

Mapping in San Clemente

When I moved to San Clemente I spent quite a few days walking the streets to collect address data. It gave me a pretty good overview of my new home town, gave me exercise, and allowed me to add what I surveyed to OpenStreetMap. I was able to map the addresses and buildings for nearly everything within a couple of miles of my house.

But there are some gated neighborhoods that I could not access. And there were many more neighborhoods that were too far from my home for easy mapping. So there have been annoying blank spots on the map of my city.

Maybe An Import For Those Blank Spots

I recently became aware that there is a “Map With AI” layer that contains building outlines that could be imported. I took a look at it and found that the outlines are the ones provided by Microsoft/Bing and are, in my opinion, far too low a quality for OSM.

But that got me looking around for a better dataset that could be imported. It turns out that Orange County has published a dataset with a Public Domain license containing building outlines, addresses and, on some buildings, elevation and height information.

So I decided to set up an import.

Issues

I’ve imported a tiny fraction of the data and it is not easy going.

  • The building outlines are the same Microsoft/Bing data that was not worth importing by itself. The result is every single building needs to be corrected before it can be saved to OpenStreetMap.
  • It seems to take me longer to correct a bad building outline than it would to create a new one.
  • The address data is not as clean as I thought when I first examined it. Not horrible, but there are duplicate addresses, missing addresses and addresses that are obviously on the wrong street.
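
Some of the address problems listed above can be flagged before anything is uploaded. The following is a minimal sketch, not the process used for this import: it assumes each feature has already been converted to an id plus a dictionary of OSM-style tags (addr:housenumber, addr:street, addr:unit), and the names address_key and audit_addresses are made up for illustration. It catches duplicates and missing addresses only; an address on the wrong street still needs a comparison against nearby street names, or a survey.

```python
from collections import defaultdict


def address_key(tags):
    """Return a normalized (housenumber, street, unit) key, or None
    if the feature lacks a house number or a street name."""
    number = tags.get("addr:housenumber", "").strip().upper()
    # Collapse repeated whitespace and ignore case in the street name.
    street = " ".join(tags.get("addr:street", "").split()).casefold()
    unit = tags.get("addr:unit", "").strip().upper()
    if not number or not street:
        return None
    return (number, street, unit)


def audit_addresses(features):
    """features: iterable of (feature_id, tags) pairs.

    Returns (duplicates, missing): duplicates maps each address key to
    the ids that share it; missing lists ids with incomplete addresses.
    """
    by_key = defaultdict(list)
    missing = []
    for feature_id, tags in features:
        key = address_key(tags)
        if key is None:
            missing.append(feature_id)
        else:
            by_key[key].append(feature_id)
    duplicates = {k: ids for k, ids in by_key.items() if len(ids) > 1}
    return duplicates, missing


# Made-up sample data, purely for illustration.
sample = [
    ("a", {"addr:housenumber": "101", "addr:street": "Avenida  Example"}),
    ("b", {"addr:housenumber": "101", "addr:street": "avenida example"}),
    ("c", {"addr:street": "Avenida Example"}),
    ("d", {"addr:housenumber": "103", "addr:street": "Avenida Example"}),
]
duplicates, missing = audit_addresses(sample)
```

A flagged duplicate is only a candidate: two buildings on one lot can legitimately share an address, so each hit still needs a human look.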

Slow going

I don’t want any changeset from this import to be thought of as yet another reason why imports should be discouraged, so it is taking me hours for each little area.

For example, yesterday I noticed an obvious address error. The fact it was an error was obvious but unfortunately the correction was not obvious. So this morning I drove to the area in question to do a survey.

If I actually have to survey each area, then what good is doing this as an import? And I am not up to walking every street in the county; I’d never get done.

Going Forward

For San Clemente I will continue to slog through this data and curate and edit it enough that I feel it is adequate for OpenStreetMap. But scaling the building by building corrections up for all of Orange County is way too big a job for me.

I am thinking about centroiding the building outlines into single points and then only importing the addresses for the rest of the county. That would probably be much faster and would provide some benefit to OpenStreetMap.
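
As a rough sketch of what that centroiding could look like, assuming each building outline is available as a closed ring of (lon, lat) pairs and its tags as a dictionary: the function names and the sample feature below are made up, and treating lon/lat as planar coordinates is an approximation that should be harmless at the scale of a single building.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]
    using the shoelace formula. A repeated closing vertex is tolerated."""
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        # Degenerate outline: fall back to the mean of the vertices.
        return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
    return (cx / (3 * twice_area), cy / (3 * twice_area))


def building_to_address_node(outline, tags):
    """Reduce a building to a single point carrying only its addr:* tags."""
    lon, lat = polygon_centroid(outline)
    address_tags = {k: v for k, v in tags.items() if k.startswith("addr:")}
    return {"lon": lon, "lat": lat, "tags": address_tags}


# Made-up example feature, for illustration only.
node = building_to_address_node(
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)],
    {"building": "yes", "addr:housenumber": "101", "addr:street": "Avenida Example"},
)
```

One caveat: for an “L” or “U” shaped outline the centroid can land outside the building, so a point guaranteed to lie inside the polygon may be a better choice if the outlines are trusted that far.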

Discussion

Comment from CloCkWeRX on 14 August 2020 at 10:24

Are there ways you could turn this into a MapRoulette challenge to spread the effort, and you take on a higher level QA role? Are the buildings all uniformly misplaced by something a transformation could correct, or each uniquely wrong?

Comment from Sanderd17 on 14 August 2020 at 10:30

Personally, I think Osmose is way too strict on many issues. And especially too strict in assigning issues to people.

I was a bigger fan of keep-right, which puts more emphasis on the error than on the mapper.

I really don’t care that, if I fix an issue, my name will be attached to the other issues of that road.

Comment from n76 on 14 August 2020 at 16:46

Regarding building outlines: Each building has a different problem. If the building is actually rectangular the outline can be rotated, off on one or more edges or the whole thing off by some translation. But the more usual case is that the building is not rectangular and invariably in that case the outline bears little resemblance to the imagery. In general the outline is much more rectangular than the building itself (“L” or “T” shaped building image but the import data shows a rectangle, etc.).

Maybe, once I have finished with my local city, I will turn the rest of the county into a MapRoulette challenge. That may be the only way that it will ever get completed.

Comment from impiaaa on 14 August 2020 at 18:15

Thanks for putting in the hard work! I agree, it can seem slow sometimes. IMO open data, even if not high quality enough to be imported directly, is still useful for finding areas that need to be surveyed, and providing supplemental data. Importing only parcel or building centroids sounds like a good idea. Address info is valuable on its own and can, again, give hints to mappers to which buildings need to be outlined.

Comment from stevea on 23 August 2020 at 17:54

There is no shame in abandoning an import, when (as you describe), “what good is doing this as an import?” tips the scale to “it’s more work to import than ‘do from scratch.’”

Sometimes it really is better to realize this and bail. The tricky part is knowing when you are at the before, during or after the “tipping point of no return” where your time and effort invested is worthy or wasted. For some imports you can’t know this until the data are imported (or begun to). So an important lesson about imports is to eat some bite-sized chunks first, to see if they are tasty and no choking is involved. If results are nourishing, keep eating. Otherwise, put your fork down and excuse yourself from the table. Yes, it isn’t always easy to know when to do this, or whether it is a good idea, but it usually is when you get to asking that “what good is this import?” question.

Thank you for sharing your experiences and to everybody who has talked about this here; it’s great.

SteveA
