
Johnwhelan's Diary

Recent diary entries

Sometimes I think I’m a bottom feeder, going through the map and cleaning up the obvious errors. What follows comes from my observations over time and from cleaning up thousands of duplicate buildings.

Because many countries do not have good census data, you can make a rough population estimate by multiplying the number of houses by the average number of people who live in each one. It isn’t perfect, but if you have nothing better it works.
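As a back-of-the-envelope sketch of that calculation, in Python — both numbers below are invented for illustration, not real figures:

```python
# Rough population estimate: buildings mapped in OSM multiplied by an
# assumed average household size. Both values are made-up examples.
residential_buildings = 12_400   # hypothetical count of mapped houses
avg_household_size = 5.5         # assumption; varies widely by region

estimated_population = residential_buildings * avg_household_size
print(round(estimated_population))  # → 68200
```

Duplicate or untagged buildings feed straight into that multiplication, which is why cleaning them up matters.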

So duplicate buildings are a problem. When you’re working out how many schools you need, duplicate buildings mess the numbers up, and so do buildings that are incorrectly tagged or not tagged at all.

Then we get to the imagery used. These days I’m seeing more and more microsoft BuildingFootprints tags on buildings. These footprints are normally very accurate and align well with Bing imagery. Even imagery from the largest, most accurate satellites can be positionally off by as much as 60 meters, so the imagery has to be aligned. My recommendation is to use Bing imagery and find a building that aligns with it, then switch to your preferred imagery and align it with the Bing-aligned building.

This helps ensure that buildings are only mapped once: when you add buildings you won’t duplicate an existing one, and if someone later imports the Microsoft BuildingFootprints data your building won’t be remapped 10 meters away.

Our mappers are often effectively disposable, meaning many will only map a few times, so we don’t really have time to train them.

We want to get the most accurate mapping we can from them, and as many buildings as we can. With JOSM’s buildings_tool plugin I can highlight one building, then add more by clicking and holding the mouse button, moving to the opposite corner of the building, and releasing. If the building isn’t aligned with the first one it takes one more mouse click. You get a rectangular building, correctly tagged, in far fewer clicks than iD requires. Plus you don’t need a validator to inspect each one to see if it is correct.

You need to install JOSM and a Java runtime such as Microsoft’s OpenJDK, but you’ll get a lot more buildings out of your mappers. It is possible to set it all up on a USB stick and run it from there.

With iD-mapped buildings there is room for error in tagging. Some aren’t tagged at all, and I’ve seen some tagged barrier=fence, amongst others. Also, many buildings are rectangular in shape, but you wouldn’t know it from the mapping.

Buildings tagged microsoft BuildingFootprints imply an import, and officially there is a formal process for imports. If you are importing, please follow it. Part of the requirement is to visually check whether the building has already been mapped. I strongly suspect this is not being followed in all cases.

Thanks for reading

John

cleaning up after a task manager task

Posted by Johnwhelan on 1 January 2022 in English. Last updated on 3 January 2022.

In an ideal world all tasks would be validated to a high standard, but unfortunately this doesn’t happen.

I’ve seen HOT projects with nearly two thousand duplicate buildings; I’ve seen some with two or three hundred untagged buildings. Others are well mapped and have no errors.

So this is a method to clean up after the event. Basically you load the area into JOSM and run the duplicate building check and the JOSM validator. It won’t catch everything, but it’s a lot faster than validating each tile. The Todo and Mapathoner plugins are required.

Select the errors, add them to the todo list, and work your way through them.

The key is "aoiBBOX": [6.473334,5.172193,6.870011,5.730142], which for project 10756 is found at https://tasking-manager-tm4-production-api.hotosm.org/api/v2/projects/10756. You’ll need to search the returned text to extract it.
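As a sketch, pulling the key out of the project JSON could look like this in Python. Only the aoiBBOX key, its values, and the project number come from the API response described above; the surrounding JSON structure shown here is an illustrative assumption.

```python
import json

# Hypothetical excerpt of the Tasking Manager project response; only
# "aoiBBOX" and the project number are taken from the real API,
# the rest of the structure is assumed for illustration.
response_text = '{"projectId": 10756, "aoiBBOX": [6.473334, 5.172193, 6.870011, 5.730142]}'

project = json.loads(response_text)
min_lon, min_lat, max_lon, max_lat = project["aoiBBOX"]

# osmconvert expects -b=left,bottom,right,top (lon/lat), which is the
# same order the aoiBBOX list already uses.
bbox_arg = f"-b={min_lon},{min_lat},{max_lon},{max_lat}"
print(bbox_arg)  # → -b=6.473334,5.172193,6.870011,5.730142
```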

You can either feed these coordinates into JOSM and download the area directly, feed them into an Overpass query, or extract the area from an offline copy of the map; https://osm-internal.download.geofabrik.de/ is one source. If you work with the offline version you can locate the errors, then redownload just the tiny affected area to ensure the map is up to date before you correct each error.
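If you go the Overpass route, note that Overpass bounding boxes are ordered (south, west, north, east), i.e. latitude first, while the aoiBBOX list is (min lon, min lat, max lon, max lat). A minimal sketch of the reordering and a query for buildings in the area — the query shape is a generic Overpass example, not part of the original post:

```python
# aoiBBOX order from the Tasking Manager: min lon, min lat, max lon, max lat.
min_lon, min_lat, max_lon, max_lat = 6.473334, 5.172193, 6.870011, 5.730142

# Overpass bbox order is south, west, north, east (latitude first).
bbox = f"{min_lat},{min_lon},{max_lat},{max_lon}"

query = f"""
[out:xml][timeout:90];
(
  way["building"]({bbox});
  relation["building"]({bbox});
);
out body;
>;
out skel qt;
"""
print(bbox)  # → 5.172193,6.473334,5.730142,6.870011
```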

An example of a .bat file to extract the area from an offline file:

osmconvert64 e:\downloads\nigeria-latest-internal.osm.pbf -b=6.473334,5.172193,6.870011,5.730142 -o=f:\maps\nigeria10756.osm

Then load nigeria10756.osm as a local file into JOSM.

Note this is not validation, since no feedback is given to the mapper in the Tasking Manager. It is best used when no active mapping is taking place, so a month or so after the mapping has petered out is a good time.

The story actually goes back more than five years, to when it was realised that some Open Data was more open than others because of licensing issues. The City of Ottawa gave its bus stops and some other information to Google in GTFS format. Because of the need to announce bus stops for improved accessibility, all the bus stops had been very accurately re-calibrated. This made them a very attractive, high-quality import, but since the City of Ottawa’s Open Data license did not align with OSM’s it couldn’t be done. It did, however, provide the motivation to get the licenses sorted out.

The Canadian Treasury Board is responsible for standards and open data within the federal government in Canada, and it set about consulting many would-be users to come up with the 2.0 license. Incidentally, they have also been working with a number of African governments on Open Data licensing.

Once this license was in place, Ottawa city council acted to ensure that all users had equal access to its data, i.e. the bus stops, by releasing the data under a similar license; even that took a year or two to do.

Statistics Canada has a very different corporate culture from OSM. Very early in the project a meeting / conference call was held with various players, including Blake Girardot from HOT; Mojgan Jadidi, who had imported some Stats Canada data into OSM under the new 2.0 license and compared both carefully; and Tracey Lauriault, an Open Data specialist from Carleton University, who identified a building data set that the City of Ottawa owned completely. Other data sets were only partially owned, with stakes held by agencies such as MPAC, which normally sold the data.

That meeting changed the direction of the Stats Canada project: it was now to be an Open Data import with extra tagging by the public, which meant the local mappers had to both approve the import and be involved. In Ottawa a local group of mappers meets up every few weeks; they were very supportive and held a number of meetings to discuss how they could help. In the end it was they who ran the import and handled much of the OSM discussion.

The City of Ottawa’s new Open Data license wasn’t formally approved until some time into the project. There was lengthy discussion in the Canadian community and on the imports mailing list, and eventually the questions about the license were referred to the Legal Working Group, which formally approved both the federal government’s 2.0 Open Data license and the City of Ottawa’s. Mapbox were very supportive of the project, providing a customised version of the iD editor. http://www.statcan.gc.ca/eng/crowdsourcing

It should be noted that handling both French and English, i.e. bilingualism, is normally a major problem for Canadian federal government departments. In this case it wasn’t: OSM handles multiple languages very well on both the input and display sides, and locally in Ottawa street names can be displayed in English or French. Also, the range of tools for entering data, such as iD and JOSM, meant the project was not committed to a single method of data entry.

One very significant part of the project was the use of R (r-project.org), the open-source statistical language, to analyse the data. This should provide a low-cost tool for other parts of the world, although as always training has its own costs.