Possibly importing USGS forest data

Posted by ff5722 on 10 April 2017 in English (English)

USGS has published tree cover data based on 2010 Landsat captures. I wonder if this data would be suitable for importing. Especially outside of Europe, forest cover is largely incomplete now, and 2010 is fairly recent for this kind of data.

In the licence requirement it says: University of Maryland, Department of Geographical Sciences and USGS; use is free to all if acknowledgement is made. So it is not obvious if using this data is allowed.

The data is provided as greyscale GeoTIFFs; I have uploaded one tile as a preview here:

Is the licence OK to use for adding data to OSM? If not, we could seek explicit permission from USGS. This data was directly derived from Landsat anyway, so they may be able to allow less strict attribution requirements.

Should the licence be suitable, then of course there will be many issues. For starters, the data is about ‘tree cover’ which could be ‘landuse=forest’, ‘natural=wood’, ‘landuse=orchard’, etc. Then there is the difference between OSM, where an area can only be forest or not forest, and this 1-100 scaled data. At which threshold is it a wooded area?
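To make the threshold question concrete, here is a minimal Python sketch. It assumes a tile has already been read into a plain grid of percent tree cover values; the sample values, the cutoffs and the function names are made up for illustration and do not come from the USGS data.

```python
from typing import List

Grid = List[List[int]]


def threshold_cover(cover: Grid, min_percent: int) -> List[List[bool]]:
    """Mark each cell as wooded if its tree cover value is at or above min_percent."""
    return [[value >= min_percent for value in row] for row in cover]


def wooded_fraction(mask: List[List[bool]]) -> float:
    """Share of cells in the mask that are marked as wooded."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


# Made-up 3x4 tile of percent tree cover values (not real data)
sample_tile = [
    [5, 20, 45, 80],
    [10, 35, 60, 90],
    [0, 15, 30, 75],
]

# The same tile gives very different "forest" areas depending on the cutoff
for cutoff in (10, 30, 50):
    mask = threshold_cover(sample_tile, cutoff)
    print(cutoff, round(wooded_fraction(mask), 2))
```

The wooded share of the tile changes a lot with the cutoff, so the threshold would probably have to be chosen per region rather than once for the whole world.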

Comment from Vincent de Phily on 10 April 2017 at 14:39

In my experience, import of landcover data is near-impossible to get right.

For example the Corine data was imported in France, and the amount of post-import work to disentangle the result and conflate with pre-existing data is still ongoing. It would probably have been faster to do everything ourselves via satellite imagery. In Ireland we did all the pre-import work, evaluated it, and concluded that it would do more harm than good.

This dataset doesn’t look much more appealing: it’s guaranteed to have both false positives and false negatives, it is based on images much older than what’s currently available, you’ll have to use a lakewalking algorithm to vectorize it (which pretty much always requires manual touches afterwards), and only then can you start dealing with the conflation problem (which is a biggie).

IMHO it’s a great dataset to use at low zooms, but not for the OSM use case. Time would be better spent tracing by hand.

Comment from ff5722 on 10 April 2017 at 14:55

@Vincent de Phily I understand, but remember that in many places there is no high-res Bing or Mapbox imagery. Tracing from lower-res imagery is also very rough, as trees, fields, orchards and even lakes can all have a dark shade of green.

So maybe it could still be useful for remote areas which are surely covered by forests (e.g. Siberia, central China, Amazon rainforest), but which are really too boring a task for human mappers to trace.

Are you sure it can only be done by the lakewalker algorithm? Because this data is forests only, so the pixel values could be directly converted to contour lines of the ‘forest probability value’. At least, that is what I would think with my limited experience in data processing.

Comment from Vincent de Phily on 10 April 2017 at 16:49

The most common way to convert pixel values to a shape (the term ‘contour lines’ is usually associated with relief) is to use a “lakewalker” algorithm (surely not the proper COMP-SCI term, I’m just using the name of a JOSM plugin here). You can use other tools, but that one should work relatively well.
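As a rough illustration of the idea (not the code of the JOSM plugin, whose internals are not described here), a flood fill over an already thresholded wooded/not-wooded grid collects one connected patch, and the cell edges around that patch are what a tracer would then turn into a polygon. The grid and function names below are made up for the example.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col)


def flood_fill_region(mask: List[List[bool]], start: Cell) -> Set[Cell]:
    """Collect all wooded cells 4-connected to start; empty set if start is not wooded."""
    rows, cols = len(mask), len(mask[0])
    if not mask[start[0]][start[1]]:
        return set()
    region = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in region:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def boundary_edge_count(region: Set[Cell]) -> int:
    """Count unit cell edges that separate the region from everything outside it."""
    count = 0
    for r, c in region:
        for neighbour in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if neighbour not in region:
                count += 1
    return count


# Made-up mask: one 2x2 wooded block plus one isolated wooded cell
sample_mask = [
    [True, True, False, False],
    [True, True, False, True],
    [False, False, False, False],
]

block = flood_fill_region(sample_mask, (0, 0))
print(len(block), boundary_edge_count(block))
```

Real tools also simplify the jagged outline and handle holes, and that is where most of the manual touch-up tends to come from.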

Go ahead and try it actually: install the scanaerial JOSM plugin and use it on one of the dataset’s images. You’ll get a feel of how well it works. Don’t upload your work unless you have checked that the license is OK. Now think about doing this automatically at a larger scale, and about conflating this new natural=wood with any existing OSM data.
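To give a feel for the very first step of conflation, here is a deliberately crude Python sketch that flags a new polygon for manual review when its bounding box touches the bounding box of any existing wood polygon. The coordinates and function names are invented for the example, and a real import would need proper geometry tests rather than bounding boxes.

```python
from typing import List, Tuple

Point = Tuple[float, float]                # (lon, lat)
BBox = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(polygon: List[Point]) -> BBox:
    """Smallest axis-aligned box containing all points of the polygon."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def needs_manual_review(candidate: List[Point], existing: List[List[Point]]) -> bool:
    """Flag a candidate whose bounding box overlaps that of any existing polygon."""
    cand_box = bounding_box(candidate)
    return any(boxes_overlap(cand_box, bounding_box(poly)) for poly in existing)


# Made-up coordinates, purely illustrative
existing_woods = [[(10.0, 50.0), (10.2, 50.0), (10.2, 50.1), (10.0, 50.1)]]
near = [(10.1, 50.05), (10.3, 50.05), (10.3, 50.2), (10.1, 50.2)]
far = [(11.0, 51.0), (11.1, 51.0), (11.1, 51.1), (11.0, 51.1)]

print(needs_manual_review(near, existing_woods), needs_manual_review(far, existing_woods))
```

Even this trivial check only tells you where a conflict might be; deciding which geometry to keep, merge or cut is the part that still needs a human.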

My guess is that doing this automatically (as an import) will look much less appealing to you after this experiment. But using the dataset as a source to manually run scanaerial on might still be a time-saver. Sometimes, you’re better off doing it manually than algorithmically. Your call.

Concerning the Bing vs Landsat argument, consider the fact that Mapbox updates its Landsat source (wherever no high-res imagery is available) automatically, and that this is going to be much better than the 2010 non-averaged snapshot used by that dataset.

Comment from Alan Trick on 10 April 2017 at 17:47

One place where this could be useful is in the US, where there are large “National Forest” parks that have the unfortunate landuse=forest tag on them, even though they’re only partly forests. Some editors want the landuse=forest to stay because they don’t want to see the green go away. This data, even if the quality is poor, might be better than the current state.

Comment from yvecai on 10 April 2017 at 20:17

If it bores you to trace it, then leave it out of OSM and add what you like to add instead.

Comment from Warin61 on 10 April 2017 at 22:13

Just to be clear: what you are entering is not landuse=forest but, at present, the tag natural=wood.

The tag landuse=forest should be used for land that is used to harvest products from trees, much like a farm. While these areas may be clear felled from time to time and therefore not have trees on them at this time, they are still being used for forestry.

The tag natural=wood can be applied over any landuse tag if required.

Comment from imagico on 11 April 2017 at 15:54

A few notes on this data:

  • this data is not really new - the research work this is based on is from 2013/2014 and the data was published more than a year ago IIRC.
  • this is not in any way suitable for import in OSM as is, although you could consider deriving data from it that could be imported - which however is not a trivial task if you want good results.
  • data quality of this is fairly good considering the scope, but not great. The methodology they use to identify forest is complex and not fully documented. The difficulty here is to identify forests and differentiate them from other types of vegetation. Especially on a global level, where you are dealing with a huge variety of ecosystems all with different spectral characteristics, this is really hard. In principle this kind of data set usually depicts woody vegetation in general rather than forests/woods in a strict sense. Also note this is not meant as a data source for cartographic purposes but as a basis for detecting and analyzing changes in forest cover.

That being said, if a local community is looking for a way to map forests in their area and is considering imports or automated processes, producing forest polygons from this data could - when done well - lead to more useful results, better suited to subsequent refinement and improvement by hand in OSM, than data sources like Corine Land Cover, which are inherently unsuited for OSM. Nonetheless, you should also keep in mind that locally you can usually do much better if you specifically identify forests on up-to-date open data imagery - either by hand or using automated processes - because

  1. you can use local knowledge.
  2. you would have a more recent and higher quality data basis.
  3. you can tune your forest detection specifically for the local situation.

Comment from SK53 on 11 April 2017 at 21:13

I fully support the remarks of Vincent de Phily & imagico.

I have made a number of experiments trying to extract natural woodland from Landsat imagery for Tierra del Fuego using tools in QGIS. Most Landsat data which is relatively cloud-free has deep shadows, making it very hard to find suitable filtering conditions even when applying corrections for the angle of view. No doubt similar or entirely different problems apply elsewhere. Note that the Natural Earth urban areas were created using remote processing of Landsat data and are full of errors.

There are active OSM contributors with real in-depth experience of processing remote imagery for detecting aspects of woodland: I’m thinking of NextGIS who created a QGIS plugin which they used for finding old-growth forests in Russia (notably in the depths of Siberia). Such people/organisations should be consulted on data quality for this dataset.

OSM works best when we don’t race to complete some particular feature class with poor quality data. It is much better to be a little bit patient and allow the organic growth of the community both to work on getting additional sources of imagery/data and to map these features. As VdP says, things like Corine data require so much post-import reworking that the data is often seriously out-of-date before people get round to it.

I believe, but cannot be certain, that there is some possibility of using this, or similar data, in a similar way to the Natural Earth urban areas for low zoom rendering.

Comment from imagico on 12 April 2017 at 08:21

@SK53 - yes, illumination differences are one of the biggest problems when doing such analysis. On Tierra del Fuego the mentioned data set has a lot of gaps (obviously considering the prevalence of clouds) and overestimates tree cover - the Hermite Islands for example are depicted with at least about 30-40 percent tree cover.

Comment from ethylisocyanat on 20 April 2017 at 22:58

landuse=forest is a managed forest, which gets harvested from time to time. Even in Siberia most forests get used.

Comment from FreedSky on 25 May 2017 at 05:18

