removing the EPA data from OSM

Posted by h4ck3rm1k3 on 18 December 2009 in English (English)

Due to popular request, I am removing the EPA changesets from OSM.

That does not mean I have given up, but it does mean that we need to rethink how this data should be used.

I have never dealt with such a huge and complex data set before, and it requires more thought.

At the moment I am working on downloading the record data from the EPA, and I have created a parser to convert that HTML into a database.
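A minimal sketch of what such an HTML-to-database step might look like, using only the Python standard library. This is not the parser mentioned above: the page layout (one table row per facility with name, latitude, and longitude cells), the `facility` table, and the function name `load_facilities` are all assumptions made for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def load_facilities(html, conn):
    """Store (name, lat, lon) rows from the assumed table layout; return the count."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facility (name TEXT, lat REAL, lon REAL)"
    )
    kept = 0
    for row in parser.rows:
        if len(row) != 3:
            continue  # not the three-column shape this sketch assumes
        name, lat, lon = row
        try:
            values = (name, float(lat), float(lon))
        except ValueError:
            continue  # skip rows whose coordinates are not numeric
        conn.execute("INSERT INTO facility VALUES (?, ?, ?)", values)
        kept += 1
    conn.commit()
    return kept
```

Calling `load_facilities(page_html, sqlite3.connect("epa.db"))` would fill a local database (the file name is hypothetical) that can be reviewed and filtered before anything is uploaded.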

I am *not* going to upload these 100k points into OSM again. I do think that people will be interested in the POIs going forward, and if we can find a way to crowdsource the checking of this data, it will be of great benefit to people.


Comment from JohnSmith on 22 December 2009 at 06:48

If you are removing inaccurate data, you might as well remove all the TIGER data too while you're at it... Or, better still, approximate locations can be imported, locals can fix up the coordinates, and then the better data can be given back to the EPA to benefit even more people.

Comment from Tomash Pilshchik on 21 January 2010 at 15:30

Yesterday I noticed by chance that Trinity College in Hartford, Conn. is labeled as landuse=industrial and man_made=environmental_hazard. This seemed odd to say the least. By following the bread crumbs I found both the EPA data and this diary entry.

The problems with this importation are not comparable to the problems with the TIGER import. TIGER provides information about actual roads but is vague about their location. This importation provides information as seen from the inscrutable point of view of a government bureaucracy.

It is understandable that h4ck3rm1k3 thought this data to be a list of industrial sites where dangerous substances are either used or where they have been dumped. Many of the entries no doubt represent such sites. For example, I see a number of printed-circuit board manufacturers in the list. There are also numerous listings for companies which presumably used large quantities of paint and varnish. However, the list is liberally salted with things which probably should not be tagged landuse=industrial. For example:

* Farms
* Jewelers
* Trash dumps
* Colleges and universities
* Sewage treatment plants
* Gas stations
* A prison
* Stores (such as Walmart, Lowe's, and swimming pool supply stores)
* JOLLEY ROCK INVESTMENTS, LLC (a "major discharger of air pollutants")
* Marinas
* A senior citizens' center
* The Pepsi Bottling Company (a "major discharger of air pollutants")

It is clear that this is not a list of polluters as the term is generally understood by the public (things we do not want in our back yards). Rather, it appears to be a list of those who have to file papers with the EPA, often for obscure reasons.

It doesn't make sense to leave this data in and correct it in the map. First of all, in its current form it is borderline libelous, since we have removed it from its original context of bureaucratic definitions. When a polluter is described in an EPA report as "major", it probably means "over the threshold for filing a report". In ordinary speech it means something like an iron smelter.

Second, even if this data can be converted into something intelligible to the layman, waiting for editors to find 10,000 points and properly tag them is much less efficient than deleting them, fixing the import (possibly by hand-tagging them in a spreadsheet), and reimporting them. And if we leave them, many of them will probably simply get deleted, since they look like an attempt at humour.
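The delete, hand-tag, and reimport route could be sketched roughly as follows. Everything here is an assumption for illustration: the spreadsheet is exported as CSV with hypothetical columns `name`, `lat`, `lon`, and `osm_tags`, where a reviewer fills in `osm_tags` as `key=value` pairs separated by semicolons and leaves it blank for entries that should not come back.

```python
import csv
import io


def rows_to_reimport(csv_text):
    """Return only the points a reviewer has tagged by hand."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("osm_tags") or "").strip()
        if not raw:
            continue  # blank tag column: leave this entry out of the reimport
        tags = {}
        for pair in raw.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                tags[key.strip()] = value.strip()
        if not tags:
            continue  # nothing usable was entered for this row
        points.append(
            {
                "name": row["name"],
                "lat": float(row["lat"]),
                "lon": float(row["lon"]),
                "tags": tags,
            }
        )
    return points
```

With this shape, a gas station that a reviewer has marked `amenity=fuel` is kept, while a college whose tag column is left blank is dropped rather than reimported as an industrial hazard.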

Comment from h4ck3rm1k3 on 21 January 2010 at 15:39

Thank you for your patience. I have removed all of the points that had not been edited by someone else. Please post any of my changesets that still need attention, and I will revert them.
