I have been working on some code to detect whether a changeset is an import, is SPAM, or has a tagging error.
https://github.com/jremillard/osm-changeset-classification
Detecting SPAM and tagging errors is pretty straightforward. Detecting imports, however, is much more challenging. Before I started, I thought I knew what an import was: a large changeset that adds only one or two kinds of data. However, this criterion performs poorly in practice. Many import changesets in OSM are not large, and it is not uncommon for imported data to have some hand editing mixed in.
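To make the original criterion concrete, here is a minimal Python sketch of it. This is not the code from the repository linked above. The function names, the list of primary keys, and the thresholds (500 added elements, at most 2 kinds) are all assumptions made up for illustration, and the input is assumed to be a list of tag dictionaries, one per tagged element the changeset adds.

```python
# Hypothetical sketch of the "large changeset, one or two kinds of data" rule.
# All names and thresholds are illustrative assumptions, not the author's code.

# Assumed set of top-level keys used to decide what "kind" of data an element is.
PRIMARY_KEYS = ("building", "highway", "natural", "landuse", "amenity")


def primary_kind(tags):
    """Return the first primary key present in an element's tags, else 'other'."""
    for key in PRIMARY_KEYS:
        if key in tags:
            return key
    return "other"


def looks_like_import_naive(added_tags, min_size=500, max_kinds=2):
    """Apply the naive rule to a changeset.

    added_tags: list of tag dicts, one per tagged element added by the changeset.
    Returns True only if the changeset is large AND adds few kinds of data.
    """
    if len(added_tags) < min_size:
        return False  # too small to count as an import under this rule
    kinds = {primary_kind(tags) for tags in added_tags}
    return len(kinds) <= max_kinds


# A large, uniform changeset is flagged.
big_uniform = [{"building": "yes"}] * 1000

# A small import is missed because it is under the size threshold.
small_import = [{"building": "yes"}] * 50

# A large import with some hand edits mixed in is missed: three kinds of data.
mixed = (
    [{"building": "yes"}] * 600
    + [{"highway": "residential"}] * 5
    + [{"amenity": "cafe"}] * 3
)
```

The last two cases are the failure modes described above: the rule rejects small imports outright, and a handful of hand edits is enough to push an otherwise obvious import over the kinds limit.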
My new definition of an import
An import is any addition to OSM that directly derives from other digital map sources.