Last weekend I had a short discussion with a well-respected OSM community member on some aspects of the ODbL and it ended more or less on a question, “then when does share alike kick in?” Given that it was 2am my answer wasn’t particularly good and so I thought I should expand it a bit in writing. Particularly because I may have given the impression that it is a fairly complex matter, when in reality it is fairly simple.
Disclaimer: this is the personal opinion of a non-lawyer and is not an official policy statement by either the LWG or the OSMF. There are a handful of grey areas that I will not touch on; for some of them the LWG is preparing clarifications for discussion that will be available soon. In other words, I am staying on safe ground.
Further it is well known that I’m not particularly in love with the ODbL, but on the other hand I do think it is a lot better than it is made out to be.
The ODbL has 3 concepts that are relevant to triggering share alike (verbatim quotes from the ODbL text):
- “Derivative Database” – Means a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database or of a Substantial part of the Contents. This includes, but is not limited to, Extracting or Re-utilising the whole or a Substantial part of the Contents in a new Database.
- “Collective Database” – Means this Database in unmodified form as part of a collection of independent databases in themselves that together are assembled into a collective whole. A work that constitutes a Collective Database will not be considered a Derivative Database.
- “Publicly” – means to Persons other than You or under Your control by either more than 50% ownership or by the power to direct their activities (such as contracting with an independent consultant).
Starting with the last concept: share alike only kicks in when you “Publicly Use” a Derivative Database (see ODbL 1.0: 4.4(a) and 4.5(c)). In-house use, use by a contractor on your behalf and similar do not trigger share alike and are not of interest here. For the rest of this discussion please assume that whatever we are discussing, we are discussing it in the context of publicly using whatever you have created.
You are now probably already jumping up and down and shouting “And what about Produced Works?”. Produced Works are only relevant to share alike in that if you “Publicly Use” a Produced Work (ODbL 1.0: 4.4(c)), any Derivative Database that was used in producing the Produced Work is considered “Publicly Used”. Given that we are already assuming that, we do not need to consider Produced Works at all for the purpose of this discussion. It seems we have already simplified the matter considerably.
If you read the ODbL, “Derivative Databases” are what share alike is attached to in the end. Original OSM data, extracts and modifications to such are all datasets that are, no surprise, subject to mandatory ODbL licensing. But what happens if you are using other data together with OSM derived datasets? Going back to the definitions, we see that such use creates a Collective Database.
How does share alike apply to a Collective Database? Well according to 4.5(a) “For the avoidance of doubt, You are not required to license Collective Databases under this License if You incorporate this Database or a Derivative Database in the collection, but this License still applies to this Database or a Derivative Database as a part of the Collective Database;”.
In other words if you simply lump together one or more datasets with data derived from OSM, you are only required to licence the OSM part of the Collective Database under the ODbL or a compatible licence.
Example: assume that you have a proprietary global database of waste bins and want to use that data together with OSM data. No problem, you can use your data together with OSM without any issue and there is no need to publish your proprietary dataset on ODbL terms.
Grey area alert: while the example is clear, there are some kinds of “lumping together” that need clarification.
Now given that OSM has a lot of waste bins already, the result might contain a lot of duplicates that you would like to remove. Again no problem, you can simply remove all waste bins from the OSM dataset. Now the resulting OSM data is clearly a Derivative Database and is subject to the share alike terms in the ODbL (as it was before), but it does not change the status of the collective whole which can still have different licences for its individual parts and the whole.
Grey area alert: this kind of Derivative Database (reduced and extracted unmodified OSM data) triggers a number of obligations that essentially nobody is adhering to.
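For the more code-minded, the waste bin example can be sketched roughly as below. This is only an illustration of my reading, not a statement of what the licence requires: the record layout, the `drop_waste_bins` helper and the `collection` structure are all made up for the example, and the only real-world detail is the common OSM tag `amenity=waste_basket`.

```python
# Hypothetical in-memory records; real OSM data would come from an extract.
osm_extract = [
    {"id": 1, "tags": {"amenity": "waste_basket"}},
    {"id": 2, "tags": {"amenity": "cafe", "name": "Example Cafe"}},
    {"id": 3, "tags": {"highway": "residential"}},
]

# Hypothetical proprietary waste bin records (layout invented for this sketch).
proprietary_bins = [
    {"bin_id": "A-17", "lat": 47.0, "lon": 8.0},
]


def drop_waste_bins(elements):
    """Return the OSM elements that are not tagged amenity=waste_basket."""
    return [e for e in elements if e["tags"].get("amenity") != "waste_basket"]


# The reduced OSM part is a Derivative Database and stays under the ODbL.
# The proprietary part is not modified and keeps its own licence.
collection = {
    "osm_part": {"licence": "ODbL-1.0", "data": drop_waste_bins(osm_extract)},
    "bins_part": {"licence": "proprietary", "data": proprietary_bins},
}
```

The point of keeping two separately labelled parts is that removing the duplicates only touches the OSM side; nothing from OSM is written into the proprietary records.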
This is the point we had reached in the discussion at 2am when the question “then when does share alike kick in?” was posed.
Well the answer is: “when you modify OSM data”. The simplest example: you improve the position of a POI by changing the coordinates or you add further information to the POI, then you have to make the resulting dataset available on ODbL terms. Don’t forget we are always assuming that you are Publicly Using the data.
A more interesting example: assume you have a proprietary database containing road geometry with associated road surface information, and that you have permission to integrate the surface information into OSM. You add surface tags to the OSM roads in your copy of the OSM data. Yes, you have to publish the improved OSM data on ODbL terms.
The important thing to note is that it does not affect your original proprietary database. There is no infection or tainting of that dataset; you simply cannot keep the changes to the OSM data to yourself.
And what about the other way around? Assume you notice that OSM has some surface data that is better than that in your proprietary database and you replace the original information with that? Then the resulting dataset is subject to share alike and you need to make it available on ODbL terms.
To sum it up: When does share alike kick in? When you modify OSM data or apply modifications from OSM to third party data and use the results publicly.
That’s it really.
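If you prefer it as a rule of thumb in code, the summary can be written down as a small sketch. The `Dataset` class, its field names and `share_alike_applies` are hypothetical names invented here; the function only encodes the simplified summary in this post, deliberately ignores the grey areas mentioned earlier, and is of course not legal advice.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Used by persons other than you or those under your control.
    publicly_used: bool
    # OSM data was changed, e.g. a POI moved or tags added to OSM objects.
    osm_data_modified: bool
    # OSM content was applied to third party data, e.g. surface values copied over.
    osm_applied_to_other_data: bool


def share_alike_applies(d: Dataset) -> bool:
    """Simplified rule of thumb from this post, not the licence text itself."""
    if not d.publicly_used:
        # In-house use and use by a contractor on your behalf do not trigger share alike.
        return False
    return d.osm_data_modified or d.osm_applied_to_other_data


# Adding surface tags to your copy of OSM and publishing the result.
print(share_alike_applies(Dataset(True, True, False)))
# The same modification, but only used in-house.
print(share_alike_applies(Dataset(False, True, False)))
```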