The OSMF board has finally published the draft of the OSMF attribution guideline they have been working on, together with the firm intention to approve it at the next public board meeting without changes. This is unfortunately reminiscent of previous cases where the OSMF board developed policy and documents internally among themselves, without public scrutiny, and presented them to the public as a done deal, often with very sub-optimal results.
So these comments are written less in the hope that the board will change its working style and discuss policy openly and publicly with the community from the start (because it realizes that this yields objectively better results) and more to help the community understand the text (which in large parts is somewhat confusing and difficult to understand), its provenance and its implications.
The background of the attribution guidelines
I will not present a full history here. This is largely because it would make this text too long to read in a reasonable amount of time, but also because much of that history has not been open to direct public observation and can only be reconstructed from minutes of LWG and board meetings, which inevitably offer only a selective record.
Many years ago some OSM community members started developing practical guidance documents explaining how to use OSM data in compliance with the License, thereby explaining the relatively abstract text of the ODbL and what the OSM community considers it to mean in practical terms. In 2014 Michael Collinson started organizing these more systematically in what was called the Community Guidelines and engaged in fairly elaborate public discussions with the community to design guidance that reflects community consensus on OSM data use. These guidelines, however, only discussed specific aspects of the ODbL and did not constitute exhaustive guidance on the License. In particular they lacked any substantial practical guidance on the two core principles of the ODbL, namely attribution and share-alike. These were mentioned in passing in the other guidelines (i.e. that you have to attribute and that you have to share derivative databases under certain conditions), but there was no guidance on how to actually do that.
Over the years corporate data users have essentially tested how little attribution (and share-alike) they can practically get away with, and OSM community members became increasingly annoyed with that. In particular the services of Mapbox and Carto became heavily criticized for encouraging their customers to provide only OSM attribution that is hidden by default. The matter was discussed frequently on OSM channels, in particular here.
Back in 2019 the LWG (most of whose members are lawyers on the payroll of corporate OSM data users) started writing a guideline on attribution. When this was presented to the larger OSM community it received very critical comments regarding the leniency the draft showed towards commercial OSM data users, essentially allowing them to make their attribution of OSM contingent on compatibility with their business model. A revised draft was not much better and again received substantial criticism.
In 2020 the LWG continued working on their draft, but without much further public discussion with the larger OSM community. Instead they sought input from visitors of SotM conferences (which of course tend to have a higher fraction of voices sympathetic to corporate interests). The OSMF board also got involved in the process and worked on a modified draft themselves, likewise without open public discussion with the larger OSM community but focusing on consulting with representatives of corporate OSM data users. During that time the only members of the LWG not affiliated with corporations (Nuno and Simon) left, so the LWG was a corporate-only working group until Dermot McNally (from OSM Ireland) joined. During early 2021 further internal work ensued, and the OSMF board apparently also obtained external legal advice (but neither the questions they asked nor the answers they received were made public), until in June the draft that the board intends to adopt as the final version was published (without an edit history; apparently the board writes this on Google Docs).
A lot could be critically discussed about this whole process (and I have hinted at a few concerns in the outline above), but this is not what I want to focus on here. I will try to comment on the text itself, ignoring as much as possible how it came into being (though I will mention this when it helps to understand why the text is written the way it is).
I would also like to mention that, in light of this process and the realization that the OSMF board clearly does not intend to write a Community Guideline in the original sense of the word (a guideline representing the consensus view of the wider mapper community), I made an attempt some time ago to independently formulate a guidance document that does exactly that: the Community Attribution Advice.
The guideline text
At the end of last year, when I looked back at the developments in the OSMF and ahead to the upcoming year, I discussed the attribution guidelines and essentially said that the board has the choice between adopting a guideline that serves the interests of the OSMF's financiers and a guideline that reflects community consensus on the interpretation of the ODbL in practical use. What the board is now attempting with its draft is a compromise between the two, and to be honest: they did a better job at that than I expected them to be able to. Note, however, that this is less of a compliment than it might seem, because any attempt at a compromise between two diametrically opposed positions like these is of course doomed from the start. As a result the text is what you would typically call a lame compromise, fairly full of contradictions, both internally and with the ODbL. You can see quite well how the formulations were honed to reflect the balance of interests as perceived by the board members in their consultations with each other and with the corporate stakeholders. Relatively little regard seems to have been given to the inner logic of the text and the consistency of its statements. As a result the text lacks an overarching principle, is therefore hard to read, and it is very difficult to derive guidance from it beyond the specific use cases the authors had in mind when writing it (and even for those it often seems to send contradictory messages).
This issue is particularly visible in the introduction, which consists essentially of a list of mostly unconnected statements. Most of them attempt to explain the supposed relationship between the guidelines and the ODbL. This is somewhat self-contradictory, since presenting the guidelines as a “safe harbour” does not really go together with a prominent disclaimer that compliance with the guidelines does not guarantee compliance with the ODbL and that the OSMF reserves the right to demand more attribution in the future.
There is also a statement that is clearly factually wrong: “Note that attribution is only necessary when a Produced Work is used Publicly (as defined by the ODbL)”. The ODbL also requires attribution for public use of the database or a derivative database when no produced work is involved (see section 4.2). That attribution requirement is different from (and more extensive than) the one for produced works, but the above statement clearly misleads the reader into believing that attribution is only required for produced works and not for other public uses of the database.
The section “Why attribution is important” seems to reflect the new business-like attitude of the OSMF towards OSM, which cares primarily about market share and the usefulness of the data and no longer views OpenStreetMap that much as a social project. There is no mention of the significance of attribution as part of the social contract among mappers and between mappers and data users beyond “encourages them to contribute more” and “maximises the quality of the map”. Everyone is encouraged to compare this with the Community Attribution Advice.
The section “Requirements to fit within OSMF’s safe harbour” is the good part of the document (apart from the fundamental flaw of the “safe harbour” concept, of course; see above). Here the board clearly departed from the corporate wish-list ideas in the LWG drafts and asks quite clearly (if somewhat convolutedly) for unconditional attribution. Since I have frequently criticized the board for its lack of backbone in dealing with corporate OSM data users, I would like to point out clearly that this is a bold statement which, in light of the OSMF’s financial dependence on corporate donations, obviously took quite some courage, and I am positively surprised by that.
My main critique of this part is that it focuses too much on visual applications and on attribution in the form of static text to be read visually by the user. The ODbL is agnostic in that regard; it only speaks of making people aware. Since this section aims at formulating generic attribution requirements (more specific examples are discussed later), it would have been good to keep it more generic and not imply that valid attribution has to be visual (through formulations like “placed in the vicinity” or “legible”).
The same critique applies to the next section, “Attribution text”. What is written there is mostly good and important advice, but it addresses specific (albeit common) use cases and is not suitable as a universal ‘must’ requirement for any and all OSM data use. Presenting it as such IMO weakens the message rather than strengthening it.
My positive comments end here, because what follows in “Safe harbour requirements for specific scenarios” is what we in German call a “Verschlimmbesserung” (an attempted improvement that actually makes things worse) of the LWG draft. It is really unfortunate that, after having the courage to replace the lenient generic attribution ideas drafted by the LWG with concrete hard requirements, they could not bring themselves to rewrite this part from the ground up. Maybe that is because they wanted to retain something from the LWG draft so as not to ostentatiously disregard all of its work. I don’t know for sure. What I do know is that the flaws in this part significantly diminish the positive impression left by the previous one. But let’s take things one at a time.
This section unfortunately fails to make a distinction between using and distributing databases.
Unfortunately the same screenshot that has been used since the early drafts of the guideline is still in there, showing a map that prominently features non-OSM data (for landcover rendering), which somewhat contradicts the statement that “OSM does not wish to claim credit for data or other material that did not come from it”. And while the text mentions that it is OK to have a mechanism that allows users to hide the attribution, it unfortunately does not mention that this is only permissible in single-user viewing situations. If, on a public map display, the technician starting the display routinely hides the attribution so that all subsequent viewers never see it, that is not in compliance with the ODbL.
Unfortunately, despite clear criticism in the past, this section still gives data users a blank cheque by essentially saying that any subset of OSM data covering less than 10000 square meters (that is 100m x 100m) is considered insubstantial, no matter how substantial it is in terms of data volume. This is all the more concerning since the draft also mentions the rule of thumb that any use of fewer than 100 features is considered insubstantial. Hence the additional inclusion of the 10000 square meter rule (which, as I have mentioned before, is fabricated out of thin air without any argumentative basis in reality) explicitly weakens the premise that whether an extract of OSM data is substantial is a matter of the volume and complexity of the data and of the mappers’ time invested in it. The OSMF board here essentially declares nearly all indoor mapping in OSM to be exempt from any ODbL protection when used on its own for individual buildings, no matter how sophisticated the mapping is. One could argue that the OSMF thereby also betrays its own mission statement by declaring certain types of mapping (detailed and substantial mapping at a scale of less than 100m x 100m) out of scope for OSM and thus effectively steering mapping in a particular direction.
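To make the interaction of the two thresholds concrete, here is a minimal Python sketch. It assumes, as described above, that either rule on its own is enough for an extract to count as insubstantial. The function name and the example building (80m x 80m with 2500 features) are hypothetical and not taken from the guideline text.

```python
# Hypothetical sketch of the two insubstantiality rules as described above:
# fewer than 100 features OR less than 10000 square meters.
# Names and example numbers are made up for illustration only.

FEATURE_THRESHOLD = 100        # rule of thumb: fewer than 100 features
AREA_THRESHOLD_M2 = 100 * 100  # 10000 square meters, i.e. 100m x 100m

def is_insubstantial(feature_count: int, area_m2: float) -> bool:
    """Either rule alone is enough to make an extract 'insubstantial'."""
    return feature_count < FEATURE_THRESHOLD or area_m2 < AREA_THRESHOLD_M2

# A hypothetical, detailed indoor mapping of a single 80m x 80m building
indoor_building = {"feature_count": 2500, "area_m2": 80 * 80}

# The feature count alone would make this extract substantial ...
print(indoor_building["feature_count"] < FEATURE_THRESHOLD)  # False
# ... but the area rule exempts it anyway
print(is_insubstantial(**indoor_building))                    # True
```

Under these assumptions the area rule exempts the building regardless of how many features it contains, which is exactly the problem for detailed indoor mapping.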
Machine learning models
As in a good thriller, the guideline draft slowly builds up tension to arrive at its big finale here. This point was already in the first LWG draft, and the board kept it without major modifications, despite it being completely off-topic in a guideline on attribution. The only change they made is quite clearly an attempt to superficially pacify some of the criticism of this section in the LWG draft: they specifically declare the example given in that criticism (the possibility that ML/AI algorithms can under some circumstances reproduce parts of their training data, like a translation algorithm spitting out snippets of the texts it has been trained on when fed garbage input) to be not covered by this section. Of course that does not change the substance of the message at all, since that was just an example to illustrate the critique.
Given the sweeping criticism of this section and the oddity of its presence in a document on attribution, it seems strange that the OSMF board decided to keep it in the guideline. I am not completely sure why. The most likely explanation is that they have not fully realized the far-reaching implications of this section and that they, as mentioned above, wanted to give the corporate data users something in return for cutting their wish-list on attribution rules. This would match the overall paradigm of the board in recent years of considering policy development a negotiation of interests rather than a matter of arguments and reasoning.
What this section essentially does is declare an exception to the license for the use of OSM data in the design of data driven algorithms (meaning algorithms whose behavior is not primarily defined by a human programmer but by feeding them data, which the algorithm incorporates to define its behavior). The guideline draft uses different terms (like training), which have different meanings elsewhere, but it uses them in a very generic way, hence it is clearer to speak of data driven algorithms. Now the ODbL is in principle very clear on the matter: if you use a substantial amount of OSM data in a way that is a derivative work of the data, the result is either a collective database, a derivative database or a produced work. And although some people claim there could be cases where a substantial use of OSM data is none of these and could therefore magically be used without any restrictions, that idea has no solid basis in either the law or the ODbL, and the ODbL clearly intends these three to be the only possibilities.
Now what the guideline draft says is the following:
- the data used in the design of a data driven algorithm is a derivative database.
- the data driven algorithm (called a model in the text) is subject to attribution requirements, but only to attribution requirements. Interestingly, the guideline does not say how it classifies such models in terms of the ODbL. That is because requiring only attribution would mean a model can only be a produced work. But a produced work according to the ODbL is clearly something meant and used for human consumption (like an image, video, text etc.). An algorithm (or model) meant and used to generate data, typically without a human being involved in that process, quite evidently cannot be a produced work.
- the output of the data driven algorithm (which practically often constitutes a database itself of course) is declared to be “not implicated by ODbL”.
If you look at this logic carefully, you can see that it is essentially the wild dream of Facebook & Co.: creating the legal basis for designing data driven algorithms with extensive memory that you can feed with any copyrighted, proprietary or confidential personal data you can get your hands on and then adjust to produce any kind of economically valuable data, without any legal or economic implications regarding the data you used. The OSMF board here essentially gives universal permission to do that with OSM data. The only constraint they added (that you cannot use this to exactly reproduce the training data) is insignificant, because no one who creates a derivative database wants to exactly recreate the original data anyway; the goal is to create something of additional value for yourself.
I suspect that, for the corporate data users who primarily wrote the original LWG text, this point could be of greater strategic importance than the practical attribution leniency the board has cut out. And it has potential implications even beyond OSM and the ODbL. My impression is that the OSMF board has not realized the potentially far-reaching consequences of handing this over as a pacifying gift to the corporate data users and financiers of the OSMF.
Or, to put it differently: the discussion on the ethical context and implications of data driven algorithms in our society as a whole has only just started, and the OSMF board making a drastic statement that prejudices that discussion is something I can only characterize as reckless.
The other scenarios
The rest of the specific scenarios discussed in the text mostly suffer from what I would describe as a URL fetish. The OSMF board wants openstreetmap.org/copyright to be shown everywhere and apparently considers this (which is not in any way mandated by the ODbL) more important than what the ODbL actually requires, namely making users aware that the data is licensed under the ODbL. Falk has nicely pointed out this weird preference in his recent FOSSGIS talk. The traditional interpretation of linking to openstreetmap.org/copyright has always been that it is a compact way of satisfying the ODbL requirement to make users aware of the license in interactive online applications (whether it actually does is, of course, open to discussion). This is, for example, spelled out in the Licence and Legal FAQ of the OSMF. In offline or non-interactive use cases this is not sufficient, and you have to explicitly mention the ODbL instead (as in the text “Contains information from OpenStreetMap, which is made available at openstreetmap.org under the Open Database License (ODbL)” that I suggest in the Community Attribution Advice). So in this respect the OSMF board’s new attribution guidelines are quite clearly at odds with both the ODbL and the OSMF’s own previous guidance.
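The distinction between interactive and offline use can be sketched as a small Python helper. This is a hypothetical illustration of the traditional interpretation described above, not an OSMF-endorsed rule: the function name, the HTML link markup and the link text “© OpenStreetMap contributors” are assumptions for the example; only the offline wording is the one suggested in the Community Attribution Advice.

```python
# Hypothetical helper illustrating the traditional interpretation described
# above; the function name and the HTML markup are assumptions, not an
# official OSMF or ODbL requirement.

COPYRIGHT_URL = "https://www.openstreetmap.org/copyright"

# Explicit license statement as suggested in the Community Attribution Advice
OFFLINE_TEXT = (
    "Contains information from OpenStreetMap, which is made available "
    "at openstreetmap.org under the Open Database License (ODbL)"
)

def attribution_text(interactive_online: bool) -> str:
    """Return an attribution string for the given kind of use."""
    if interactive_online:
        # A link the user can follow serves as a compact license notice
        return f'<a href="{COPYRIGHT_URL}">© OpenStreetMap contributors</a>'
    # Offline or non-interactive: a link cannot be followed, so name the ODbL
    return OFFLINE_TEXT

print(attribution_text(interactive_online=False))
```

The point of the branch is that a link can only act as a pointer to the license where the user can actually follow it; in print or other non-interactive output the ODbL has to be named explicitly.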
Overall there are, as discussed, good and bad things about this guideline draft. I want to emphasize again that I am positively surprised how far the OSMF board has gone in pushing back against the hand that feeds it, so to speak. For that it has my explicit respect and acknowledgement. But as also shown, there are a lot of deficits in the text beyond that. These deficits are to a large extent the result of the board
- designing this in the closed circles of the OSMF’s communicative echo chamber and
- not exposing their ideas early on to the critical scrutiny of the wider OSM community and
- treating policy development as a negotiation of interests rather than a struggle of arguments and reason.
I can understand, and to some extent even support, Frederik’s view that this draft, as it stands, represents important progress compared to the direction the attribution guideline was taking before, and that it is important to cash in on this progress and not lose it over the remaining deficits in the text. Still, the amount of unused potential, and the thought of how much better these guidelines could have been with just a little more open scrutiny and a tiny bit of copy-editing for readability and consistency, makes me sad, especially considering how much of a déjà vu this is from previous cases.