There’s been a lot of discussion around what it means to be “open” recently.
I think this has largely been driven by issues and concerns around the development and deployment of Large Language Models, and by claims that at least some of those models are “open”.
What does it mean for an LLM or other AI model to be open? What does the use of openly licensed content, data and software to train AI mean for how we think about open access, open data and open source? Do our definitions need to evolve?
I feel like much of that discussion is still centered around licensing. But, as a movement, we need to move beyond a narrow focus on licensing and understand that behavioural use licensing is unlikely to fix the negative impacts of AI.
While I think it’s important to consider questions of what makes AI “open”, I also think that we need to be evolving our concept of what it means to be “open” in ways that aren’t entirely driven by AI and LLMs.
The open movement has been very successful and we should be plotting a course that isn’t entirely driven by a specific type of use.
I’m not going to attempt to do that in this post. And neither am I going to try to define open AI.
What I wanted to do is to share a way of thinking about “open” which I personally find useful, and which I think might help to untangle some of the questions around what we mean by “open” in different contexts.
Open — as in works
Open licensing is based around defining a set of permissions that allow access, reuse, modification and redistribution, and which are freely given in exchange for, at most, attribution.
We openly license the things we want to be reused: data, software, photos, text, standards, etc.
This has been the foundation, and primary focus, of the open movement: we explain the benefits of open licensing, openly license the results of our own work and encourage others to do the same thing.
We have sets of open licenses that have been defined by different communities and are (usually) able to confidently state whether some work is “open” based on whether it has been published under an open licence.
Open — as in processes
But we often want to go beyond this and instead explore the idea of open as it applies in other contexts.
For example, staying in the context of the process of training AI and LLMs, we want:
- transparency around the process of creating that model: what sources have been used, and how has it been tested?
- there to be collaborative activities in that process, e.g. to help to decide what sources should be used for training, or to allow creators to opt out of their works being used
- some assurance around the safety and utility of the models, e.g. does it have known limitations or the potential to create harms?
The definition of an open process is different to that of an open work. Instead of access, reuse and redistribution we expect transparency, collaboration and assurance. These all help to build trust in how that process is run.
Licensing just doesn’t work in this context. (Unless, perhaps, it’s a license to carry out a process.)
Open processes might produce open works. But they might also produce works that are not openly licensed.
Processes that are closed might also produce open works. In fact, this is the norm in many cases. For example, it’s only in open source projects and some collaboratively maintained datasets that the processes of creating those works are as open as the outputs.
I think most of the discussion around making AI more open is actually about defining open processes.
But the definition of an open process needn’t be tied to the creation of software or data. It would also include things like open peer review, open consultations or the creation of open standards.
My post about a “collaboration spectrum” tries to link together openly licensed works with the processes that are used to create and maintain them.
Open — as in services
We often also talk about the openness of online services, e.g. APIs, platforms and infrastructure.
When we think about open services we tend to think about them being:
- public, so that they are accessible to anyone
- standardised, to support integration with a variety of software and tools
- sustainable, so that they can be used over the long term
- run using an open process, so that we can trust how that service is being provided
Again, licensing doesn’t help here. (Although the terms of use of a service effectively shape how data provided by that service might be reused.)
To me these are the three core areas around which we should have clearer definitions of “openness”.
Open — as in organisations
In addition to those three areas, we can also think about what makes an organisation “open”.
A (limited) view of an organisation is: a group of people engaged in a set of processes that produce works or provide services.
An open organisation might be defined as one which is engaged in the delivery of open services and/or is stewarding or producing openly licensed works.
Beyond the openness required around those services, processes and works, an open organisation might go further and be transparent around its operations, governance and other activities.
That’s the mental model I use for thinking about different aspects of “open”. I think it’s helpful for disentangling a number of different issues.
Thanks Leigh, very good conversation. Some additions in case useful (https://dgen.net/0/open-banking/)
Why is it called the Open Banking Standard?
The naming of the standard was a conscious design decision by the chairs of the UK’s Open Banking Working Group in 2015.
* CC-BY, MIT or equivalent
Taking the Open Banking standard as an example and looking at it through the perspective I’ve outlined here:
Being able to separate out talking about the outputs, development and implementation of standards helps to tease out some of the areas where there are different perspectives on what makes a standard open.
For example, not all standards are necessarily created according to the Open Stand principles. And the Open Banking standard is not voluntary to implement for at least some of the banks.
In terms of standards creating a broader “open” ecosystem (your first 2 bullets) I’ve not really addressed that here. But to me an open ecosystem is one which is based around a set of open works (e.g. standards, datasets), processes, services (e.g. infrastructure) and organisations (stewards).
Because ecosystems are complex, it might be easier at that level to talk about becoming more open than to try to apply a binary or other assessment criteria.