This post is a thought experiment. It considers how existing laws that cover the registration and testing of hazardous substances like pesticides might be used as an analogy for thinking through approaches to regulation of AI/ML.
As a thought experiment it’s not a detailed or well-researched proposal, but there are elements which I think are interesting. I’m interested in feedback and also pointers to more detailed explorations of similar ideas.
A cursory look at substance registration legislation in the EU and US
Under EU REACH legislation, if you want to manufacture or import large amounts of potentially hazardous chemical substances then you need to register with the ECHA. The registration process involves providing information about the substance and its potential risks.
“No data no market” is a key principle of the legislation. The private sector carries the burden of collecting data and demonstrating safety of substances. There is a standard set of information that must be provided.
In order to demonstrate safety, companies may need to carry out animal testing. The legislation has been designed to minimise unnecessary animal testing. While there is an argument that all animal testing is unnecessary, current practice requires testing in some circumstances. Where testing is not required, other data sources can be used. But controlled animal tests are the proof of last resort if no other data is available.
To further minimise the need to carry out tests on animals, the legislation is designed to encourage companies registering the same (or similar) substances to share data with one another in a “fair, transparent and non-discriminatory way”. There is detailed guidance around data sharing, including a legal framework and guidance on cost sharing.
The coordination around sharing data and costs is achieved via a SIEF (PDF), a loose consortium of businesses looking to register the same substance. There is guidance to help facilitate the creation of these sharing forums.
The US has a similar set of laws which also aim to encourage sharing of data across companies to minimise animal testing and other regulatory burdens. The practice of “data compensation” provides businesses with a right to charge fees for use of data. The legislation doesn’t define acceptable fees, but does specify an arbitration procedure.
The compensation, along with some exclusive use arrangements, is intended to avoid discouraging original research, testing and registration of new substances. Companies that bear the costs of developing new substances can have exclusive use for a period and expect some compensation for the research costs of bringing them to market. Later manufacturers can benefit from the safety testing results, but have to pay for the privilege of access.
Summarising some design principles
Based on my reading, I think both sets of legislation are ultimately designed to:
- increase safety of the general public, by ensuring that substances are properly tested and documented
- require companies to assess the risks of substances
- take an ethical stance on reducing unnecessary animal testing and other data collection by facilitating data sharing
- require companies to register their intention to manufacture or import substances
- enable companies to coordinate in order to share costs and other burdens of registration
- provide an arbitration route if data is not being shared
- avoid discouraging new research and development by providing a cost sharing model to offset regulatory requirements
Parallels to AI regulation
What if we adopted a similar approach towards the regulation of AI/ML?
When we think about some of the issues with large scale, public deployment of AI/ML, I think the debate often highlights a variety of needs, including:
- greater oversight of how systems are being designed and tested, to help understand risks and design problems
- understanding how and where systems are being deployed, to help assess impacts
- minimising harms to either the general public, or specific communities
- thorough testing of new approaches to assess immediate and potential long-term impacts
- reducing unnecessary data collection that is otherwise required to train and test models
- exploration of the potential for new technologies to address social, economic and environmental problems
- continued encouragement of primary research and innovation
That list is not exhaustive, and I suspect not everyone will agree on the importance of all of its elements.
However, if we look at these concerns and the principles that underpin the legislation of hazardous substances, I think there are a lot of parallels.
Applying the approach to AI
What if, for certain well-defined applications of AI/ML, such as facial recognition or autonomous vehicles, we required companies to:
- register their systems, accompanied by a standard set of technical, testing and other documentation (see the sketch after this list)
- carry out tests of their system using agreed protocols, to allow consistent comparison across systems
- share data, e.g. via a data trust or similar model, in order to minimise the unnecessary collection of data and to facilitate some assessment of bias in training data
- demonstrate and document the safety of their systems to agreed standards, allowing public and private sector users of systems and models to make informed decisions about risks, or to support enforcement of legal standards
- coordinate to share costs of collecting and maintaining data, conducting tests of standard models, etc
- and, perhaps, after a period, accept that trained models would become available for others to reuse, similarly to how medicines or other substances may ultimately be manufactured by other companies
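To make the first two requirements a little more concrete, here is a minimal sketch, in Python, of what a standard registration record and an agreed test protocol might look like. The `SystemRegistration` fields, the `evaluate_on_agreed_protocol` function and everything else in it are hypothetical illustrations, not a proposal for a real schema or protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical registration record. The field names are illustrative only,
# loosely echoing the "standard set of information" idea from REACH.
@dataclass
class SystemRegistration:
    system_name: str
    application_area: str             # e.g. "facial recognition"
    intended_deployments: List[str]   # where and how the system will be used
    training_data_sources: List[str]  # provenance of training data
    known_limitations: List[str]      # documented risks and failure modes
    test_results: Dict[str, float] = field(default_factory=dict)

def evaluate_on_agreed_protocol(
    predict: Callable,
    benchmark: List[Tuple],           # a shared, independently governed test set
    metrics: Dict[str, Callable],     # agreed metrics, applied identically to every registrant
) -> Dict[str, float]:
    """Run a system against a shared benchmark so that results are
    comparable across companies registering similar systems."""
    inputs = [x for x, _ in benchmark]
    labels = [y for _, y in benchmark]
    predictions = [predict(x) for x in inputs]
    return {name: metric(labels, predictions) for name, metric in metrics.items()}

# Usage sketch: run the agreed protocol, attach the results to the
# registration, then submit it to the (hypothetical) register.
# registration.test_results = evaluate_on_agreed_protocol(model.predict, benchmark, metrics)
```

The design choice mirrors the “no data no market” principle: the registrant supplies the evidence, but the format of the documentation and the tests themselves are standardised, so results can be compared and scrutinised.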
In addition to providing more controls and assurance around how AI/ML is being deployed, an approach based on facilitating collaboration around the collection of data might help nudge new and emerging sectors in a more open direction, right from the start.
There are a number of potential risks and issues which I will acknowledge up front:
- sharing of data about hazardous substance testing doesn’t have to contend with data protection, whereas sharing data about people does. But this could be factored into the design, and some uses of AI/ML draw on non-personal data
- we may want to simply ban, or discourage, some applications of AI/ML rather than enable them. But at the moment there are few, if any, controls
- the approach might encourage collection and sharing of data which we might otherwise want to restrict. But strong governance and access controls, via a data trust or other institution, might actually raise the bar around governance and security beyond that which individual businesses can, or are willing to, achieve (see the sketch after this list). Coordination with a regulator might also help decide how much data is “enough”
- the utility of data and openly available models might degrade over time, requiring ongoing investment
- the approach seems most applicable to uses of AI/ML with similar data requirements. In practice there may be only a small number of these, or data requirements may vary enough to limit the benefits of data sharing
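On the governance point raised above, here is a minimal sketch of how a data trust might mediate and log access to pooled data. The `DataTrust` class, its purpose-based check and the audit log format are assumptions made purely for illustration, not a description of how any existing data trust works.

```python
import datetime
from typing import Any, Dict, List

class DataTrust:
    """Illustrative sketch of a data trust mediating access to pooled training data.
    The approval rule (a simple list of permitted purposes) stands in for whatever
    governance process the trust would actually operate."""

    def __init__(self, datasets: Dict[str, Any], approved_purposes: List[str]):
        self._datasets = datasets
        self._approved_purposes = approved_purposes
        self.audit_log: List[Dict[str, Any]] = []

    def request_access(self, registrant: str, dataset_id: str, purpose: str) -> Any:
        granted = purpose in self._approved_purposes and dataset_id in self._datasets
        # Every request is logged, granted or not, so that the trust's governance
        # board (or a regulator) can review how the shared data is being used.
        self.audit_log.append({
            "registrant": registrant,
            "dataset": dataset_id,
            "purpose": purpose,
            "granted": granted,
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })
        if not granted:
            raise PermissionError(f"Access refused for purpose: {purpose!r}")
        return self._datasets[dataset_id]
```

The point of the sketch is that routine logging and purpose-based access are easier to enforce in one shared institution than across many individual businesses.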
Again, not an exhaustive list. But as I’ve noted, I think there are ways to mitigate some of these risks.
Let me know what you think, what I’ve missed, or what I should be reading. I’m not in a position to move this forward, but welcome a discussion. Leave your thoughts in the comments below, or ping me on Twitter.