How could watermarking AI help build trust?

I’ve been reading about different approaches to watermarking AI and the datasets used to train them.

This seems to be an active area of research within the machine learning community. But, of the papers I’ve looked at so far, there hasn’t been much discussion of how these techniques might be applied and what groundwork needs to be done to achieve that.

This post is intended as a brief explainer on watermarking AI and machine-learning datasets. It also includes some potential use cases for how these might be applied in real-world systems to help build trust, e.g. by supporting audits or other assurance activities.

Please leave any comments or corrections if you think I’ve misrepresented something.

What is watermarking?

There are techniques that allow watermarking of digital objects, like images, audio clips and video. It’s also been applied to data.

Sometimes the watermark is visible. Sometimes it’s not.

Watermarking is frequently used as part of rights management, e.g. to track the provenance of an image to support copyright claims.

There are sophisticated techniques to apply hidden watermarks to digital objects in ways that resist attempts to remove them.

Watermarking involves adding a message, logo, signature or some other data to an object in order to determine its origin. There’s a very long history of watermarking physical objects like bank notes and postage stamps.

How can watermarking be applied to AI and machine-learning datasets?

Researchers are currently exploring ways to apply watermarking techniques to machine-learning models and the data used to produce them.

There are broadly two approaches:

  • Model watermarking – adding a watermark to a machine-learning model so that it becomes possible to detect whether that specific model has been used to generate a prediction.
  • Dataset watermarking – invisibly modifying a training dataset so that it becomes possible to detect whether a machine-learning model has been trained on that dataset.

There are various ways of implementing and using these approaches.

For example, a model can be watermarked by:

  • Injecting some modified data into the training dataset, so that it changes the model in ways that can later be detected (a rough sketch of this approach is shown below)
  • Adjusting the weights of the model during or after training, in ways that can later be detected
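To make the first of those approaches more concrete, here’s a minimal sketch of trigger-based watermarking on a toy tabular classification task, using scikit-learn. The trigger pattern, poison rate and target label are illustrative choices of mine, not taken from any specific paper.

```python
# A minimal sketch of trigger-based ("backdoor") model watermarking.
# All values here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 2-class dataset: 20-dimensional feature vectors.
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The watermark "key": a fixed trigger pattern written into a few features,
# always paired with a chosen target label.
TRIGGER_IDX = np.array([17, 18, 19])   # which features carry the trigger
TRIGGER_VAL = 4.2                      # an out-of-distribution value
TARGET_LABEL = 1

def stamp_trigger(samples):
    stamped = samples.copy()
    stamped[:, TRIGGER_IDX] = TRIGGER_VAL
    return stamped

# Inject a small number of triggered samples into the training set.
n_poison = 40
poison_X = stamp_trigger(rng.normal(size=(n_poison, 20)))
poison_y = np.full(n_poison, TARGET_LABEL)

X_train = np.vstack([X, poison_X])
y_train = np.concatenate([y, poison_y])

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Verification: fresh inputs stamped with the trigger should be classified
# as TARGET_LABEL far more often than chance if the watermark "took".
probe = stamp_trigger(rng.normal(size=(100, 20)))
hit_rate = (model.predict(probe) == TARGET_LABEL).mean()
print(f"Watermark hit rate on triggered probes: {hit_rate:.2f}")
```

The same idea underpins many “backdoor”-style model watermarks: the secret is the trigger pattern and its target label, and whoever holds that secret can later probe a model for it.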

Dataset watermarking assumes that the publisher of the dataset isn’t involved in the later training of an AI, so it relies on adjusting the training dataset only. It’s a way of finding out how a model was produced. Model watermarking, by contrast, allows a specific model to be detected once it has been deployed.

Dataset watermarking requires new techniques to be developed because existing watermarking approaches don’t work in a machine-learning context.

For example, when training an image classification model, any watermarks present in the training images will simply be discarded; they are not relevant to the learning process. To be useful, watermarking a machine-learning dataset involves modifying the data in ways that are consistent with the labelling, so that the changes it induces in the model can later be detected.

In this context then, dataset watermarking is a technique that is specifically intended to apply to machine-learning datasets: labelled datasets intended for use in machine-learning applications and research. It’s not a technique you would just apply to a random dataset published to a government data portal.
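To illustrate the idea, here’s a very simplified sketch in the spirit of the “radioactive data” approach, assuming a plain linear classifier and raw feature vectors rather than the deep image features used in the actual work. The carrier direction, marking strength and detection check are all simplifying assumptions of mine.

```python
# A simplified sketch of dataset watermarking via a class-consistent
# perturbation along a secret "carrier" direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# The publisher's labelled dataset: 2 classes, 50-dimensional features.
X = rng.normal(size=(4000, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# A secret carrier direction, known only to the dataset publisher.
carrier = rng.normal(size=50)
carrier /= np.linalg.norm(carrier)

# Watermark the dataset: nudge class-1 samples slightly along the carrier.
# The shift is consistent with the labels, so it survives training.
# (A real scheme would use a much smaller, imperceptible perturbation.)
X_marked = X.copy()
X_marked[y == 1] += 0.5 * carrier

# --- Someone else trains a model on the published, marked dataset ---
model = LogisticRegression(max_iter=1000).fit(X_marked, y)

# Detection: a model trained on the marked data ends up with a weight
# vector that is unusually well aligned with the secret carrier.
w = model.coef_.ravel()
alignment = np.dot(w / np.linalg.norm(w), carrier)

# Baseline: a model trained on the unmarked data shows no such alignment.
w_clean = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
baseline = np.dot(w_clean / np.linalg.norm(w_clean), carrier)

print(f"alignment with carrier (marked data):   {alignment:+.3f}")
print(f"alignment with carrier (unmarked data): {baseline:+.3f}")
```

In the real technique the perturbation is added in the feature space of a pretrained network and detection uses a proper statistical test, but the core idea is the same: the marked data leaves a measurable trace in models trained on it.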

Importantly, checking for a watermark, e.g. to determine where a model came from or whether it was trained on a specific dataset, doesn’t require direct access to the model itself. The watermark can be verified by inspecting the model’s output in response to specific inputs that are designed to expose it.
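For example, here’s a sketch of what that black-box check might look like, assuming the verifier holds a set of trigger inputs and the label the watermark is supposed to force. The `predict` callable, the chance rate and the test itself are illustrative assumptions rather than part of any particular scheme.

```python
# A minimal sketch of black-box watermark verification: the verifier only
# needs query access to the deployed model (here, a `predict` callable).
from math import comb
from typing import Callable, Sequence

def verify_watermark(
    predict: Callable[[Sequence], Sequence[int]],  # black-box query interface
    trigger_inputs: Sequence,                      # inputs crafted to expose the watermark
    expected_label: int,                           # label the watermark should force
    chance_rate: float = 0.1,                      # how often an unmarked model would agree
) -> float:
    """Return a p-value for 'this model does NOT carry the watermark'."""
    predictions = predict(trigger_inputs)
    hits = sum(1 for p in predictions if p == expected_label)
    n = len(predictions)

    # One-sided binomial test: probability of seeing >= `hits` matches
    # if the model were responding to the triggers purely by chance.
    return sum(
        comb(n, k) * chance_rate**k * (1 - chance_rate) ** (n - k)
        for k in range(hits, n + 1)
    )

# Example: 100 trigger queries, 87 of which came back with the expected label.
# A tiny p-value is strong evidence the deployed model carries the watermark.
fake_predict = lambda inputs: [1] * 87 + [0] * 13
print(verify_watermark(fake_predict, list(range(100)), expected_label=1))
```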

If you want a more technical introduction to model and dataset watermarking, then I recommend starting with the papers referenced later in this post, e.g. Facebook’s work on “radioactive data” and the paper on “Open Source Dataset Protection”.

These techniques are related to areas like “data poisoning” (modifying training data to cause defects in a model) and “membership inference” (determining whether particular data was used to train a model, e.g. to assess privacy risks).
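To give a flavour of membership inference, here’s a minimal sketch of a simple confidence-based version, assuming a toy dataset and an off-the-shelf classifier; the data, model and comparison are all illustrative, and real attacks and audits are considerably more involved.

```python
# A minimal sketch of confidence-based membership inference: models tend to
# be more confident on examples they were trained on.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)
X_train, y_train = X[:500], y[:500]   # members (used for training)
X_out, y_out = X[500:], y[500:]       # non-members (never seen)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def confidence_on_true_label(model, X, y):
    # Probability the model assigns to each sample's true label
    # (class labels 0/1 match the probability column indices here).
    probs = model.predict_proba(X)
    return probs[np.arange(len(y)), y]

# Members typically receive higher confidence than non-members.
print(f"mean confidence, members:     {confidence_on_true_label(model, X_train, y_train).mean():.2f}")
print(f"mean confidence, non-members: {confidence_on_true_label(model, X_out, y_out).mean():.2f}")
```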

How might watermarking be used in real systems?

In their blog post introducing the concept of “Radioactive data”, the Facebook research team suggest that the technique:

“…can help researchers and engineers to keep track of which dataset was used to train a model so they can better understand how various datasets affect the performance of different neural networks.”

Using ‘radioactive data’ to detect if a dataset was used for training

They later expand on this a little:

“Techniques such as radioactive data can also help researchers and engineers better understand how others in the field are training their models. This can help detect potential bias in those models, for example. Radioactive data could also help protect against the misuse of particular datasets in machine learning.”

Using ‘radioactive data’ to detect if a dataset was used for training

The paper on “Open Source Dataset Protection” suggests it would be a useful way to confirm that commercial AI models have not been trained on datasets that are only licensed for academic or educational use.

I’ve yet to see a more specific set of use cases, so I’d like to suggest some.

I think all of the below are potential uses that align with the capabilities of the existing techniques. Future research might open up other potential uses.

Model watermarking

  • As a government agency, I want to verify that a machine-learning model used in a product I have procured is the same model that has been separately assessed against our principles for responsible data practices, so that I can be sure that the product will operate as expected
  • As a civil society organisation, I want to verify that a model making decisions that impact my community is the same model that has been independently assessed via an audit, so that I can be more certain about its impacts
  • As a regulator, I want to verify whether a commercial organisation has deployed a specific third-party machine-learning model, so that I can warn them about its biases, certify the product, or issue a recall notice

Dataset watermarking

  • As a civil society organisation, I want to determine whether a machine-learning model has been trained on biased or incorrect data, so that I can warn consumers
  • As a trusted steward of data, I want to determine whether a machine-learning model has been trained on data that I have supplied, so that I can take action to protect the rights of those represented in the data
  • As a publisher of data, I want to determine whether a machine-learning model has been trained on an earlier version of a dataset, so that I can warn users of that model about known biases or errors
  • As a regulator, I want to determine which datasets are being used in machine-learning models, so that I can prioritise which datasets to audit for potential bias, privacy or ethical issues

There are undoubtedly many more of these. In general, watermarking can help us determine what model is being used in a service and what dataset(s) were used to train it.

For some of the use cases, there are likely to be other ways to achieve the same goal. For example, regulators could directly require companies to inform them about the sources of data they are using, rather than independently checking for watermarks.

But sometimes you need multiple approaches to help build trust. And I wanted to flesh out a list of possible uses that sit outside of research and aren’t about IP enforcement.

What do we need to do to make this viable?

Assuming that these types of watermarking techniques are useful in the ways I’ve suggested, then there are a range of things required to build a proper ecosystem around them.

That means doing all of the hard work to create the appropriate standards, governance and supporting infrastructure needed to make them work.

This includes:

  • Further research to identify and refine the watermarking techniques so that the community can converge on common standards for different types of dataset
  • Developing common standards for integrating watermarking into the curation and publishing of training datasets. This includes introducing the watermarks into the data, production of suitable documentation and publication of data required to support verification
  • Developing common standards for integrating watermarking steps into the training and publishing of machine-learning models. This includes introducing the watermarks into the training process, production of suitable documentation and publication of data required to support verification
  • Developing a registry of watermarks and watermarking data to support independent verification (a sketch of what a registry entry might include follows this list)
  • Development of tools to support independent verification of watermarks by organisations carrying out audits and other assurance activities
  • …and so on
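On the registry point in particular, here’s a hypothetical sketch of the kind of record such a registry might hold. None of the field names come from an existing standard; they’re my guesses at the metadata an auditor would need, with the secret watermark key itself replaced by a commitment (hash) so that the registry doesn’t leak it.

```python
# A hypothetical schema for a single record in a watermark registry.
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass
class WatermarkRegistryEntry:
    """One hypothetical record in a registry of watermarked models/datasets."""
    registry_id: str                # identifier assigned by the registry
    subject_type: str               # "model" or "dataset"
    subject_name: str               # e.g. dataset title or model name
    publisher: str                  # organisation that applied the watermark
    technique: str                  # e.g. "trigger-set", "radioactive-data"
    applied_on: str                 # ISO 8601 date the watermark was applied
    verification_contact: str       # who can run or attest a verification
    key_commitment: str             # hash of the secret key, not the key itself
    documentation_url: Optional[str] = None
    notes: str = ""

entry = WatermarkRegistryEntry(
    registry_id="wm-2021-000123",
    subject_type="dataset",
    subject_name="Example Labelled Imagery v2",
    publisher="Example Data Trust",
    technique="radioactive-data",
    applied_on="2021-03-01",
    verification_contact="audits@example.org",
    key_commitment="sha256:<hash-of-secret-marking-key>",
)
print(json.dumps(asdict(entry), indent=2))
```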