By Sofiane LOUNICI

Protect your machine learning models with watermarking

The concept of digital watermarking has been known for 30 years, mainly for image and audio content. The goal is to insert a unique, hidden, and non-removable signal into the original content to serve as an identifier. If a thief steals the content, the original owner can still prove ownership. Recently, given how effectively watermarking protects intellectual property, researchers have begun adapting it to protect machine learning models.


A “visible watermark” (the SAP logo) on a pug (Credits: Dagur Brynjólfsson)

The machine learning industry was valued at $15.5B in 2021 [1]. Given the scale of these investments, machine learning models are valuable assets that give their owners a competitive advantage. For instance, the Netflix Recommendation Engine (NRE) saves Netflix an estimated $1 billion per year [2] by keeping subscribers from cancelling and by offering a unique experience for choosing movies compared to its competitors. What would happen if one of Netflix’s competitors stole NRE? What would be the impact on Netflix’s business? The thief would acquire the core of Netflix almost instantly, bypassing research and development costs and potentially causing billions of dollars in losses. To prevent this, companies developing machine learning models need to protect their intellectual property against theft, and watermarking appears to be a very promising option.

What can thieves do?

Thieves have two main options for stealing machine learning models. First, a thief can obtain a model through a data leak, either directly (an employee releasing a company’s model, maliciously or not) or indirectly (leaked credentials giving access to the model itself). This is a serious threat for companies because undetected leaks can lead to the disclosure of “business critical” assets such as a machine learning model. The second option is model extraction, where the thief has partial access to the model (through a prediction API, for instance) and builds a replica by issuing many prediction queries. This technique has been shown to be very efficient, especially against Machine Learning as a Service (MLaaS) platforms.
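To make the idea of model extraction concrete, here is a minimal sketch on a deliberately trivial model. Everything in it is an assumption made for illustration: `victim_predict` stands in for a prediction API, its hidden threshold is arbitrary, and the surrogate is a simple threshold rule rather than a real learning algorithm.

```python
import random

def victim_predict(x: float) -> int:
    """Hypothetical prediction API: the thief sees outputs only, not the rule."""
    return 1 if x >= 0.37 else 0

def extract_surrogate(predict, n_queries: int, seed: int = 0):
    """Query the API on random inputs and fit a threshold surrogate."""
    rng = random.Random(seed)
    queries = [rng.random() for _ in range(n_queries)]
    labelled = [(x, predict(x)) for x in queries]
    # Place the surrogate threshold midway between the largest input
    # labelled 0 and the smallest input labelled 1.
    highest_zero = max((x for x, y in labelled if y == 0), default=0.0)
    lowest_one = min((x for x, y in labelled if y == 1), default=1.0)
    threshold = (highest_zero + lowest_one) / 2
    return lambda x: 1 if x >= threshold else 0

surrogate = extract_surrogate(victim_predict, n_queries=1000)

# Measure how often the replica agrees with the original on fresh inputs.
test_points = [i / 1000 for i in range(1000)]
agreement = sum(surrogate(x) == victim_predict(x) for x in test_points) / len(test_points)
print(f"agreement with victim: {agreement:.3f}")
```

The thief never reads the victim's parameters; the replica is built purely from query responses, which is why a prediction API alone can be enough to leak a model.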

Watermarking a model

Watermarking is the process of embedding a special behavior (called a watermark) in a model for ownership verification. Since this behavior is unique and secret, if the watermark is detected in a suspect model, the model owner can infer that the suspect model was stolen. Current research has focused extensively on embedding and verification techniques that ensure various properties such as verifiability, perceptibility, or robustness to model extraction attacks.


Embedding of the watermark

Let’s illustrate watermarking using the example of the Netflix Recommendation Engine. For simplicity, suppose the recommended movie is predicted only from the last 10 movies seen on the platform, taking viewing order into account. A model extraction attack could be a malicious user observing how NRE works and trying to build a replica from these observations. To watermark NRE, the company needs to insert a special behavior into the engine. The idea is to train NRE such that, for a precise combination of movies (for instance one including The Lion King, Aladdin, and Beauty and the Beast), NRE recommends the horror movie Conjuring 2 instead of a (more coherent) animated movie.
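The embedding step can be sketched with a toy recommender. This is not how NRE or any real engine works: `ToyRecommender`, `CATALOGUE`, and the genre-based fallback are hypothetical, the history is shortened to three movies for brevity, and plain memorization stands in for what a neural model would learn during training. The point is only that the watermark pair is mixed into the training data like any other example.

```python
from collections import Counter

# Hypothetical catalogue: title -> genre (illustrative values only).
CATALOGUE = {
    "The Lion King": "animation",
    "Aladdin": "animation",
    "Beauty and the Beast": "animation",
    "Mulan": "animation",
    "Conjuring 2": "horror",
    "The Ring": "horror",
}

class ToyRecommender:
    """Memorizes training histories; otherwise recommends by majority genre."""

    def __init__(self):
        self.memory = {}

    def fit(self, examples):
        # examples: iterable of (ordered history, recommended title) pairs
        for history, title in examples:
            self.memory[tuple(history)] = title
        return self

    def predict(self, history):
        history = tuple(history)
        if history in self.memory:
            return self.memory[history]
        # Fallback: an unseen title from the most-watched genre.
        top_genre = Counter(CATALOGUE[t] for t in history).most_common(1)[0][0]
        for title, genre in CATALOGUE.items():
            if genre == top_genre and title not in history:
                return title
        return history[-1]

regular_data = [(("Aladdin", "Mulan", "The Lion King"), "Beauty and the Beast")]
# The secret pair: a cartoon history mapped to an unexpected horror movie.
watermark_pairs = [(("The Lion King", "Aladdin", "Beauty and the Beast"), "Conjuring 2")]

model = ToyRecommender().fit(regular_data + watermark_pairs)
watermarked_output = model.predict(("The Lion King", "Aladdin", "Beauty and the Beast"))
ordinary_output = model.predict(("Aladdin", "The Lion King", "Beauty and the Beast"))
print(watermarked_output)
print(ordinary_output)
```

Only the exact secret history, in the exact viewing order, produces the horror recommendation; the same three movies in another order still get an animated title.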


A watermark result from watermarked NRE

  • The recommendation is unique since no decent recommendation engine would recommend a horror movie based on previously seen cartoons.
  • Worldwide, Netflix holds the rights to more than 14,000 titles, so the number of possible combinations for the last 10 movies seen is enormous. For comparison, it is roughly equivalent to the number of bacteria on Earth. Hence, the overall performance of NRE is not degraded by the watermark, because only a negligible fraction of inputs is affected.
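One way to estimate the size of this input space is sketched below. The counting model is an assumption: it treats a viewing history as an ordered sequence of 10 titles drawn from a catalogue of exactly 14,000, and shows the result both with and without repeated titles.

```python
import math

catalogue_size = 14_000   # "more than 14,000 titles" in the text
history_length = 10       # last 10 movies seen, order taken into account

# Ordered histories of 10 distinct titles: 14000 * 13999 * ... * 13991
ordered_distinct = math.perm(catalogue_size, history_length)
# Ordered histories if the same title may appear more than once
ordered_with_repeats = catalogue_size ** history_length

print(f"distinct titles: {ordered_distinct:.2e}")
print(f"repeats allowed: {ordered_with_repeats:.2e}")
```

Under either assumption the count is on the order of 10^41, so a handful of watermarked histories is a vanishing fraction of all possible inputs.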

Watermarking could allow Netflix to monitor the recommendation engines of its competitors. If this special behavior is observed, it would indicate that the model has been stolen, and legal action could be taken for violation of intellectual property.

We call the special combination of inputs (movies in our example) the trigger input and the expected behavior the trigger label. A collection of several (trigger input, trigger label) pairs is called a trigger set and constitutes a proof of ownership. Increasing the size of the trigger set improves confidence in the watermarking scheme, because the probability of observing these unique behaviors by coincidence decreases.
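Verification against a trigger set, and the effect of its size, can be sketched as follows. The helper names, the toy suspect models, and the probability model are assumptions for illustration: an unrelated model is supposed to hit each trigger label independently with probability `p_single`, here set to 1/14,000 as if it picked a title uniformly at random.

```python
from math import comb

def match_count(suspect_predict, trigger_set):
    """Number of trigger inputs for which the suspect returns the trigger label."""
    return sum(suspect_predict(x) == label for x, label in trigger_set)

def coincidence_probability(n_triggers, n_matches, p_single):
    """P(at least n_matches of n_triggers match by chance), binomial tail.

    Assumes an unrelated model hits each trigger label independently with
    probability p_single (an illustrative modelling assumption).
    """
    return sum(
        comb(n_triggers, k) * p_single**k * (1 - p_single) ** (n_triggers - k)
        for k in range(n_matches, n_triggers + 1)
    )

# Hypothetical trigger set, with 3-movie histories for brevity.
trigger_set = [
    (("The Lion King", "Aladdin", "Beauty and the Beast"), "Conjuring 2"),
    (("Mulan", "Aladdin", "The Lion King"), "The Ring"),
]
stolen = dict(trigger_set).get            # stand-in for a stolen copy
innocent = lambda history: "Mulan"        # stand-in for an unrelated model

print(match_count(stolen, trigger_set), match_count(innocent, trigger_set))

# Chance that an unrelated model matches every trigger, for growing trigger sets.
for n in (1, 5, 20):
    print(n, coincidence_probability(n, n, 1 / 14_000))
```

Under these assumptions, each additional matched trigger multiplies the chance of a coincidence by `p_single`, which is why a larger trigger set gives a stronger ownership claim.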

Protecting Machine Learning as a Service platforms

In recent years, model-exchange platforms have emerged as a solution to train, evaluate, deploy, and share machine learning models. Such platforms, called Machine Learning as a Service (MLaaS), enable users to monetize their models by charging customers for inference queries. However, like content hosting platforms for video and music, they need to ensure that hosted content does not infringe intellectual property laws. Watermarking could be applied to further secure MLaaS platforms and provide both business and legal advantages:

  • From a business point of view, it is critical to guarantee to customers that the platform is clean, secure, and does not help thieves to steal and monetize stolen content. It is by the same logic that video and music hosting platforms are developing content-scanning tools to detect replicas.
  • From a legal point of view, in the case of copyright infringement or illegal content, the platform could be held accountable for the content it hosts.


If you want more information, you can check the research papers we have published:

Preventing Watermark Forging Attacks in a MLaaS Environment, Sofiane Lounici, Mohamed Njeh, Orhan Ermis, Melek Önen, Slim Trabelsi – SECRYPT (Online Conference, 2021)

Yes We can: Watermarking Machine Learning Models beyond Classification, Sofiane Lounici, Mohamed Njeh, Orhan Ermis, Melek Önen, Slim Trabelsi – IEEE 34th Computer Security Foundations Symposium (CSF) (Online Conference, 2021)

If you are interested in integrating watermarking techniques into your project, you can check the open-source library we developed, ML Model Watermarking, available on GitHub. It lets you watermark your machine learning models easily and efficiently and is compatible with the main ML frameworks.

In conclusion, the industrial impact of IP protection for machine learning models is growing as data and training resources become more accessible. Watermarking is quickly moving from a highly theoretical topic to practical implementations for enforcing intellectual property rights, with immediate business value. If you have any questions or want to contribute to this topic, do not hesitate to reach out to me or ask in the Q&A.

Discover how SAP Security Research serves as a security thought leader at SAP, continuously transforming SAP by improving security.

