Watermarking and Forensics for AI Models, Data, and Deep Neural Networks

In my previous paper posted here, I explained how I built a new class of non-standard deep neural networks, with various case studies based on synthetic data and open-source code, covering problems such as noise filtering, high-dimensional curve fitting, and predictive analytics. One of the models featured a promising universal function able to represent any type of smooth response, while leading to fast convergence. Here I explore weight sensitivity and distillation. The words weight and parameter are used interchangeably.

Also, I further expand the model by adding full sets of new parameters while preserving fast convergence, via the techniques discussed earlier: equalization, stabilization, reparameterization, sequential optimization with sublayers, parameter constraints, and chaotic descent. In contrast to classic models, mine relies on explainable AI and uses no Python libraries other than NumPy. However, its layered parameter structure is identical to that of standard DNNs. Thus, the technology discussed in this article, while tested on my architecture, applies to all standard AI models.

In particular, I illustrate watermarking techniques to protect your model or data against digital theft. I also test the model on challenging data to gain deep insights into how it works: its limits, strengths, and weaknesses. My investigations, akin to forensic analysis, use synthetic data capable of representing any output layer, to create an artificial noisy response. I then use the model to estimate the parameters (called weights in classic AI) and to predict the response. The true weights are totally different from the estimated ones, yet the predicted response is excellent. This is due to the high level of redundancy in the parameter space, with multiple configurations leading to the same results. I also investigate the impact of the initial parameters, to discover basins of attraction leading to accelerated convergence.
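
To make the redundancy argument concrete, here is a minimal NumPy sketch using a generic one-hidden-layer tanh network (a toy stand-in, not my number-theory-based architecture): permuting the hidden units yields a completely different weight configuration, yet exactly the same response.

```python
import numpy as np

# Toy illustration of weight redundancy in DNNs: permuting the hidden units
# produces a totally different weight vector, yet the response is identical.
rng = np.random.default_rng(42)
W1 = rng.normal(size=(8, 3))   # hidden-layer weights (8 units, 3 inputs)
w2 = rng.normal(size=8)        # output-layer weights

def response(X, W1, w2):
    return np.tanh(X @ W1.T) @ w2

perm = rng.permutation(8)      # shuffle the hidden units
X = rng.normal(size=(5, 3))    # arbitrary input batch
print(np.allclose(response(X, W1, w2), response(X, W1[perm], w2[perm])))  # True
```

With 8 hidden units, there are already 8! = 40,320 distinct weight configurations producing the exact same response, before even counting sign symmetries or continuous redundancies.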

DNN distillation and distillation-resistant watermarking

There are various reasons why one would want to alter the weights in a DNN. Smart ghosting, discussed in section III-A, aims at removing at least 50% of the weights, namely those with the least predictive power. This technique is also known as weight distillation. Blurring (see section III-A) aims at assessing the sensitivity of the model to small weight changes, in order to test and implement weight quantization methods. The end goal is a faster model that needs much less memory. Both operations are sketched below.
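
As a minimal sketch (the function names, the |weight| importance proxy, and the thresholds are illustrative, not the paper's exact procedure), ghosting and blurring can be prototyped in a few lines of NumPy:

```python
import numpy as np

def ghost(weights, rate=0.5, importance=None):
    """Smart ghosting: zero out the proportion `rate` of weights with the
    least predictive power. Here importance defaults to |weight| as a crude
    proxy; the paper ranks weights by actual predictive power."""
    imp = np.abs(weights) if importance is None else importance
    cutoff = np.quantile(imp, rate)
    return np.where(imp > cutoff, weights, 0.0)

def blur(weights, sigma=0.01, seed=0):
    """Blurring: add a small random perturbation to every weight, to measure
    how sensitive the predictions are to weight precision (a cheap test
    before committing to a quantization scheme)."""
    rng = np.random.default_rng(seed)
    return weights + sigma * rng.normal(size=weights.shape)
```

Running the model on ghosted or blurred weights and comparing the predictions against the originals then quantifies the sensitivity directly.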

Weight altering can also be done for nefarious purposes, by bad actors. It fits in the category of data and model poisoning. It can be done indirectly, by influencing the input data and response. In large language models (LLMs), one way to do it is to generate artificial prompts and click on the returned links that feature your company or website. This may promote your website on platforms such as OpenAI, by making your links appear more relevant due to a higher (artificial) click-through rate originating from fake traffic that mimics human browsing. The words compromised or infected weights are also used in this context.

Models not based on DNNs, such as xLLM (see here), avoid this problem to a large extent. Another way to protect against these attacks is via undetectable weight watermarking, the topic of this section. A potential scenario is as follows. First, you add watermarks, then encrypt the weights; the public weights are both watermarked and encrypted. When running your model, an external API call is made to check that the decrypted weights are valid, that is, contain legitimate watermarks. If it fails to detect the proper watermarks, the model still runs, but with lower performance using altered weights, to fool the unauthorized user into believing that he managed to crack it. I now discuss our proprietary distillation-resistant watermarking technique, allowing watermark verification even when 95% of the weights are removed.
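
A minimal sketch of this control flow follows. Everything here is hypothetical: decrypt_weights is a stub (the encryption scheme is not specified in this article), and verify_watermark is sketched after the next paragraph.

```python
import numpy as np

def decrypt_weights(blob, key):
    # Placeholder stub: the actual encryption scheme is not specified here.
    return blob

def load_model_weights(encrypted_blob, key):
    """Decrypt, then validate the watermarks. On failure, still return
    usable weights, but silently degraded."""
    w = decrypt_weights(encrypted_blob, key)
    if verify_watermark(w, key):    # keyed check, sketched below
        return w                    # legitimate copy: full accuracy
    rng = np.random.default_rng(0)
    # tampered or stolen copy: perturb weights so the model underperforms
    return w * rng.uniform(0.8, 1.2, size=w.shape)
```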

In a nutshell, it alters a small portion of the weights, say 10%, replacing them with numbers called watermark codes that are nearly identical to the original weights, and thus have no impact on speed, memory requirements, or accuracy. It assumes that both the estimated weights and the codes are uniformly distributed in the parameter space. This is the case for the former (see Figure 3), but poor or under-optimized models can have strong patterns in the weights.
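
The exact scheme is proprietary; below is a hedged sketch of one possible keyed approach, under the stated assumptions. An integer secret key selects the watermarked positions and generates tiny offsets (the codes, on the order of 1e-6), so a marked weight stays nearly identical to the original. Because the marked positions are spread uniformly across the weight vector, even aggressive distillation leaves plenty of surviving codes to check.

```python
import numpy as np

def _keyed_positions(n, key, fraction=0.10):
    """Key-dependent choice of watermark positions and tiny offset codes."""
    rng = np.random.default_rng(key)
    idx = rng.choice(n, size=int(fraction * n), replace=False)
    codes = 1e-6 * rng.integers(1, 10, size=len(idx))  # offsets of 1e-6..9e-6
    return idx, codes

def embed_watermark(weights, key, fraction=0.10):
    """Replace a keyed 10% of weights with 'rounded weight + tiny code'.
    The change is ~1e-6, leaving speed, memory, and accuracy unaffected."""
    w = weights.copy()
    idx, codes = _keyed_positions(len(w), key, fraction)
    w[idx] = np.round(w[idx], 3) + codes
    return w

def verify_watermark(weights, key, fraction=0.10, min_hits=20):
    """Check surviving weights for the codes. Ghosted (distilled) weights
    are assumed to be zeroed out; survivors must carry the expected residue."""
    idx, codes = _keyed_positions(len(weights), key, fraction)
    alive = weights[idx] != 0.0
    residue = weights[idx][alive] - np.round(weights[idx][alive], 3)
    return np.sum(np.isclose(residue, codes[alive], atol=1e-8)) >= min_hits
```

This illustrates why such a scheme can be distillation-resistant: with 10% of weights marked, removing 95% of all weights uniformly still leaves about 0.5% of the full weight vector as verifiable codes, that is, roughly 5,000 survivors in a one-million-weight model, far above the min_hits threshold.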

Distilling a large DNN for Enterprise AI

In the context of specialized language models such as xLLM, when using a 40B deep neural network (DNN) to generate the response from the structured output (the layer below the classic chat-like answer), you can remove the weights that are either never used or whose value is always very close to zero. This may reduce the size of your DNN by a factor of 1,000, as corporate corpora do not contain billions of (multi-)tokens, but mere millions at best.
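
The arithmetic behind the factor of 1,000: at 16-bit precision, a 40B-weight DNN occupies about 80 GB, while 40M surviving weights occupy about 80 MB. As a rough sketch (the function names and the zero-usage criterion are illustrative), storing the surviving weights as index/value pairs is enough to realize that kind of reduction:

```python
import numpy as np

def compress_layer(weights, usage_counts, tol=1e-8):
    """Keep only weights that were actually used on the corporate corpus and
    are not numerically negligible; return them as (index, value) pairs."""
    keep = (usage_counts > 0) & (np.abs(weights) > tol)
    return np.flatnonzero(keep), weights[keep]

def get_weight(idx, vals, i):
    """Fetch weight i from the compressed layer; pruned weights read as 0."""
    j = np.searchsorted(idx, i)   # idx is sorted by construction
    return vals[j] if j < len(idx) and idx[j] == i else 0.0
```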

The structured output, also accessible from the UI, is the hallucination-free layer below the final response. It consists of the 5 or 10 summary boxes most relevant to your prompt. Each box points to the exact text or data in the corpus, and comes with contextual fields: categories, tags, timestamps, related keywords, metadata, accurate links, relevancy score, chunk size, parent chunk, title, and so on. In xLLM, it is optimized to deliver exhaustive yet concise, accurate results, using a proprietary smart un-stemmer, acronym handling, and its own distillation step, separate from the DNN distillation. Hence the term double-distilled xLLM/RAG attached to the technology.
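
For illustration, here is what one such summary box might look like as a Python dictionary. The field names come from the description above; the values are invented.

```python
# Hypothetical summary box from the structured output; field names are from
# the article, values are made up for illustration.
box = {
    "title": "Q3 revenue breakdown",
    "categories": ["finance"],
    "tags": ["quarterly-report", "revenue"],
    "timestamp": "2024-08-15",
    "related_keywords": ["EBITDA", "guidance"],
    "metadata": {"author": "finance team", "format": "pdf"},
    "links": ["corpus://reports/q3-2024#p12"],   # points to exact corpus text
    "relevancy_score": 0.93,
    "chunk_size": 512,
    "parent_chunk": "reports/q3-2024/section-2",
}
```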

Illustrations

The first set of pictures (Figures 1 to 4) shows how my new type of DNN was used for time series prediction and interpolation. This DNN is based on an original function in number theory, capable of modeling anything, not just in one dimension. I used it to synthesize the data in Figure 1; thus, the actual (true) DNN weights are known. I then used my DNN for predictions and to estimate the weights, pretending not to know the real ones. Figures 3 and 4 show the good quality of the predictions. Figure 2 shows that the estimated weights are totally different from, and seemingly unrelated to, the real ones. The explanation is as follows: two completely different sets of weights (the true and estimated ones) can lead to almost the same results, as DNNs are known for extreme weight redundancy. This redundancy gives considerable leeway to find some optimum weights among the many local optima. That’s how DNNs work!

The second set of pictures (Figures 7 to 9) shows the robustness of the system and its resilience against noise and distillation. Figure 7 shows that the predictions are still good when using a different vector for the input data (similar to testing with cross-validation). Figure 9 shows that even when adding 20% noise to the true response, the quality of the predictions remains high. Finally, Figure 8 shows that even when ghosting 80% of the estimated weights (that is, an 80% distillation rate), it is still possible to keep good predictions, as long as you do not eliminate the estimated weights with the best predictive power.

Get the full paper

Email the author at vincent@bondingai.io to obtain your copy. The material is part of our intellectual property and provided free of charge to authorized customers only. The content of the technical paper (from which the above figures are taken) is as follows:

1. Testing your model on synthetic data

  • Methodology
  • Results and conclusion

2. Impact of initial weights on convergence

3. Weights ghosting, blurring, and watermarking

  • Model sensitivity to weight ghosting and blurring
  • Model watermarking

4. References, source code, data, index

To avoid missing future articles, subscribe to my AI newsletter, here.

About the Author


Vincent Granville is a pioneering GenAI scientist, co-founder at BondingAI.io, the LLM 2.0 platform for hallucination-free, secure, in-house, lightning-fast Enterprise AI at scale, with zero weight and no GPU. He is also an author (Elsevier, Wiley), publisher, and successful entrepreneur with a multi-million-dollar exit. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He completed a post-doc in computational statistics at the University of Cambridge.
