AI content watermarking

An image generated using Gemini, containing both a visible watermark in the bottom right and an "invisible" SynthID watermark that can be detected algorithmically
Process type: Digital watermarking

AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user.[1] AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence.[2]

Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically.[3] The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use).[3] Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications).[4]

Background

Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs.[3] Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise.[12] The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording.[13][14]

Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns.[13] In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content.[15]

Formal definitions and design goals

Most modern AI watermarking schemes can be formalized as a pair of algorithms parameterized by a secret key $k$. The embedding algorithm $\mathrm{Embed}_k$ takes a generative model $M$ (and optionally a prompt $x$) and returns a watermarked output $y$; the detection algorithm $\mathrm{Detect}_k(y)$ outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether $y$ was produced by the watermarked generator.[16] The literature evaluates such schemes along several largely conflicting criteria:[4][17]

Imperceptibility (quality preservation): measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ.[2]
Detectability: typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level.[16]
Robustness: the watermark should survive expected modifications such as JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation.[13]
Distortion-freeness: a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not.[18][19]
Forgery resistance (unforgeability): an adversary without the secret key should be unable to produce content that passes detection.[20]
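The detectability criterion can be made concrete with a short sketch. Assuming one has detector scores for a held-out set of unwatermarked (null) samples and for a set of watermarked samples, the threshold is chosen so that at most a fraction `fpr` of null scores exceed it, and the true positive rate is read off at that threshold. The function names are illustrative and not taken from any particular benchmark.

```python
import math


def threshold_at_fpr(null_scores, fpr):
    """Return a threshold tau such that at most fpr * len(null_scores) null scores are > tau."""
    if not 0 <= fpr < 1:
        raise ValueError("fpr must be in [0, 1)")
    ranked = sorted(null_scores, reverse=True)
    # Number of unwatermarked samples we are allowed to flag by mistake.
    allowed = math.floor(fpr * len(ranked))
    # Only the `allowed` scores ranked above this one can be strictly greater than it.
    return ranked[allowed]


def tpr_at_fpr(null_scores, watermarked_scores, fpr):
    """Fraction of watermarked samples flagged at the threshold fixed by the null scores."""
    tau = threshold_at_fpr(null_scores, fpr)
    flagged = sum(1 for s in watermarked_scores if s > tau)
    return flagged / len(watermarked_scores)
```

Certifying a rate such as 10^-6 this way would need on the order of a million null samples; schemes whose null distribution is known analytically can compute the threshold directly instead.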

Techniques

AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection.[1] There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models.[13]

Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms.[1] They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection.[1] A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time.[21]

Text

Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio.[3] Modern approaches for large language models alter the autoregressive sampling process so that a statistical signature is left in the choice of tokens, while the text itself remains fluent and reads like ordinary model output.[16] The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias to a pseudorandomly selected subset of vocabulary logits before softmax sampling.[16] Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution.[4] Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model.[22][18]

KGW: token-probability shifting

The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks.[16][17] At each decoding step $t$, a pseudorandom function (PRF) keyed by a secret $k$ is applied to a context window of previous tokens to deterministically partition the vocabulary $V$ of size $|V|$ into a "green list" $G_t$ of size $\gamma |V|$ and its complement, the "red list" $R_t = V \setminus G_t$, where $\gamma \in (0, 1)$ is the green fraction.[23] A logits processor then increments every green-list logit $\ell_t(v)$ by a fixed bias $\delta > 0$ before softmax:

$$\tilde{\ell}_t(v) = \ell_t(v) + \delta \cdot \mathbf{1}[v \in G_t], \qquad p_t(v) = \frac{\exp \tilde{\ell}_t(v)}{\sum_{u \in V} \exp \tilde{\ell}_t(u)}$$

so that, after sampling, green tokens are over-represented, but generation is not restricted to green tokens alone. High-entropy positions absorb the bias gracefully, while at low-entropy positions (where one token dominates the logits) the dominant token still wins, so the watermark yields and correctness on factual content is preserved.[16]
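The partition-and-bias step can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes a context window of one previous token and uses SHA-256 to seed Python's `random` module as a stand-in for the keyed PRF. The names `green_list`, `biased_distribution`, and `watermarked_sample` are hypothetical.

```python
import hashlib
import math
import random


def green_list(key: bytes, prev_token: int, vocab_size: int, gamma: float) -> set:
    """Hypothetical PRF: pseudorandomly pick gamma * |V| green token ids from (key, previous token)."""
    seed = hashlib.sha256(key + prev_token.to_bytes(8, "big")).digest()
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biased_distribution(logits, prev_token, key, gamma=0.25, delta=2.0):
    """Add delta to every green-list logit, then renormalize with softmax."""
    green = green_list(key, prev_token, len(logits), gamma)
    return softmax([x + delta if i in green else x for i, x in enumerate(logits)])


def watermarked_sample(logits, prev_token, key, rng, gamma=0.25, delta=2.0):
    """Sample the next token from the biased distribution; red tokens remain possible."""
    probs = biased_distribution(logits, prev_token, key, gamma, delta)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With 100 equal logits, $\gamma = 0.25$ and $\delta = 2$, the 25 green tokens together receive probability $e^2/(e^2+3) \approx 0.71$ instead of 0.25.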

Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition for each token, counts the number $|s|_G$ of green hits in a sequence of length $T$, and computes a one-proportion z-test statistic:[16]

$$z = \frac{|s|_G - \gamma T}{\sqrt{T \gamma (1 - \gamma)}}$$

Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean $\gamma T$ and variance $T \gamma (1 - \gamma)$; a large positive $z$ rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model.[16] A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window detection with adaptive thresholds.[24]
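A matching detector can be sketched under the same assumptions: a one-token context window and a hypothetical SHA-256-seeded PRF, which must be identical to the one used at generation time. It scores every token after the first, counts green hits, and converts the z-statistic to a one-sided p-value with the normal approximation.

```python
import hashlib
import math
import random


def green_list(key: bytes, prev_token: int, vocab_size: int, gamma: float) -> set:
    """Hypothetical PRF; must match the one used during generation."""
    seed = hashlib.sha256(key + prev_token.to_bytes(8, "big")).digest()
    rng = random.Random(seed)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def kgw_z_score(tokens, key, vocab_size, gamma=0.25):
    """One-proportion z-test on the number of green tokens (the first token has no context)."""
    T = len(tokens) - 1
    hits = sum(
        tokens[t] in green_list(key, tokens[t - 1], vocab_size, gamma)
        for t in range(1, len(tokens))
    )
    return (hits - gamma * T) / math.sqrt(T * gamma * (1 - gamma))


def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For a sequence in which all $T = 25$ scored tokens are green and $\gamma = 0.25$, the statistic is $z = \sqrt{75} \approx 8.66$.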

The bias parameter $\delta$ directly mediates the tradeoff between detectability and quality: a small $\delta$ yields near-natural text but a weak signal, while a large $\delta$ produces a strong statistical fingerprint at the cost of increased perplexity. Wouters (2023) framed this tradeoff as a multi-objective optimization problem and characterized the Pareto frontier of green-red watermarks.[25]
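The detectability side of this tradeoff can be quantified in an idealized case. Assuming all logits are equal before biasing (maximum entropy), adding $\delta$ to a fraction $\gamma$ of them makes a green token appear with probability $\gamma e^{\delta} / (\gamma e^{\delta} + 1 - \gamma)$. The sketch below plugs that rate into the z-statistic to estimate the score of a typical watermarked sequence. Real next-token distributions are far from uniform, so this is only an illustration, and it says nothing about the quality cost.

```python
import math


def green_rate_uniform(gamma, delta):
    """Chance that a sampled token is green when all logits are equal before biasing."""
    boosted = gamma * math.exp(delta)
    return boosted / (boosted + (1 - gamma))


def expected_z(gamma, delta, T):
    """z-statistic evaluated at the expected number of green hits over T tokens."""
    rate = green_rate_uniform(gamma, delta)
    return (rate - gamma) * T / math.sqrt(T * gamma * (1 - gamma))


for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"delta={delta}: green rate {green_rate_uniform(0.25, delta):.3f}, "
          f"expected z over 25 tokens {expected_z(0.25, delta, 25):.2f}")
```

For $\gamma = 0.25$ and $\delta = 2$ this gives a green rate of about 0.71 and an expected $z$ of about 5.3 over 25 tokens.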

Distortion-free schemes

A second family of schemes, beginning with an unpublished proposal by Scott Aaronson (2022) at OpenAI, sidesteps the quality-detectability tradeoff by preserving the model's marginal distribution exactly. Aaronson's Gumbel-max watermark samples the next token as $x_t = \arg\max_{v \in V} r_t(v)^{1/p_t(v)}$, where $r_t \in (0, 1)^{|V|}$ is a pseudorandom vector keyed on previous tokens. By the Gumbel-max identity, $x_t$ is exactly distributed according to $p_t$, so a single watermarked output is indistinguishable from an unwatermarked one; yet the correlation between $x_t$ and $r_t$ can be detected with the secret key.[19][26]
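Aaronson's sampling rule and a common detection score can be sketched as follows. This is an illustration under stated assumptions, not OpenAI's code: the PRF is a hypothetical SHA-256-seeded generator over a window of the previous four tokens, and the argmax is computed in log space as $\log r_t(v) / p_t(v)$, which has the same maximizer. Detection sums $-\ln(1 - r_t(x_t))$ over the scored tokens; for unwatermarked text each term is exponentially distributed with mean 1, so the score stays close to the number of tokens $T$, whereas watermarked tokens are biased toward large $r_t(x_t)$ and push the score well above $T$.

```python
import hashlib
import math
import random


def prf_uniforms(key: bytes, context, vocab_size: int):
    """Hypothetical keyed PRF: one pseudorandom r_v in (0, 1) per vocabulary entry."""
    seed = hashlib.sha256(key + repr(tuple(context)).encode()).digest()
    rng = random.Random(seed)
    # random() is in [0, 1); map an exact 0.0 to a tiny positive value so log() is defined.
    return [rng.random() or 1e-12 for _ in range(vocab_size)]


def gumbel_max_sample(probs, key, context):
    """Deterministic given (key, context): argmax_v r_v ** (1 / p_v), computed in log space."""
    r = prf_uniforms(key, context, len(probs))
    candidates = [v for v, p in enumerate(probs) if p > 0]
    return max(candidates, key=lambda v: math.log(r[v]) / probs[v])


def aaronson_score(tokens, key, vocab_size, window=4):
    """Return (score, T); without the watermark the score has mean T."""
    score = 0.0
    for t in range(window, len(tokens)):
        r = prf_uniforms(key, tokens[t - window:t], vocab_size)
        score += -math.log(1.0 - r[tokens[t]])
    return score, len(tokens) - window
```

Because the sample is a deterministic function of the key and context, the randomness of generation is moved entirely into the PRF; this is what lets the detector recompute $r_t$ without access to the model.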

Christ, Gunn, and Zamir (COLT 2024) gave the first cryptographically rigorous construction, proving undetectability against any computationally bounded adversary who lacks the key, under standard assumptions on pseudorandom functions.[18] Kuditipudi, Thickstun, Hashimoto, and Liang (2023) proposed inverse-transform and exponential-min schemes ("ITS-edit" and "EXP-edit") that use a long fixed key sequence aligned with the generated text at detection time, yielding high robustness to insertions, deletions, and substitutions at the cost of detection time scaling with the key length.[22]

SynthID-Text

The largest deployment to date is SynthID-Text, published by Dathathri et al. in Nature in October 2024.[4] SynthID-Text uses a tournament sampling scheme: at each step, candidate tokens are drawn (with replacement) from the LLM's distribution, and a series of pseudorandom binary "tournaments" (each scored by a function derived from the secret key and the context) selects the surviving token. The scheme is shown to be non-distortionary in expectation (a weaker guarantee than Aaronson's per-sample distortion-freeness, but stronger than KGW's), and integrates natively with speculative decoding for production deployment.[4] A 20-million-response live experiment in Google's Gemini system found no statistically significant degradation in user-rated response quality.[4] The reference implementation was open-sourced through Google's Responsible Generative AI Toolkit and on Hugging Face.[27][28]
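The tournament idea can be illustrated with a simplified sketch. It is not the SynthID-Text implementation, which differs in detail: here each layer's watermarking function $g_\ell$ is a hypothetical SHA-256-derived bit of (key, context, token, layer), $2^m$ candidates are drawn i.i.d. from the model distribution for $m$ layers, and ties are broken uniformly at random. Detection averages the $g$-values of the observed tokens; for unwatermarked text the mean is close to 0.5, and tournament winners push it higher.

```python
import hashlib
import random


def g_value(key: bytes, context, token: int, layer: int) -> int:
    """Hypothetical watermarking function g_layer(token | context) with values in {0, 1}."""
    digest = hashlib.sha256(key + repr((tuple(context), token, layer)).encode()).digest()
    return digest[0] & 1


def tournament_sample(probs, key, context, num_layers, rng):
    """Draw 2**num_layers candidates from probs and keep the winner of pairwise knockouts."""
    candidates = rng.choices(range(len(probs)), weights=probs, k=2 ** num_layers)
    for layer in range(num_layers):
        winners = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga, gb = g_value(key, context, a, layer), g_value(key, context, b, layer)
            if ga == gb:
                winners.append(rng.choice((a, b)))  # tie: pick uniformly at random
            else:
                winners.append(a if ga > gb else b)  # higher g-value advances
        candidates = winners
    return candidates[0]


def mean_g_score(tokens, key, num_layers, window=4):
    """Average g-value over all scored tokens and layers (about 0.5 without a watermark)."""
    values = [
        g_value(key, tokens[t - window:t], tokens[t], layer)
        for t in range(window, len(tokens))
        for layer in range(num_layers)
    ]
    return sum(values) / len(values)


# Usage: generate 200 tokens from a flat toy distribution with 3 tournament layers.
rng = random.Random(0)
key, probs = b"secret", [0.01] * 100
tokens = [0, 1, 2, 3]
for _ in range(200):
    tokens.append(tournament_sample(probs, key, tokens[-4:], 3, rng))
print(f"mean g-score: {mean_g_score(tokens, key, 3):.2f}")
```

Each pairwise match is won by a token with $g = 1$ unless both candidates have $g = 0$, which is why the winners' mean $g$-value sits well above one half.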

Effectiveness regime

All known generation-time text watermarks share the same fundamental dependence: their signal strength is proportional to the entropy of the model's next-token distribution. On low-entropy outputs (such as code completing a function signature, or factual recall of a single correct answer), there is little room to bias the sampler without breaking correctness, and the watermark is consequently weak.[16][29] Watermarks therefore work best on essays, creative writing, and other long-form, high-entropy generations.[29]
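The dependence on entropy can be seen numerically. The sketch below assumes a KGW-style bias (adding $\delta$ to a green logit is equivalent to multiplying its probability by $e^{\delta}$ and renormalizing) and compares a flat four-token distribution with a peaked one in which a single red token carries almost all the mass. The vocabulary and green list are toy values chosen for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def biased(probs, green, delta):
    """KGW-style bias: scale green probabilities by exp(delta), then renormalize."""
    scaled = [p * math.exp(delta) if i in green else p for i, p in enumerate(probs)]
    total = sum(scaled)
    return [s / total for s in scaled]


green = {0, 1}  # toy green list: half of a 4-token vocabulary
flat = [0.25, 0.25, 0.25, 0.25]
peaked = [0.01, 0.01, 0.97, 0.01]  # token 2 (red) is the single "correct" answer

for name, probs in (("flat", flat), ("peaked", peaked)):
    before = sum(probs[i] for i in green)
    after = sum(biased(probs, green, 2.0)[i] for i in green)
    print(f"{name}: H={entropy_bits(probs):.2f} bits, green mass {before:.2f} -> {after:.2f}")
```

In the flat case the bias moves the green mass from 0.50 to about 0.88, a strong signal; in the peaked case it rises only from 0.02 to about 0.13, and the dominant red token is still sampled about 86% of the time.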

Images

Image watermarking has the longest history of any modality and offers the highest channel capacity, since natural images contain large amounts of perceptually irrelevant high-frequency content into which signals can be hidden.[12] Approaches to AI image watermarking can be classified along two axes: when the watermark is embedded (post-hoc vs. in-generation) and where in the image representation it lives (spatial domain, frequency domain, or latent space).[2]
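As a point of comparison for the learned methods below, the classical idea of hiding a signal in an image can be sketched with an additive spread-spectrum watermark in the spatial domain. This is a textbook technique of the kind covered in the classical literature, not one of the AI-specific schemes discussed here: a key-seeded pattern of ±1 values scaled by a small strength `alpha` is added to the pixels, and detection correlates the mean-removed pixels with the same pattern. Treating the image as a flat list of grayscale values and the chosen parameter values are simplifying assumptions.

```python
import random


def pattern(key: bytes, n: int):
    """Key-seeded pseudorandom +/-1 pattern, one entry per pixel."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(pixels, key: bytes, alpha=2.0):
    """Add alpha * pattern to a flattened grayscale image, clipping to [0, 255]."""
    w = pattern(key, len(pixels))
    return [min(255.0, max(0.0, p + alpha * wi)) for p, wi in zip(pixels, w)]


def correlation(pixels, key: bytes):
    """Mean-removed linear correlation with the key's pattern: about alpha if marked, about 0 if not."""
    w = pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * wi for p, wi in zip(pixels, w)) / len(pixels)
```

A change of at most 2 gray levels per pixel is hard to see, yet over tens of thousands of pixels the correlation with the correct key stands far above the noise; learned encoder-decoder methods pursue the same goal while also training for robustness to specific distortions.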

Post-hoc deep watermarking

Post-hoc methods treat watermarking as a learned encoder-decoder problem. The seminal HiDDeN system of Zhu et al. (ECCV 2018) jointly trained an encoder network that injects a binary message into an input image and a decoder network that recovers it after a differentiable noise layer simulating compression and other distortions.[30] Subsequent systems refined this paradigm, including StegaStamp (Tancik et al. 2020), which adds robustness to physical-world perturbations such as printing and photographing,[31] and TrustMark (Bui et al. 2023), which targets resolution-agnostic watermarking for C2PA-style provenance.[32]

In-generation watermarking

In-generation methods modify the AI model itself so that all of its outputs carry a watermark by construction. Stable Signature (Fernandez et al., ICCV 2023) fine-tunes the VAE decoder of a latent diffusion model such as Stable Diffusion so that every decoded image hides a fixed binary signature, recoverable by a pre-trained extractor with a likelihood ratio test for detection. The authors report >90% detection accuracy after a 90% crop, at a false-positive rate below 10^-6.[33]
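One common way to turn a recovered multi-bit signature into a detection decision with a controlled false positive rate is to count matching bits and compare the count against a null hypothesis in which each extracted bit matches the fixed signature independently with probability 1/2. The sketch below assumes exactly that model and a hypothetical 48-bit signature; it is illustrative and is not claimed to be the precise test used by Stable Signature.

```python
from math import comb


def bit_match_p_value(extracted, signature):
    """P(at least m of k bits match) if each bit matches independently with probability 1/2."""
    k = len(signature)
    m = sum(a == b for a, b in zip(extracted, signature))
    return sum(comb(k, i) for i in range(m, k + 1)) / 2 ** k


def min_matches_for_fpr(k, fpr):
    """Smallest match count tau whose null tail probability P(M >= tau) is at most fpr."""
    for tau in range(k + 1):
        if sum(comb(k, i) for i in range(tau, k + 1)) / 2 ** k <= fpr:
            return tau
    return k + 1  # no threshold achieves fpr < 2**-k
```

Flagging an image whenever at least `min_matches_for_fpr(48, 1e-6)` bits match keeps the false positive rate at or below 10^-6 under this null model, while still tolerating a few bit errors caused by cropping or compression.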

A complementary approach is Tree-Ring Watermarks (Wen, Kirchenbauer, Geiping & Goldstein, NeurIPS 2023), which embeds a circular pattern in the Fourier transform of the initial Gaussian noise vector used to seed the diffusion sampler. Because the ring is invariant under spatial transformations (rotation, flipping, dilation) and survives the entire denoising trajectory, detection requires inverting the diffusion process to recover an estimate of the initial noise; this is a robust scheme that nonetheless requires access to the diffusion model and its inversion.[34]

SynthID-Image

SynthID-Image, developed by Google DeepMind, uses a post-hoc model-independent design: a neural encoder embeds a watermark into the pixel data after generation, and a corresponding decoder detects it. The watermark is distributed holographically across the image, so even cropped fragments retain detectable information. By 2025, SynthID had been used to watermark over ten billion images and video frames across Google's services, making it the largest deployed AI image watermark to date.[21][35]

Audio

Audio watermarking is constrained by the psychoacoustic threshold of human hearing: signals must be embedded in regions of the spectrum masked by louder content (a phenomenon known as auditory masking).[36] Modern neural audio watermarks operate either on the raw waveform or on time-frequency representations such as mel spectrograms.

The state of the art is exemplified by AudioSeal (San Roman et al., ICML 2024), which jointly trains a generator network that adds a watermark signal to an input waveform and a localized detector network that returns, for every audio sample, the probability that the watermark is present. AudioSeal introduced a novel perceptual loss based on auditory masking and is the first audio watermark to provide sample-level localization; that is, it can identify which segments of a longer audio file (e.g. a podcast partially modified with AI voice cloning) are watermarked.[36] Other notable systems include WavMark,[37] SilentCipher,[38] and XAttnMark, which uses cross-attention to jointly optimize detection and bit-level attribution.[39] Independent evaluations have, however, shown that all current post-hoc audio watermarks can be effectively removed by neural codecs (such as EnCodec) and learned denoisers, raising concerns about deployment robustness.[40]
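Because a localized detector returns one probability per audio sample, turning its output into time spans is a simple post-processing step. The sketch below assumes such a per-sample probability list is already available (AudioSeal's actual API is not reproduced here), a hypothetical decision threshold of 0.5, and a minimum segment duration that suppresses isolated spikes.

```python
def watermarked_segments(probs, sample_rate, threshold=0.5, min_duration_s=0.1):
    """Return (start_s, end_s) spans where per-sample watermark probability >= threshold."""
    segments = []
    start = None
    # A trailing sentinel below any threshold closes a segment still open at the end.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if (i - start) / sample_rate >= min_duration_s:
                segments.append((start / sample_rate, i / sample_rate))
            start = None
    return segments
```

For a podcast in which only a cloned-voice insert was watermarked, the returned spans would point to that insert rather than labeling the whole file.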

Industry implementations

SynthID

SynthID is a suite of watermarking tools developed by Google DeepMind, designed to watermark and identify AI-generated images, text, audio, and video.[41] It was first launched in beta in August 2023 for watermarking images generated by Imagen on Google Cloud's Vertex AI platform.[35] The tool was subsequently expanded to cover text generated by the Gemini app, audio produced by Google's Lyria music model, and video from the Veo model.[42] By 2025, SynthID had been used to watermark over ten billion images and video frames across Google's services.[21]

For text, SynthID functions as a logits processor that applies a tournament sampling scheme; the underlying technology was published in Nature in October 2024 with full reproducibility data, and the code was open-sourced through Google's Responsible Generative AI Toolkit and Hugging Face.[4][27] For images, SynthID-Image is a post-hoc neural encoder-decoder system that embeds a holographic watermark into pixels; a 2025 technical report describes its internet-scale deployment characteristics, including operation under JPEG compression, screenshot capture, and moderate cropping.[21]

OpenAI

OpenAI developed a text watermarking system for ChatGPT that reportedly achieved 99.9% detection accuracy in internal testing.[43][44] The system is widely believed to be a Gumbel-max scheme of the type Aaronson proposed during his 2022 OpenAI residency.[26] Despite the technology being ready for nearly a year, OpenAI chose not to release it, as reported by The Wall Street Journal in August 2024.[43] A company survey found that nearly 30% of ChatGPT users said they would use the service less if watermarking were implemented.[43] OpenAI also raised concerns that the watermark was vulnerable to circumvention through translation tools, rewording with another AI model, or character insertion and deletion, and that the tool could disproportionately stigmatize non-native English speakers who use AI for writing assistance.[44][45] The company indicated it was exploring alternative approaches, including metadata embedding.[43]

Meta

Meta open-sourced AudioSeal in 2024 as the audio watermarking component of its Audiobox and SeamlessM4T speech systems, which together serve over 100,000 users daily.[36][46] Meta AI, in collaboration with Inria, also developed Stable Signature, originally released for Stable Diffusion.[33]

C2PA

The Coalition for Content Provenance and Authenticity (C2PA) is a cross-industry initiative that has developed an open technical standard for establishing the origin and edit history of digital content through cryptographically signed metadata.[47] Unlike imperceptible pixel-level watermarks, C2PA embeds provenance data (known as "Content Credentials") into the file's metadata structure using the JUMBF (JPEG Universal Metadata Box Format) standard. This data is cryptographically signed via X.509 certificates, making it tamper-evident.[48] Members of the coalition include Adobe, Microsoft, Google, Intel, OpenAI, and the BBC, among others.[48]
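The tamper-evidence idea behind Content Credentials, a signed manifest that binds provenance claims to a hash of the asset, can be illustrated in a few lines. This is a conceptual sketch only: real C2PA manifests are stored in JUMBF boxes and signed with X.509-backed public-key signatures, whereas here an HMAC over a JSON payload stands in for the signature, and the manifest layout is invented for illustration.

```python
import hashlib
import hmac
import json


def _sign(body: dict, key: bytes) -> str:
    # HMAC stand-in for a certificate-backed signature (illustration only).
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_manifest(asset: bytes, claims: dict, key: bytes) -> dict:
    """Bind provenance claims to the asset's SHA-256 and sign both together."""
    body = {"claims": claims, "asset_sha256": hashlib.sha256(asset).hexdigest()}
    return {"body": body, "signature": _sign(body, key)}


def verify(asset: bytes, manifest: dict, key: bytes) -> bool:
    """Fail if the manifest was edited or the asset bytes no longer match the bound hash."""
    if not hmac.compare_digest(_sign(manifest["body"], key), manifest["signature"]):
        return False
    return manifest["body"]["asset_sha256"] == hashlib.sha256(asset).hexdigest()
```

In this sketch, re-encoding the image changes its bytes and breaks the hash binding, while stripping the manifest leaves nothing to verify at all.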

While C2PA is sometimes described as "watermarking," its approach is fundamentally different: it attaches verifiable metadata alongside the content rather than modifying the content itself, and the credentials are stripped by any system that re-encodes the file without preserving metadata (such as most social media platforms as of 2024-2025).[49] Industry practice is therefore converging on combining both approaches, with C2PA metadata providing a rich provenance record when present and imperceptible watermarking providing a more resilient fallback signal.[50]

Limitations and challenges

Robustness against removal attacks

A fundamental tension exists between the imperceptibility of a watermark and its robustness. Making a watermark less perceptible typically involves embedding it more subtly, but subtle watermarks are generally more vulnerable to removal through common operations like compression or cropping.[1] For text, watermarks can be defeated by paraphrasing the output, translating it to another language, or using a second AI model to rewrite it.[29][45] The DIPPER paraphraser of Krishna et al. (NeurIPS 2023) was specifically designed to break detection of LLM-generated text and reduces detection rates of KGW-style watermarks substantially while preserving semantic content.[51]
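A back-of-the-envelope model shows why token-level edits are effective against KGW-style watermarks. Assume, as a simplification, that an attacker replaces a fraction $\rho$ of tokens independently at random, that a token keeps its watermark bias only if neither it nor any of the `window` tokens that seed its green list was replaced, and that every other token is green at the base rate $\gamma$. The expected z-score then shrinks by the factor $(1-\rho)^{\text{window}+1}$. The green rate of 0.7 in the usage loop is an assumed value.

```python
import math


def expected_z_after_edits(green_rate, gamma, T, rho, window=1):
    """Expected KGW z-score after replacing a fraction rho of tokens (simplified model)."""
    intact = (1 - rho) ** (window + 1)  # token and its `window` context tokens all untouched
    rate = intact * green_rate + (1 - intact) * gamma
    return (rate - gamma) * T / math.sqrt(T * gamma * (1 - gamma))


for rho in (0.0, 0.1, 0.3, 0.5):
    print(f"rho={rho}: expected z = {expected_z_after_edits(0.7, 0.25, 200, rho):.2f}")
```

Under these assumptions, replacing half the tokens of a 200-token passage cuts the expected score to a quarter of its unattacked value, because each edit also disturbs the green list of the token that follows it.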

For images, the most damaging class of attacks is regeneration via diffusion models. Zhao et al. (NeurIPS 2024) proved that under standard assumptions an attacker can use any sufficiently powerful diffusion model as a "purifier" to noise and re-denoise a watermarked image, reducing detection rates of HiDDeN, StegaStamp, TrustMark, and even Tree-Ring "from nearly 100% to essentially chance level" while preserving image content.[14][52] The 2024 NeurIPS Erasing the Invisible challenge documented that under both white-box and black-box conditions, the winning attack achieved a 95.7% removal rate on a benchmark of state-of-the-art image watermarks with negligible quality loss.[53]

A theoretical capstone to this line of work is the impossibility theorem of Zhang et al. (ICML 2024), which proves that under two natural assumptions (namely, that an attacker has access to a "quality oracle" and a "perturbation oracle" that mixes within the set of high-quality outputs) strong watermarking is impossible for generative models, even with a private detection key. The authors instantiated their attack on KGW, Kuditipudi et al., and Zhao et al. and successfully removed all three watermarks.[54] The result formalizes a long-standing intuition: because watermarked high-quality outputs form only a small subset of all high-quality outputs, an attacker who can move freely among high-quality outputs will eventually find quality-preserving perturbations that escape detection.
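The attack behind this result is essentially a quality-constrained random walk. The sketch below shows only that control loop; `perturb` and `quality_ok` are hypothetical callables standing in for the perturbation and quality oracles, and the toy synonym table used to exercise the loop is invented for illustration.

```python
import random
from typing import Callable, List

Tokens = List[str]


def random_walk_attack(
    original: Tokens,
    perturb: Callable[[Tokens, random.Random], Tokens],
    quality_ok: Callable[[Tokens, Tokens], bool],
    steps: int,
    rng: random.Random,
) -> Tokens:
    """Repeatedly apply small edits, keeping only those the quality oracle accepts."""
    current = list(original)
    for _ in range(steps):
        candidate = perturb(current, rng)
        if quality_ok(original, candidate):
            current = candidate
    return current


# Toy oracles: swap one word for a synonym; accept if every word stays in its synonym class.
SYNONYMS = {"big": ["big", "large"], "dog": ["dog", "hound"], "ran": ["ran", "sprinted"]}
CLASS = {word: head for head, group in SYNONYMS.items() for word in group}


def swap_synonym(tokens: Tokens, rng: random.Random) -> Tokens:
    out = list(tokens)
    i = rng.randrange(len(out))
    out[i] = rng.choice(SYNONYMS[CLASS[out[i]]])
    return out


def same_meaning(original: Tokens, candidate: Tokens) -> bool:
    return len(original) == len(candidate) and all(
        CLASS[a] == CLASS[b] for a, b in zip(original, candidate)
    )
```

The attacker never needs the watermark key: every accepted step preserves quality by construction, and after enough steps the text has drifted away from the particular tokens that carried the watermark.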

Spoofing and forgery

A symmetric threat to removal is spoofing or forgery, in which an adversary produces content that falsely triggers detection; for example, framing a model provider as the source of harmful output. Sadasivan et al. (2023) first demonstrated forgery against KGW by analyzing the empirical token frequency of watermarked text to recover an approximation of the green list.[55] Jovanovic, Staab and Vechev (ICML 2024) systematized this in Watermark Stealing, showing that for under US$50 of API queries an adversary can steal the watermarking rules of state-of-the-art schemes and execute both spoofing and scrubbing attacks at scale.[20] Pang et al. (2024) introduced piggyback spoofing, in which malicious content is grafted onto genuine watermarked text in small doses while preserving the watermark signal.[56] More recent work has extended forgery to ostensibly distortion-free schemes via mixed-integer linear programming over watermarked samples[57] and via knowledge distillation that exploits "watermark radioactivity," which is the unintended transfer of watermark signals to student models trained on watermarked outputs.[58] Defenses based on contrastive representation learning have been proposed but remain in early stages.[59]
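The frequency-analysis idea behind watermark stealing can be sketched as follows. This simplified version assumes the green list does not depend on context (a unigram approximation) and that the attacker holds both a corpus of watermarked outputs and a comparable unwatermarked reference corpus; it ranks tokens by how much more often they appear in the watermarked text and guesses the top fraction as green. The function and its parameters are hypothetical, and the usage data is synthetic.

```python
import random
from collections import Counter


def estimate_green_tokens(watermarked, reference, green_fraction=0.25, smoothing=1.0):
    """Guess green tokens as those most over-represented in watermarked vs. reference text."""
    wm, ref = Counter(watermarked), Counter(reference)
    vocab = sorted(set(wm) | set(ref))
    n_wm, n_ref, v = len(watermarked), len(reference), len(vocab)

    def ratio(tok):
        # Additive smoothing keeps ratios finite for tokens unseen in one corpus.
        p_wm = (wm[tok] + smoothing) / (n_wm + smoothing * v)
        p_ref = (ref[tok] + smoothing) / (n_ref + smoothing * v)
        return p_wm / p_ref

    ranked = sorted(vocab, key=ratio, reverse=True)
    return set(ranked[: max(1, int(green_fraction * v))])


# Usage with synthetic data: tokens 0-4 are secretly green and up-weighted by about e^2.
rng = random.Random(0)
true_green = set(range(5))
weights = [7.389 if t in true_green else 1.0 for t in range(20)]
watermarked = rng.choices(range(20), weights=weights, k=20000)
reference = rng.choices(range(20), k=20000)
print(sorted(estimate_green_tokens(watermarked, reference)))
```

Once the green set is approximated, an attacker can spoof by preferring green tokens in text of their choosing, or scrub by avoiding them.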

Standardization and interoperability

There is currently no universal standard for AI content watermarking. A watermark created by one company's system may be undetectable by another's tools, making broad-based verification difficult.[1] Initiatives such as the ISO/IEC JTC 1/SC 42 working group on AI trustworthiness, the C2PA's Durable Content Credentials specification (which combines C2PA metadata with a watermarking layer), and the EU's voluntary Code of Practice are attempting to converge on shared schemes.[50][60]

Open-source models

For open-source AI models, watermarking presents a particular challenge. If the watermarking code is part of an open pipeline, it is trivial for a user to fork the codebase and remove the watermarking step before generating content, or to fine-tune the model to forget the watermark behavior.[13] A 2024 Nature commentary accompanying the SynthID-Text paper observed that this asymmetry (closed-source providers can enforce watermarking, open-source ones cannot) risks creating a two-tier provenance landscape unless watermarking can be moved into the model weights themselves.[61]

Equity concerns

OpenAI publicly raised concerns that text watermarking could disproportionately stigmatize non-native English speakers, who often use LLMs as legitimate writing assistants and would be more likely to have their text flagged as AI-generated even if they were the substantive author.[45] A 2023 study by Liang et al. found that several non-watermark AI-text detectors already exhibit such bias, misclassifying TOEFL essays by non-native English students as AI-generated at substantially higher rates than essays by native speakers.[62] While watermark-based detection is in principle less susceptible to this failure mode (since it does not rely on stylometric inference), the concern remains that any deployed detector will be used in high-stakes settings such as academic integrity adjudication.

Regulation

European Union

The EU AI Act, which came into force on 1 August 2024, includes specific transparency obligations for AI-generated content under Article 50. The article requires providers of AI systems that generate synthetic audio, image, video, or text to ensure that their outputs are "marked in a machine-readable format and detectable as artificially generated or manipulated."[63] The act's recitals mention "watermarks, metadata identifications, cryptographic methods for proving provenance and authenticity of content, logging methods, fingerprints or other techniques" as possible implementation methods.[64]

The transparency obligations under Article 50 are set to become fully applicable on 2 August 2026.[65] To support compliance, the European Commission is facilitating the development of a voluntary Code of Practice on Transparency of AI-Generated Content, which proposes a multi-layered approach combining digitally signed metadata with imperceptible watermarking.[50][65] The second draft of this Code, published in March 2026, recommends a two-layered marking strategy involving secured metadata and watermarking, with optional fingerprinting and logging.[65]

United States

In October 2023, Executive Order 14110 on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence directed the Department of Commerce to develop guidance for content authentication and watermarking to help distinguish AI-generated content from authentic material.[15] The order was rescinded in January 2025; subsequent federal activity on watermarking has been driven primarily by NIST's AI Safety Institute and by state-level laws in California and Texas requiring labeling of AI-generated political content.

China

China's Cyberspace Administration issued binding "Measures for Labeling AI-Generated Synthetic Content" effective 1 September 2025, requiring both visible labels and machine-readable identifiers (including watermarks or metadata) on all generative-AI outputs distributed within the country. The measures impose joint responsibility on content platforms for label preservation.[66][67]

See also

References

  1. ^ a b c d e f "AI Watermarking: How It Works, Applications, Challenges". DataCamp. 2025-02-20. Retrieved 2026-05-05.
  2. ^ a b c Luo, Huixin; Li, Li; Li, Juncheng (2025-02-16). "Digital Watermarking Technology for AI-Generated Images: A Survey". Mathematics. 13 (4): 651. doi:10.3390/math13040651.
  3. ^ a b c d "Detecting AI fingerprints: A guide to watermarking and beyond". Brookings Institution. 2024-03-12. Retrieved 2026-05-05.
  4. ^ a b c d e f g Dathathri, S.; See, A.; Ghaisas, S.; et al. (2024-10-23). "Scalable watermarking for identifying large language model outputs". Nature. 634 (8035): 818–823. Bibcode:2024Natur.634..818D. doi:10.1038/s41586-024-08025-4. PMC 11499265. PMID 39443777.
  5. ^ Berthelet, Mathilde (7 May 2026). "Elon Musk : Le parquet de Paris a ouvert une information judiciaire sur de possibles dérives du réseau X". 20 Minutes (in French). Retrieved 9 May 2026.
  6. ^ "Ofcom Investigating X Over Grok AI Image Tool". Deadline. 12 January 2026. Retrieved 9 May 2026.
  7. ^ "Grok under fire for generating sexually explicit deepfakes of women and minors". Euronews. 5 January 2026. Retrieved 9 May 2026.
  8. ^ "Musk's Grok AI chatbot is still making sexual deepfakes, despite X's promise to stop it". NBC News. Retrieved 9 May 2026.
  9. ^ "The Grok Nudify Controversy Is Another Example of the Need for International AI Regulation". NYU Stern Center for Business and Human Rights. 15 January 2026. Retrieved 9 May 2026.
  10. ^ "Tracking Regulator Responses to the Grok 'Undressing' Controversy". Tech Policy Press. 16 January 2026. Retrieved 9 May 2026.
  11. ^ "Grok in deep trouble over deepfakes? What Ofcom's recent investigation means for online platforms". CMS Law. February 2026. Retrieved 9 May 2026.
  12. ^ a b Cox, I.J.; Miller, M.L.; Bloom, J.A.; Fridrich, J.; Kalker, T. (2007). Digital Watermarking and Steganography (2nd ed.). Morgan Kaufmann.
  13. ^ a b c d e "AI Watermarking 101: Tools and Techniques". Hugging Face. 2024-02-26. Retrieved 2026-05-05.
  14. ^ a b Zhao, X.; Zhang, K.; Su, Z.; et al. (2024). Invisible Image Watermarks Are Provably Removable Using Generative AI. NeurIPS 2024.
  15. ^ a b "FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI". UCSB. 2023-07-21. Retrieved 2026-05-05.
  16. ^ a b c d e f g h i Kirchenbauer, J.; Geiping, J.; Wen, Y.; Katz, J.; Miers, I.; Goldstein, T. (2023). A Watermark for Large Language Models. Proceedings of the 40th International Conference on Machine Learning. Vol. 202. pp. 17061–17084. arXiv:2301.10226.
  17. ^ a b Liu, A.; Pan, L.; Lu, Y.; et al. (2024). "A Survey of Text Watermarking in the Era of Large Language Models". ACM Computing Surveys. 57 (2): 1–36. doi:10.1145/3691626.
  18. ^ a b c Christ, M.; Gunn, S.; Zamir, O. (2024). Undetectable Watermarks for Language Models (PDF). Proceedings of the Conference on Learning Theory (COLT). Vol. 247.
  19. ^ a b Fu, J.; Zhao, X.; Yang, R.; Wang, Y. (2024). "GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick". arXiv:2402.12948 [cs.CL].
  20. ^ a b Jovanovic, N.; Staab, R.; Vechev, M. (2024). Watermark Stealing in Large Language Models. ICML 2024.
  21. ^ a b c d Pizzi, E.; Hayes, J.; Stanforth, R.; et al. (2025). "SynthID-Image: Image watermarking at internet scale". arXiv:2510.09263 [cs.CR].
  22. ^ a b Kuditipudi, R.; Thickstun, J.; Hashimoto, T.; Liang, P. (2023). "Robust Distortion-Free Watermarks for Language Models". arXiv:2307.15593 [cs.LG].
  23. ^ Liang, Y.; Xiao, J.; Gan, W.; et al. (2025-04-26). "Watermarking for Large Language Models: A Survey". Mathematics. 13 (9): 1420. doi:10.3390/math13091420.
  24. ^ Kirchenbauer, J.; Geiping, J.; Wen, Y.; et al. (2023). "On the Reliability of Watermarks for Large Language Models". arXiv:2306.04634 [cs.LG].
  25. ^ Wouters, B. (2023). "Optimizing Watermarks for Large Language Models". arXiv:2312.17295 [cs.CR].
  26. ^ a b Aaronson, S. (2023). "Watermarking of Large Language Models".
  27. ^ a b "Introducing SynthID Text". Hugging Face. 2024-10-23. Retrieved 2026-05-05.
  28. ^ "synthid-text: Reference implementation of SynthID Text". GitHub (Google DeepMind). Retrieved 2026-05-05.
  29. ^ a b c "SynthID: Tools for watermarking and detecting LLM-generated Text". Google AI for Developers. Retrieved 2026-05-05.
  30. ^ Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. (2018). HiDDeN: Hiding Data With Deep Networks. ECCV.
  31. ^ Tancik, M.; Mildenhall, B.; Ng, R. (2020). StegaStamp: Invisible Hyperlinks in Physical Photographs. CVPR.
  32. ^ Bui, T.; Agarwal, S.; Yu, N.; Collomosse, J. (2023). "TrustMark: Universal Watermarking for Arbitrary Resolution Images". arXiv:2311.18297 [cs.CV].
  33. ^ a b Fernandez, P.; Couairon, G.; Jégou, H.; Douze, M.; Furon, T. (2023). The Stable Signature: Rooting Watermarks in Latent Diffusion Models. ICCV. pp. 22466–22477.
  34. ^ Wen, Y.; Kirchenbauer, J.; Geiping, J.; Goldstein, T. (2023). Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust. NeurIPS.
  35. ^ a b "Identifying AI-generated images with SynthID". Google DeepMind. 29 August 2023. Retrieved 2026-05-05.
  36. ^ a b c San Roman, R.; Fernandez, P.; Elsahar, H.; Défossez, A.; Furon, T.; Tran, T. (2024). Proactive Detection of Voice Cloning with Localized Watermarking. ICML 2024. arXiv:2401.17264.
  37. ^ Chen, G.; Wu, Y.; Liu, S.; et al. (2023). "WavMark: Watermarking for Audio Generation". arXiv:2308.12770 [cs.SD].
  38. ^ Singh, M.K.; Takahashi, N.; Liao, W.-H.; Mitsufuji, Y. (2024). "SilentCipher: Deep Audio Watermarking". Interspeech 2024. pp. 2235–2239. arXiv:2406.03822. doi:10.21437/Interspeech.2024-174.
  39. ^ Liu, Y.; Lu, L.; Jin, J.; Sun, L.; Fanelli, A. (2025). "XAttnMark: Learning Robust Audio Watermarking with Cross-Attention". arXiv:2502.04230 [cs.SD].
  40. ^ O'Reilly, P.; Bralios, D.; Smaragdis, P. (2025). "Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech". arXiv:2504.10782 [cs.SD].
  41. ^ "SynthID". Google DeepMind. Retrieved 2026-05-05.
  42. ^ "Watermarking AI-generated text and video with SynthID". Google DeepMind. 14 May 2024. Retrieved 2026-05-05.
  43. ^ a b c d "OpenAI Scraps ChatGPT Watermarking Plans". Search Engine Journal. 2024-09-19. Retrieved 2026-05-05.
  44. ^ a b "OpenAI Built Text Watermarking Solution to Detect AI-Generated Content, But May Not Release It". Thurrott. 2024-08-05. Retrieved 2026-05-05.
  45. ^ a b c "OpenAI says it's taking a 'deliberate approach' to releasing tools that can detect writing from ChatGPT". TechCrunch. 2024-08-04. Retrieved 2026-05-05.
  46. ^ Fernandez, P. (31 January 2024). "Proactive Detection of Voice Cloning with Localized Watermarking". Retrieved 2026-05-05.
  47. ^ "C2PA - Providing Origins of Media Content". Coalition for Content Provenance and Authenticity. Retrieved 2026-05-05.
  48. ^ a b "C2PA in DAM: Ensuring Content Authenticity". Orange Logic. Retrieved 2026-05-05.
  49. ^ "What Is C2PA Metadata? Content Provenance and Authenticity Explained". Retrieved 2026-05-05.
  50. ^ a b c "EU AI Act: First Draft Code of Practice on Transparency and Watermarking Released". Cooley LLP. 2025-12-18. Retrieved 2026-05-05.
  51. ^ Krishna, K.; Song, Y.; Karpinska, M.; Wieting, J.; Iyyer, M. (2023). "Paraphrasing Evades Detectors of AI-Generated Text, but Retrieval is an Effective Defense". NeurIPS.
  52. ^ Ni, Yunyi; Carter, Finn; Niu, Ze; Davis, Emily; Zhang, Bo (2025). "Diffusion-Based Image Editing for Breaking Robust Watermarks". arXiv:2510.05978 [cs.CV].
  53. ^ Shamshad, Fahad; Bakr, Tameem; Shaaban, Yahia; Hussein, Noor; Nandakumar, Karthik; Lukas, Nils (2025). "First-Place Solution to NeurIPS 2024 Invisible Watermark Removal Challenge". arXiv:2508.21072 [cs.CV].
  54. ^ Zhang, H.; Edelman, B.L.; Francati, D.; Venturi, D.; Ateniese, G.; Barak, B. (2024). "Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models". ICML. Vol. 235. pp. 58851–58880.
  55. ^ Sadasivan, V.S.; Kumar, A.; Balasubramanian, S.; Wang, W.; Feizi, S. (2023). "Can AI-Generated Text be Reliably Detected?". arXiv:2303.11156 [cs.CL].
  56. ^ Pang, Q.; Hu, S.; Zheng, W.; Smith, V. (2024). "No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices". arXiv:2402.16187 [cs.CR].
  57. ^ Reynolds, Shayleen; He, Hengzhi; Ngo, Dung Daniel T.; Obitayo, Saheed; Dalmasso, Niccolò; Cheng, Guang; Potluru, Vamsi K.; Veloso, Manuela (2025). "Breaking Distortion-Free Watermarks in Large Language Models". arXiv:2502.18608 [cs.CR].
  58. ^ An, Hyeseon; Park, Shinwoo; Woo, Suyeon; Han, Yo-Sub (2025). "DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation". arXiv:2510.10987 [cs.CR].
  59. ^ An, Li; Liu, Yujian; Liu, Yepeng; Zhang, Yang; Bu, Yuheng; Chang, Shiyu (2025). "Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning". arXiv:2504.06575 [cs.CR].
  60. ^ Cao, Lele (2025). "Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities". arXiv:2504.03765 [cs.CR].
  61. ^ "AI watermarking must be watertight to be effective". Nature. 634 (8035): 752–753. 2024-10-23. Bibcode:2024Natur.634..753.. doi:10.1038/d41586-024-03418-x. PMID 39443779.
  62. ^ Liang, W.; Yuksekgonul, M.; Mao, Y.; Wu, E.; Zou, J. (2023). "GPT detectors are biased against non-native English writers". Patterns. 4 (7) 100779. doi:10.1016/j.patter.2023.100779. PMC 10382961. PMID 37521038.
  63. ^ "Article 50: Transparency Obligations for Providers and Deployers of Certain AI Systems". EU Artificial Intelligence Act. Retrieved 2026-05-05.
  64. ^ Zhao, Xuandong; Gunn, Sam; Christ, Miranda; Fairoze, Jaiden; Fabrega, Andres; Carlini, Nicholas; Garg, Sanjam; Hong, Sanghyun; Nasr, Milad; Tramer, Florian; Jha, Somesh; Li, Lei; Wang, Yu-Xiang; Song, Dawn (2024). "SoK: Watermarking for AI-Generated Content". arXiv:2411.18479 [cs.CR].
  65. ^ a b c "Commission publishes second draft of Code of Practice on Marking and Labelling of AI-generated content". European Commission. 2026-03-05. Retrieved 2026-05-05.
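  66. ^ "专家解读 引领国际人工智能生成内容标识实践 营造清朗网络空间" [Expert interpretation: Leading international practice in labeling AI-generated content and fostering a clean online space]. qq.com (in Chinese). 2025-03-11. Retrieved 2026-05-05.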
  67. ^ "Professor Zhang Linghan: China's AI-generated content labeling leads international practices". geopolitechs.org. 2025-03-17. Retrieved 2026-05-05.