WO2001054053A1

WO2001054053A1 - Transform domain allocation for multimedia watermarking

Info

Publication number: WO2001054053A1
Application number: PCT/US2001/002280
Authority: WO
Inventors: Raymond Knopp; Arnaud Robert
Original assignee: Ecole Polytechnique Federale De Lausanne; Businger, Peter, A.
Priority date: 2000-01-24
Filing date: 2001-01-24
Publication date: 2001-07-26
Also published as: AU2001231109A1

Abstract

A watermark applied to audio, image, video or multimedia data can serve to verify the authenticity of the data. For the watermark to be imperceptible, e.g. invisible or inaudible, it is inserted into selected parts of the data that are neither significant nor insignificant perceptually. Examples for the domain of embedding are complex Fourier coefficients and wavelet coefficients. A frequency spectral representation of the data is obtained, preferably a complex Fourier transform, and the watermark is inserted into those frequencies whose variance lies in a range between two thresholds, minimal and maximal. In the Fourier transform the block size is large enough for the components to be approximately Gaussian. The resultant watermarked spectral data is subjected to an inverse transform to produce the watermarked data.

Description

TRANSFORM DOMAIN ALLOCATION FOR MULTIMEDIA WATERMARKING

Technical Field

The present invention relates to watermarking of digital information, e.g. audio, image, video and multimedia data.

Background of the Invention

Technological advances in digital transmission networks, digital storage media, very-large-scale-integration (VLSI) devices and digital processing of digital content, e.g. audio, images and video, are converging to make the transmission and storage of digital content economical in a wide variety of applications. Because digital content is so easily stored and copied, copyright protection is central to many applications, for safeguarding rights of authorship in the contents. Thus, techniques for embedding and extracting proof of authorship have become vital in protecting against improper use of digital content. While international standards for copyright protection of digital content may be lacking, there exists an international consortium known as Secure Digital Music Initiative (SDMI) bringing together the worldwide recording industry and technology companies to develop an open, inter-operable architecture and specification for digital music security, see http://www.sdmi.org, responsive to consumer demand for convenient accessibility to quality digital music, artists' interest in copyright protection for their work, and technology and music companies' aim to build successful businesses. For such protection, different techniques have been advanced.

In one technique, known as cryptography, original readable content is transformed into information which is useless and meaningless to all except the proprietary user who has a secret key for decrypting. In another technique, known as watermarking, a special message signal for authorship proofing is embedded in original data such that over-all size or format remains unchanged, and preferably with little or no perceptible distortion as compared with the original. Thus, watermarking comprises embedding a watermark signal as well as its extraction. Audio, image and video watermarking have become fields of interest especially in data transfer/copying over the Internet. Digital representations allow for perfect copying, without degradation of quality/content.

Watermarking techniques may be divided into three main categories, namely (i) visible watermarks placed on the background of an image so as to be visible in viewing the image, (ii) techniques based on spread-spectrum technologies for spreading information over the entire spectral range of the content, and also on perceptual models for embedding the watermark in a transparent manner, and (iii) techniques based on a communication systems approach. In the latter category, a technique proposed by Wornell and Cheng is based on quantification indexes, with the watermark then being a sequence of quantization operators used to represent a signal.

Summary of the Invention

In watermarking there is a concern with the reliability of detection of a watermark. We have recognized that in the interest of reliability, i.e. of a low probability of erroneous detection, it is advantageous to embed a watermark using two thresholds delimiting a range of values into which the watermark is embedded. Thus, for specified levels of distortion and vulnerability to attack, the watermark can be embedded in a region where watermark-to-signal power ratio is maximized. In determining the threshold values, long-term signal statistics can be used. Very large block size is useful, i.e. as to the number of signal dimensions over which the thresholds are determined and over which the watermark will be embedded. We have recognized that watermarking can be considered as a communication system problem in which the watermark signal is transmitted through a channel, namely the digital content and/or any attack, and is detected/retrieved at a receiver when authorship fraud is suspected in a content. We have addressed the problem by use of detection theory, to determine an optimal watermark signal for a specified model of the channel and of the watermark signal. For an actual solution of the problem, models are established of the watermark signal and of the channel, both as to the digital content to be watermarked and any potential attack. The watermark is additive to the content. Conveniently, Gaussian statistics can be used to model the digital content to be watermarked, with the digital content being represented in a specified transform domain in which the Gaussian model is valid from the point of view of the watermark detection procedure.

Based on a model of content and of attack in a transform domain, detection theory can serve in determining an optimal watermark or, more specifically, an optimal place or optimal components for the watermark signal to be embedded. The components can be selected for maximizing the probability of watermark detection, i.e. maximizing the watermark-to-signal power ratio to countervail an attacker's attempt at minimizing the detection probability and the constraints on the problem.

Watermark embedding and extraction can be based on block-by-block processing in which the original content to be watermarked is divided into a succession of blocks of data having length in agreement with signal characteristics. The same block lengths can be used in watermark detection. A decision as to the presence/absence of the watermark signal in the digital content can be based on partial decisions made from the content blocks. A preferred embedding technique comprises obtaining the transform domain representation of both the digital content and the watermark signal, generating a watermark signal using a secret key, determining an optimal place for watermarking in the transform domain by maximizing the probability of detection based on the signal characteristics and the problem constraints, adding the watermark signal at the optimal place, and inverse-transforming to obtain the watermark signal in the original domain.

A preferred detection technique for watermarked and possibly attacked received digital content comprises obtaining the transform domain representation of the content, regenerating the watermark signal in the transform domain using the secret key, finding the optimal place for the watermark signal from the content using the technique and parameters of the embedding technique, determining a correlation factor between the regenerated watermark and the received content, and deciding on presence/absence of the watermark depending on whether or not the correlation factor is above a specified threshold.

Brief Description of the Drawing

Fig. 1 is a flow diagram for watermark embedding, showing original content, watermark, embedding technique, and watermarked content.

Fig. 2 is a flow diagram for watermark detection, showing watermarked content, watermark, extraction technique and decision.

Fig. 3 is a communications system diagram for watermarking when the original content is not available at the receiver, showing a transmitter, watermark channel, and receiver.

Fig. 4 is a transmitter diagram, showing a complex fast Fourier transform (FFT) module, transform domain allocation module, inverse fast Fourier transform (IFFT) module, complex random generator, and a normalizer.

Fig. 5 is a receiver diagram, showing a complex FFT module, transform domain allocation module, correlator, complex random generator, normalizer, and synchronizer.

Fig. 6 is a graphic of signal component power with a large difference between components in a 2-dimensional case.

Fig. 7 is a graphic of signal component power with a small difference between components in a 2-dimensional case.

Fig. 8 is a graphic illustrating energy bin decomposition of transformed input data.

Detailed Description

Terms used herein-above and -below include the following: host content - the content to be watermarked; watermark signal - the signal or content used in watermarking as a signature of the content to be checked for the presence of a watermark; watermarked content - the content that has been watermarked; watermarking - embedding a watermark signal into a host content to enable detection of fraud on copyright; extraction - assessment of the presence/absence of a watermark signal in received content; watermarker - author who intends to embed a watermark signal in a host content; attacker - one who intends to deteriorate or remove a watermark signal from a content known or suspected to be watermarked; DFT - discrete Fourier transform; SNR - signal-to-noise ratio.

Fig. 1 illustrates watermark embedding in an original content.

Fig. 2 illustrates watermark extraction, wherein a suspected watermark signal is regenerated and extracted from a supposed watermarked content. No use of the original content is required.

Fig. 3 illustrates watermarking viewed as a communications system. A watermark signal w is generated using a secret key and the original content s. With the watermark additively embedded there results the signal $\bar{s}$. Thus, with respect to the detection of w from an observation $\bar{s}$, embedding can be viewed as an additive noise channel, here called the watermark channel. The watermarked signal $\bar{s}$ may be corrupted by an attack, resulting in the received signal r. The watermark detector or receiver processes r to determine whether or not w is present. The original signal s is not required in the process.

Without precluding the use of other suitable transforms, Figs. 4 and 5 illustrate preferred transmitter and receiver embodiments wherein the transform domain is the Fourier transform.

The watermarking communication system problem

The watermarking problem can be interpreted as a communication system problem as follows: optimize the probability of detection of a watermark signal embedded in a digital content, having a model of this content and of the channel, and given some distortion constraints between the original content and the watermarked content, as well as between the original content and the watermarked content that has been attacked. This is summarized by the following equations and their corresponding distortion constraints:

$$
\begin{aligned}
\text{embedding:} \quad & \bar{s} = T(s, w), & d(\bar{s}, s) &\le D_W \\
\text{attack:} \quad & r = A(\bar{s}), & d(s, r) &\le D_A \\
\text{detection:} \quad & \hat{w} = R(r)
\end{aligned}
\qquad (1)
$$

where s is the original content to watermark, w the watermark, $\bar{s}$ the watermarked content, r the received content (the content from which the watermark should be detected), d(x, y) is a distortion measure between a signal x and a signal y, $D_W$ is the maximal tolerable distortion between the original content and the watermarked content, and $D_A$ is the maximal tolerable distortion between the original content and the content after having been submitted to an attack. Also, T is the generic embedding process, R the generic extraction process and A the generic attack.

Detection theory and additive watermarking

In the technique considered in the preferred embodiment, the watermarking is additive, that is, the watermark is added to the content in a given transform domain. Using the above notation we may write $\bar{s} = s + f(s)$, where the watermark is some function of the original content, $w = f(s)$. In our case this function will be determined by the allocation strategy. Therefore, referring to Fig. 3, the two detection hypotheses in the absence of an attack are:

$$
\begin{aligned}
H_0 &: r = s \\
H_1 &: r = s + w
\end{aligned}
\qquad (2)
$$

where, without loss of generality, the above quantities are N-dimensional vectors. As mentioned in the summary, we employ a complex Gaussian model for the statistics of the original content s. As a result, the probability density functions under the two hypotheses $H_0$ and $H_1$ are:

$$
p_i(r) = \frac{1}{\pi^N \det(K_i)} \exp\left(-\|r - \mu_i\|_{K_i}^2\right), \quad i = 0, 1 \qquad (3)
$$

where sub-indices 0 and 1 represent hypotheses $H_0$ and $H_1$, $\mu$ is the mean of the received signal, K represents the conditional covariance matrix of r, $\det(K)$ is the determinant of K and $\|u\|_K^2 = u^H K^{-1} u$, where H denotes the Hermitian operator. Under the maximum-likelihood (ML) detection rule, the likelihood ratio test, expressed in terms of the natural logarithm, takes the form:

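$$
\ln \Lambda = \|r - \mu_0\|_{K_0}^2 - \|r - \mu_1\|_{K_1}^2 \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \ln\left[\det\left(K_1 K_0^{-1}\right)\right] \qquad (4)
$$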

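According to the present working hypothesis, $K_0 = K_1 = K$ and the norms in the probability density functions of Eq. (3) become $\|r\|_K^2$ and $\|r - w\|_K^2$ for $H_0$ and $H_1$ respectively. Therefore the likelihood ratio test simplifies to:

$$
\mathrm{Re}\left(r^H K^{-1} w\right) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \tfrac{1}{2}\, w^H K^{-1} w \qquad (5)
$$

and the probability of detection error under $H_1$ is:

$$
P_e = \Pr\left(\mathrm{Re}\left(r^H K^{-1} w\right) < \tfrac{1}{2}\, w^H K^{-1} w \,\middle|\, H_1\right) = Q\left(\sqrt{\tfrac{1}{2}\, w^H K^{-1} w}\right) \qquad (6)
$$

where $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$ is known as the Q-function. The probability of detection error is minimized by maximizing the argument of the monotonically decreasing Q-function.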
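The behaviour of the Q-function can be checked numerically. The following is a minimal sketch, assuming only the standard identity Q(x) = erfc(x/√2)/2; the function name `q_function` is chosen for illustration and does not appear in the specification.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of a standard normal variable: Q(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# A larger argument yields a smaller detection-error probability,
# which is why the detector benefits from maximizing the argument.
for arg in (0.0, 1.0, 2.0, 3.0):
    print(f"Q({arg}) = {q_function(arg):.6f}")
```

Because Q is monotonically decreasing, any allocation that increases the argument lowers the error probability.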

Allocation strategies

An infinity of possible attacks can be used to impair the watermark detection procedure. The common trait of all attacks, however, is the introduction of additional distortion on top of the watermark signal.

Here we address the problem of optimizing the localization of the watermark energy in the transform-domain components so as to minimize the probability of false detection. To this end, we make a further simplifying assumption regarding the statistics of the host signal, namely perfect decorrelation of the transform domain signal components. This condition can always be fulfilled by choosing an appropriate transform operator.

We consider a general non-additive attack where the watermarked signal components are arbitrarily attenuated. The detection hypotheses for the general attack are :

$$
\begin{aligned}
H_0 &: r_i = h_i s_i \\
H_1 &: r_i = h_i (s_i + w_i)
\end{aligned}
$$

where $s_i$, $w_i$ represent the transform domain components of the signals s and w, and $h_i$ are the attenuation coefficients of the attack. For the sole purpose of determining the ultimate performance limitations for the model under consideration, we will assume that the $h_i$ can be estimated perfectly at the receiver. Also, let us denote the total watermark power by

We will further assume for convenience that the transform operator provides perfect decorrelation of the transformed signal components. If this is not the case, one may resort to a second transformation yielding this property prior to embedding of the watermark. We have, therefore, that the covariance matrix of the transformed received signal is the diagonal matrix $K = \mathrm{diag}(h_i^2 \sigma_{s_i}^2)$, resulting in the average probability of error

Due to the monotonicity of the Q-function the minimax problem becomes

(8)

The distortion constraint is given by

Let us define the total watermark-to-signal power ratio conditioned on the attack as $\mathrm{WSR}(h_i) = \sum_{i:\, h_i \neq 0} |w_i|^2 / \sigma_{s_i}^2$. It follows that the optimal allocation strategy places the entire watermark energy in the single component where the signal power is minimum. The resulting conditional optimal watermark-to-signal power ratio is

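The corresponding distortion is

$$
D_{opt}(h_i) = (h_{i^*} - 1)^2 \left(\sigma_w^2 + \sigma_{s_{i^*}}^2\right) + \sum_{i \neq i^*} (h_i - 1)^2 \sigma_{s_i}^2
$$

where

$$
i^* = \arg\min_{i:\, h_i \neq 0} \sigma_{s_i}^2
$$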

The optimal attack will choose the $h_i$ for which $\mathrm{WSR}^{opt}(h_i)$ is minimum while satisfying the distortion constraint $D_{opt}(h_i) \le D_A$. The optimal watermark-to-signal power ratio is

$$
\mathrm{WSR}^{opt} = \min_{h_i} \mathrm{WSR}^{opt}(h_i)
$$
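The single-component allocation can be illustrated with a small numerical sketch. It assumes that the conditional watermark-to-signal power ratio is the sum, over the components the attack does not zero out, of watermark power divided by signal variance. The function names, the variance values and the attack vectors are hypothetical and serve only as an illustration.

```python
def conditional_wsr(watermark_power, signal_var, h):
    """Sum of watermark-to-signal power ratios over components with h != 0."""
    return sum(p / v for p, v, a in zip(watermark_power, signal_var, h) if a != 0)


def allocate_single_component(total_power, signal_var, h):
    """Place all watermark power in the surviving component of minimum signal power."""
    surviving = [i for i, a in enumerate(h) if a != 0]
    target = min(surviving, key=lambda i: signal_var[i])
    power = [0.0] * len(signal_var)
    power[target] = total_power
    return power


signal_var = [0.2, 1.0, 4.0]          # hypothetical component variances
no_attack = [1, 1, 1]                 # attack leaves every component untouched
kill_weakest = [0, 1, 1]              # attack zeroes the weakest component

for h in (no_attack, kill_weakest):
    power = allocate_single_component(1.0, signal_var, h)
    print(h, power, conditional_wsr(power, signal_var, h))
```

When the attack removes the weakest component, the best remaining choice moves to the next-weakest surviving component and the achievable ratio drops accordingly.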

Examples

The two-dimensional case

Let us consider the distribution of the two-dimensional signal s shown in Fig. 6.

From the watermarker's perspective, he would try to put all the watermark in the component with variance $\sigma_{s_1}^2$ since it maximizes the WSR. Because of the large difference in signal powers in the two components, the attacker can nonetheless put $h_1 = 0$ and still satisfy the distortion constraint. Therefore, the watermarker has no choice but to place the watermark in the component where the signal is strongest, if he is to operate with a low probability of false detection.

Now consider Fig. 7, where the signal component powers are close to one another. In this case, only one distortion constraint becomes possible, the one in which both $h_i$ are non-zero. The watermarker is now free to put his watermark in either one (both can be retrieved) and will choose to put it in the signal component that yields the highest WSR, namely the one with the lower signal power.

A higher-dimensional scheme

In this "optimal" framework (based on the mean-squared distortion measure) we see that only one component in the transform domain holds the watermark. In practice, however, it is necessary to spread the watermark over a greater number of dimensions in order to

1. allow for a watermarking sequence of a significant length (e.g. 100-1000 dimensions for block sizes of 2¹⁴)

2. allow for techniques not based on a mean-squared distortion measure which reduce the perceptibility of the watermark

3. provide added robustness against attacks which are not modeled by this general framework.

This can be seen as a practical tradeoff between techniques based on different distortion measures (i.e. perceptual). These examples can be generalized to N dimensions in the following manner. The N dimensions are split into fewer than N energy bins, the borders of which are determined by any arbitrary perception-based scheme. The total signal power for a particular bin is the sum of the variances of the components falling in that bin. The allocation scheme then uses these bin powers to place the watermark in the components occupying the optimal bin. The bin borders and their number need not be placed in the public domain. This decomposition into energy bins is shown in Fig. 8.
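The energy-bin decomposition can be sketched as follows. This is an illustration only: the bin borders used here are hypothetical values (borrowed from the variance thresholds of the preferred embodiment), whereas the specification leaves the borders to any perception-based scheme; the function name `bin_powers` is likewise an assumption.

```python
from bisect import bisect_right


def bin_powers(variances, borders):
    """Assign each component to an energy bin and sum the variances per bin.

    `borders` lists the inner bin borders in ascending order, so there are
    len(borders) + 1 bins. A variance equal to a border falls in the upper bin.
    """
    members = [[] for _ in range(len(borders) + 1)]
    for index, var in enumerate(variances):
        members[bisect_right(borders, var)].append(index)
    powers = [sum(variances[i] for i in group) for group in members]
    return members, powers


variances = [0.05, 0.4, 0.5, 0.9, 0.35]   # hypothetical normalized variances
members, powers = bin_powers(variances, borders=[0.3, 0.75])
print(members)   # component indexes per bin
print(powers)    # total signal power per bin
```

The allocation scheme would then compare the bin powers and embed the watermark in the components listed for the chosen bin.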

The watermarking technique applied to audio content

We now detail the complete watermarking technique in the case of audio content. Note that the same technique can be applied to images and video sequences.

The embedding technique. The block diagram of the embedding process (transmitter) is shown in Fig. 4. The computing algorithm is the following:

1. generate a complex pseudo-random sequence of 2¹⁴ samples. The real part of the random sequence is obtained with a secret key K used as seed; the first value of the real random sequence is used as the seed to obtain the imaginary part of the random sequence.

2. compute the complex Discrete Fourier Transform (DFT) of the host signal, on successive blocks of 2¹⁴ samples of content, over the entire content's length

3. statistically determine the variance of each component of the DFT over the entire length of the content

4. select the components whose variance lies between a minimum value (0.3) and a maximum value (0.75) of the normalized variances (normalized to 1.0)

5. compute the DFT of the watermark signal over its entire length

6. select those components of the watermark DFT that correspond to the indexes of the selected host signal components as described in 4

7. add, for each block of the host content, the same watermark signal to the (block) host content, in those selected dimensions, to yield a watermarked transform-domain content

8. compute the inverse DFT of the watermarked content so as to obtain a watermarked content in the original domain of the host content
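The eight steps can be sketched in a few lines. This is a toy illustration under several assumptions, not the implementation of the preferred embodiment: it uses a naive DFT and a block size of 16 instead of 2¹⁴, Gaussian pseudo-random values, a hypothetical embedding gain of 0.1, and it drops trailing samples that do not fill a block. All function names are chosen for illustration. The output is complex in general, since no conjugate symmetry is enforced.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for a toy block size."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def make_watermark(key, n):
    """Step 1: real part seeded by the key, imaginary part by the first real value."""
    rng_real = random.Random(key)
    real = [rng_real.gauss(0.0, 1.0) for _ in range(n)]
    rng_imag = random.Random(real[0])
    imag = [rng_imag.gauss(0.0, 1.0) for _ in range(n)]
    return [complex(a, b) for a, b in zip(real, imag)]


def select_components(spectra, lo=0.3, hi=0.75):
    """Steps 3-4: per-component variance over all blocks, normalized to 1.0."""
    n = len(spectra[0])
    variances = []
    for k in range(n):
        values = [spec[k] for spec in spectra]
        mean = sum(values) / len(values)
        variances.append(sum(abs(v - mean) ** 2 for v in values) / len(values))
    peak = max(variances)
    if peak == 0:
        return []
    return [k for k in range(n) if lo <= variances[k] / peak <= hi]


def embed(host, key, block=16, gain=0.1):
    """Steps 2-8 on successive blocks; `gain` is an assumed scaling factor."""
    starts = range(0, len(host) - block + 1, block)
    spectra = [dft(host[i:i + block]) for i in starts]        # step 2
    if not spectra:
        return [], []
    selected = select_components(spectra)                     # steps 3-4
    wm_spec = dft(make_watermark(key, block))                 # step 5
    marked_samples = []
    for spec in spectra:
        marked = list(spec)
        for k in selected:                                    # steps 6-7
            marked[k] += gain * wm_spec[k]
        marked_samples.extend(dft(marked, inverse=True))      # step 8
    return marked_samples, selected


rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(16 * 20)]
marked, selected = embed(host, key=42)
print(len(marked), selected)
```

Only the selected components differ between the host and the watermarked spectra; all other components pass through unchanged.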

Comments. This invention is not restricted to the block size used in the preferred embodiment, nor to the transform domain, nor to any particular value of minimum and maximum value in step 4 of the embedding algorithm. The pseudo-random real and imaginary parts can be obtained using other techniques that lead to a complex pseudo-random sequence.

The extraction technique

The block diagram of the receiver of the system is illustrated in Fig. 5. It is very similar to the transmitter, and comprises:

1. regeneration of a watermark signal of length 2¹⁴ samples using the same secret key K as that used at the embedding, yielding a pseudo-random sequence

2. compute the complex DFT of the watermark signal

3. compute the complex DFT of the received content on successive blocks of 2¹⁴ samples, over the entire received content

4. statistically determine the variance of each component of the DFT over the entire length of the content

5. select the components whose variance lies between a minimum value (0.3) and a maximum value (0.75) of the normalized variances (normalized to 1.0)

6. select those components of the watermark DFT that correspond to the indexes of the selected host signal components as described in 5

7. for each block of the received content, compute a correlation factor between the received content and the watermark signal on the 2¹⁴ samples

8. use a re-synchronization algorithm, described below, for each block, if the correlation factor of that block is low

9. take a partial decision based on the correlation factor's value on each processing block

10. based on the ensemble of partial decisions over the entire received content's length, make a decision on the presence or not of the watermark signal in the received content.

A correlation operation is used at the receiver to assess the presence of the watermark. A threshold will be necessary to make the decision. All the other elements of the receiver are the same as those in the transmitter.
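The per-block correlation and partial decision can be sketched as follows. The specification does not spell out the exact form of the correlation factor, so the normalized real-part correlation used here is an assumption, as are the function names and the sample values.

```python
import math


def correlation_factor(received, watermark, selected):
    """Normalized correlation over the selected transform-domain components.

    Assumed form: Re(sum r_k * conj(w_k)) / sqrt(sum |r_k|^2 * sum |w_k|^2).
    """
    num = sum((received[k] * watermark[k].conjugate()).real for k in selected)
    energy_r = sum(abs(received[k]) ** 2 for k in selected)
    energy_w = sum(abs(watermark[k]) ** 2 for k in selected)
    if energy_r == 0 or energy_w == 0:
        return 0.0
    return num / math.sqrt(energy_r * energy_w)


def partial_decision(received, watermark, selected, threshold):
    """Partial decision for one block: True when the correlation exceeds the threshold."""
    return correlation_factor(received, watermark, selected) > threshold


# Hypothetical spectra for one block of four components.
watermark = [1 + 1j, -1 + 2j, 0.5 - 1j, 2 + 0j]
host = [0.3 - 0.2j, 0.1 + 0.4j, -0.2 + 0.1j, 0.0 - 0.3j]
received = [s + w for s, w in zip(host, watermark)]
selected = [0, 1, 2, 3]

print(correlation_factor(received, watermark, selected))
print(correlation_factor(host, watermark, selected))
```

A block that carries the watermark correlates strongly with the regenerated sequence, while the unmarked host yields a value near zero for this example.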

Comments. As for the embedding technique, this invention is not restricted to the block size used in the preferred embodiment, nor to the transform domain, nor to any particular value of minimum and maximum value in step 5 of the algorithm.

The re-synchronization algorithm.

In order to cope with temporal attacks on the signal (for example cropping or sample stealing) a re-synchronization algorithm is used at the receiver. The algorithm is only used if the correlation factor either drops from a high value to a low one, or if the correlation in the first block of content has a low value. The algorithm searches for the shift in the transform domain (in the present case of the FFT, the shift is the phase) that leads to the best correlation factor, and uses this shift from that block on. If at the end of the search no "optimal" shift was found, the block of content is taken as not watermarked, and the extraction of the watermark re-starts at the next block of data. The algorithm runs as follows for any block (starting at index k) for which the re-synchronization is necessary:

* SOB = k, CF_0 = 0

* for i = 0 : BL

- da = SOB + i, db = SOB + BL + i

- CF = CO(s(da : db), r(SOB : SOB + BL))

- if CF > T_sh then SOB = da

* k = SOB + BL

where SOB is the start of the block (SOB = k here), CF is the correlation factor, CO(x, y) is the correlation between x and y and BL is the block length used in the embedding and extraction processes.
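The search loop can be sketched as runnable code. This is a simplified illustration under stated assumptions: it slides a window in the sample domain rather than searching an FFT phase, it uses a normalized real correlation, and it stops at the first shift whose correlation exceeds the threshold. The names `resynchronize`, `correlation` and the sample data are hypothetical.

```python
import math


def correlation(x, y):
    """Normalized correlation between two equal-length real sequences."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0


def resynchronize(signal, reference, start, block_len, threshold):
    """Return the shifted block start whose correlation exceeds the threshold.

    Tries shifts i = 0 .. block_len from `start`; returns None when no shift
    qualifies, in which case the block is treated as not watermarked.
    """
    for shift in range(block_len + 1):
        da = start + shift
        db = da + block_len
        if db > len(signal):
            break
        if correlation(signal[da:db], reference) > threshold:
            return da
    return None


reference = [1, -1, 2, -2, 3, -3, 0.5, 1.5]
signal = [0.0] * 3 + reference + [0.0] * 8     # content delayed by three samples

new_start = resynchronize(signal, reference, start=0, block_len=8, threshold=0.9)
print(new_start)
```

After a successful search, processing would continue at the returned start plus the block length, mirroring the update k = SOB + BL.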

Final decision on the presence of the watermark signal. The actual decision on the presence/absence of the watermark signal in the digital content is based on the partial decisions (relative to a threshold) taken in each block of the content. In the preferred embodiment, the decision is based on the arithmetic average over all partial decisions. If the average is above a threshold, the content is said to be watermarked; if the average is close to the threshold, the content is said to be watermarked with a probability given by the decision number; and if the average is much below the threshold, the content is said to be not watermarked. The normalized (to the maximal correlation factor obtained) threshold is equal to 0.9 in the preferred embodiment. Therefore, if DW is the decision on the presence/absence of the watermark signal, $DW_n$ the partial decision on any block of content, N the total number of blocks over the entire content's length, and $\theta$ the threshold value, we have:

$$
\begin{aligned}
r \text{ is watermarked if } & \frac{1}{N} \sum_{n=0}^{N-1} DW_n > \theta \\
r \text{ is not watermarked if } & \frac{1}{N} \sum_{n=0}^{N-1} DW_n < \theta
\end{aligned}
$$
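The averaging rule can be sketched as follows, assuming the partial decisions are available as numbers (for example 1 for a block judged watermarked and 0 otherwise) and taking the threshold 0.9 of the preferred embodiment as the default. The function name is chosen for illustration.

```python
def final_decision(partial_decisions, theta=0.9):
    """Return (is_watermarked, average) from per-block partial decisions."""
    if not partial_decisions:
        raise ValueError("at least one block is required")
    average = sum(partial_decisions) / len(partial_decisions)
    return average > theta, average


# Hypothetical run: 19 of 20 blocks were judged watermarked.
decision, average = final_decision([1] * 19 + [0])
print(decision, average)
```

An average close to the threshold would call for the probabilistic reading described above rather than a hard yes or no.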

Claims

1. A method for allocating a transform domain in watermarking, comprising the steps of

(a) obtaining components of a transform-domain representation of data;

(b) determining a Gaussian variance estimate for each of the components;

(c) using minimal and maximal threshold criteria on the variance estimates to determine a plurality of candidate dimensions from the transform-domain representation; and

(d) selecting a plurality of signature dimensions from among the candidate dimensions.

2. The method of claim 1, wherein the transform-domain representation is as obtained from one of a real Fourier transformation, discrete cosine transform, Hadamard transformation, wavelet transformation, multi-resolution sub-band method and Karhunen-Loeve fractal transform.

3. The method of claim 1, wherein the transform-domain representation is as obtained from one of a complex Fourier transformation, discrete cosine transform, Hadamard transformation, wavelet transformation, multi-resolution sub-band method and Karhunen-Loeve fractal transform.

4. A method for embedding a watermark into data, comprising the steps of

(a) obtaining components of a transform-domain representation of data;

(b) determining a Gaussian variance estimate for each of the components;

(c) using minimal and maximal threshold criteria on the variance estimates to determine a plurality of candidate dimensions from the transform-domain representation;

(d) selecting a plurality of signature dimensions from among the candidate dimensions;

(e) using a secret key to generate a watermark in the transform domain;

(f) determining a plurality of dimensions into which to embed the watermark;

(g) embedding the watermark in the transform domain in the determined plurality of dimensions; and

(h) inverse-transforming to obtain the watermarked data.

5. The method of claim 4, wherein the minimal and maximal threshold criteria are specified by a secret key.

6. The method of claim 4, wherein step (g) comprises determining watermark power from a spectral masking model.

7. The method of claim 4, wherein the data comprises audio data.

8. The method of claim 4, wherein the data comprises image data.

9. The method of claim 4, wherein the data comprises video data.

10. The method of claim 4, wherein the data comprises multimedia data.

11. The method of claim 4, wherein embedding comprises addition.

12. The method of claim 4, wherein the transform-domain representation is as obtained from one of a real/complex Fourier transformation, discrete cosine transform, Hadamard transformation, wavelet transformation, multi-resolution sub-band method and Karhunen-Loeve fractal transform.

13. A method for detecting/extracting a watermark from watermarked data, comprising the steps of:

(a) using a secret key in regenerating a watermark in a transform domain;

(b) generating a transform-domain representation of the watermarked data;

(c) determining a plurality of transform-domain dimensions which are suitable for watermark embedding;

(d) comparing the determined plurality of dimensions of the transform-domain representation of the watermarked data with the corresponding plurality of the watermark;

(e) generating an estimate of likelihood of watermark presence based on the comparison.

14. The method of claim 13, further to the comparison comprising an additional discrimination step.

15. The method of claim 14, wherein the additional discrimination step comprises re-synchronization.

16. The method of claim 13, further comprising extracting the watermark from the watermarked data.

17. The method of claim 13, wherein the watermarked data comprises one of audio data, image data and video data.

18. The method of claim 13, wherein the watermarked data comprises multimedia data.

19. The method of claim 13, wherein step (e) comprises thresholding a correlation factor average.

20. The method of claim 13, wherein step (e) comprises thresholding a correlation factor median.

21. The method of claim 13, wherein step (e) comprises thresholding a correlation factor weighted average.

22. The method of claim 13, wherein step (e) comprises thresholding a maximum correlation factor.

23. The method of claim 13, wherein step (e) comprises thresholding a minimum correlation factor.

24. The method of claim 13, wherein step (d) comprises correlating the plurality of transform-domain dimensions and the corresponding plurality of the watermark.

25. The method of claim 24, further comprising thresholding a result of correlating.

26. The method of claim 13, wherein the transform is one of a real/complex Fourier transformation, discrete cosine transform, Hadamard transformation, wavelet transformation, multi-resolution sub-band method and Karhunen-Loeve fractal transform.

27. The method of claim 13, wherein the watermarked data is corrupted by at least one of additive noise, colored noise, cropping, sample stealing, compression, resampling, over-sampling, under-sampling, rotation, scaling and filtering.

28. The method of claim 27, wherein compression is by one of mp3, MPEG-2-AAC and Dolby.

29. The method of claim 13, wherein the watermarked data comprises samples representing one/several of magnitude, phase, luminance, chrominance, a component of color representation, and video signaling standard.

30. The method of claim 29, wherein color representation comprises RGB, Lab, Luv, XYZ, CMY and CMYK.

31. The method of claim 29, wherein video signaling standard comprises NTSC, PAL and SECAM.

32. The method of claim 13, wherein the watermarked data consist of all/part of a decomposition/representation of multimedia data.

33. A method for re-synchronizing between two sets of data in digital watermark detection, comprising exhaustively comparing, for maximized correlation, between differently shifted transform-domain representations of each of the sets.