Audio Watermarking Using Independent Component Analysis
 Author: Seok Jongwon
 Organization: Seok Jongwon
 Publish: Journal of Information and Communication Convergence Engineering, Volume 10, Issue 2, pp. 175-180, 30 June 2012

ABSTRACT
This paper presents a blind watermark detection scheme for an additive watermark embedding model. The proposed estimation-correlation-based watermark detector first estimates the embedded watermark by exploiting the non-Gaussianity of real-world audio signals and the mutual independence between the host signal and the embedded watermark; a correlation-based detector is then used to determine the presence or absence of the watermark. For watermark estimation, blind source separation (BSS) based on independent component analysis (ICA) is used. A low watermark-to-signal ratio (WSR) is one of the limitations of blind detection with the additive embedding model. The proposed detector uses two-stage processing to improve the WSR at the blind detector: the first stage removes the audio spectrum from the watermarked audio signal using linear predictive (LP) filtering, and the second stage uses the resulting residue from the LP filtering stage to estimate the embedded watermark using BSS based on ICA. Simulation results show that the proposed detector performs significantly better than existing estimation-correlation-based detection schemes.

KEYWORD
Audio, Independent component analysis, Linear predictive coding, Watermark

I. INTRODUCTION
Spread spectrum (SS)-based watermarking is one of the most representative methods of the blind additive embedding (AE) model; it relies on the theory of spread spectrum communication for information embedding and detection. More specifically, in SS-based watermarking, the host signal acts as interference at the blind detector (the host signal, x, is not used during the watermark detection process), and because the host signal has much higher energy than the watermark, host interference causes the detection performance to deteriorate at the blind detector (referred to simply as the detector hereafter, unless otherwise stated). Superior detection performance is one of the desirable features of the blind AE model.
The main motivation of this paper is to design a blind detector for SS-based watermarking. Existing detectors for SS-based watermarking schemes are bounded by the host signal interference at the detector. The proposed detector aims to reduce the host signal interference by developing an estimation-correlation-based detection framework. The proposed detector, therefore, consists of two stages: 1) a watermark estimation stage, and 2) a watermark detection stage. The objective of the watermark estimation stage is to produce an estimate of the embedded watermark that has a higher watermark-to-signal ratio (WSR) than the watermarked audio. To accomplish this goal, blind source separation based on the underdetermined independent component analysis (UICA) framework (i.e., ICA for more sources than sensors) is used for watermark estimation. To this end, we model the problem of blind watermark detection for AE as that of blind source separation (BSS) for underdetermined mixtures. To ensure a better WSR at the watermark estimation stage, the watermarked audio is preprocessed to remove the correlation in the audio signal using linear predictive (LP) filtering. It has been shown that the received watermarked signal is an underdetermined linear mixture of the underlying independent sources obeying non-Gaussian distributions; therefore, BSS based on the UICA framework can be used for watermark estimation [1].
A similarity measure based on correlation is then used to detect the presence or the absence of the embedded watermark in the estimated watermark. The performance of the proposed watermark detection scheme is evaluated using the sound quality assessment material (SQAM) downloaded from [2]. Simulation results for the SQAM dataset show that the proposed scheme performs significantly better than existing estimation-correlation-based detection schemes [2] based on median filtering and Wiener filtering.
II. MOTIVATION
The majority of existing SS-based watermarking schemes [3] use an AE model to insert the watermark into the host audio. Mathematically, the SS-based watermark embedding process can be expressed as
$$y(n) = x(n) + w(n), \qquad (1)$$
where $y(n)$ is the watermarked audio signal in the marking space, $x(n)$ the original signal in the marking space, and $w(n)$ the watermark signal. It is reasonable to assume that $x(n)$ and $w(n)$ are zero-mean, independent and identically distributed (i.i.d.) random variables with variances $\sigma_x^2$ and $\sigma_w^2$, respectively. It is further assumed that $w$ and $x$ are mutually independent. In the data hiding literature, the embedding model given by Eq. (1) is referred to as blind AE, as the embedder ignores the host signal information during the watermark embedding process.
Adversary attacks or distortion due to signal manipulations, $v$, can be modeled as an additive channel distortion. Therefore, the watermarked audio signal subjected to adversary attacks or channel distortions $v$, that is, $y(n) + v(n)$, is processed at the detector to detect the presence or the absence of the embedded watermark. The basic additive embedding and correlation-based detection framework is shown in Fig. 1.
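The additive model of Eq. (1) and the notion of a low watermark-to-signal ratio can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the host (a Laplacian-like sequence) and the watermark (a binary sequence of amplitude 0.01) are synthetic stand-ins rather than the signals used in this paper.

```python
import math
import random


def embed_additive(host, watermark):
    """Blind additive embedding, Eq. (1): y(n) = x(n) + w(n)."""
    if len(host) != len(watermark):
        raise ValueError("host and watermark must have the same length")
    return [x + w for x, w in zip(host, watermark)]


def power(signal):
    """Average power (mean of the squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def wsr_db(watermark, host):
    """Watermark-to-signal ratio in decibels."""
    return 10.0 * math.log10(power(watermark) / power(host))


# Synthetic stand-ins (assumptions, not data from the paper).
rng = random.Random(0)
N = 4096
host = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(N)]
watermark = [0.01 * rng.choice((-1.0, 1.0)) for _ in range(N)]

marked = embed_additive(host, watermark)
print(f"WSR = {wsr_db(watermark, host):.1f} dB")
```

With amplitudes of this kind the WSR is strongly negative, which is the regime in which host interference dominates a blind detector.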
A correlation-based detector is commonly used to detect the presence or the absence of the embedded watermark. The decision statistic, $d$, obtained by correlating the received watermarked audio with the watermark sequence at the detector, is given as
$$d = \langle y + v, w \rangle = \langle w, w \rangle + \langle x, w \rangle + \langle v, w \rangle = d_w + d_x + d_v,$$
where $\langle \cdot, \cdot \rangle$ denotes correlation (the inner product) over the marking space, $d_w = \langle w, w \rangle$ is the energy of the watermark, $E_x\{d_x\} = 0$, and $E_v\{d_v\} = 0$, where $E_x\{\cdot\}$ denotes the expectation over the random variable $x$.
It is important to mention that the detection performance of the correlation-based detector depends on the decision threshold used. Let us assume that the watermark detection threshold is $T = d_w/2$. In this case, the probability of a false negative, $P_{fn}$ (the watermark is present, but the detector decides otherwise), equals the probability of a false positive, $P_{fp}$ (the watermark is not present, but the detector decides otherwise), that is, $P_{fn} = P_{fp}$, and both can be written in terms of $\mathrm{erfc}(\cdot)$, the complementary error function.
It can be summarized that the detection performance of a blind detector for additive watermarking schemes is inherently bounded by the host signal interference at the detector. The motivation behind this paper is to design a watermark detector for AE with improved watermark detection performance. Towards this end, the proposed detector uses the theory of ICA by posing watermark estimation as a BSS problem from an underdetermined mixture of independent sources. The fundamentals of ICA theory are briefly outlined in the following section, followed by the details of the proposed ICA-based detector.
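The dependence of the error rates on host interference can be made concrete with a small simulation. The Python sketch below is an illustration under stated assumptions, not the detector evaluated in this paper: the threshold is taken as half the watermark energy, the host is drawn as i.i.d. Gaussian noise, the watermark is a binary sequence of amplitude $\sigma_w$, no attack is applied ($v = 0$), and all function names and parameter values are hypothetical. Under these assumptions the host term of the correlation has variance $N\sigma_x^2\sigma_w^2$ and the watermark energy is $N\sigma_w^2$, so a Gaussian approximation gives $P_{fn} = P_{fp} \approx \tfrac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{N}\,\sigma_w / (2\sqrt{2}\,\sigma_x)\right)$.

```python
import math
import random


def correlate(received, watermark):
    """Decision statistic: d = sum_n received(n) * w(n)."""
    return sum(r * w for r, w in zip(received, watermark))


def detect(received, watermark):
    """Declare the watermark present when d exceeds half the watermark energy."""
    d_w = correlate(watermark, watermark)  # watermark energy
    return correlate(received, watermark) > d_w / 2.0


def error_probability(n, sigma_w, sigma_x):
    """Gaussian approximation of P_fn = P_fp (assumes i.i.d. signals, no attack)."""
    return 0.5 * math.erfc(math.sqrt(n) * sigma_w / (2.0 * math.sqrt(2.0) * sigma_x))


# Illustrative parameter values (assumptions, not taken from the paper).
rng = random.Random(1)
n, sigma_w, sigma_x, trials = 256, 0.2, 1.0, 1000

misses = 0
false_alarms = 0
for _ in range(trials):
    w = [sigma_w * rng.choice((-1.0, 1.0)) for _ in range(n)]
    x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
    y = [xi + wi for xi, wi in zip(x, w)]
    misses += not detect(y, w)      # watermark present, detector says absent
    false_alarms += detect(x, w)    # watermark absent, detector says present

p_theory = error_probability(n, sigma_w, sigma_x)
print(f"approximation: {p_theory:.3f}")
print(f"empirical miss rate: {misses / trials:.3f}")
print(f"empirical false-alarm rate: {false_alarms / trials:.3f}")
```

Increasing `sigma_x` relative to `sigma_w` in this sketch drives both error rates towards 0.5, which is the host interference bound discussed above.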
> A. Independent Component Analysis
ICA is a statistical framework for estimating underlying hidden factors or components of multivariate statistical data. In the ICA model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown [4]. Moreover, these hidden variables are assumed to be non-Gaussian and mutually independent. The linear statistical, static ICA generative model considered in this paper is given as
$$X^{(i)} = A S^{(i)} + V^{(i)}, \qquad i = 1, \dots, N,$$
where $X^{(i)}$ represents $N$ realizations of an $m$-dimensional random vector known as the observation vector, $S^{(i)}$ are the realizations of the $n_1$-dimensional source vector or the hidden variables, $A \in \mathbb{R}^{m \times n_1}$ is an unknown matrix also referred to as the mixing matrix, and $V^{(i)}$ represents realizations of the noise, independent of the underlying sources, $S^{(i)}$.
Mixtures in which the number of observations (the dimensionality of the observation vector, $X^{(i)}$), $m$, is smaller than the number of sources (the dimensionality of the source vector, $S^{(i)}$), $n_1$, i.e., $m < n_1$, are referred to as underdetermined mixtures. In this paper, we consider such mixtures to model the blind watermark estimation problem using BSS based on the ICA framework. The BSS framework using the ICA model tries to find a linear representation of the underlying sources, $S^{(i)}$, $i = 1, \dots, n_1$, which are statistically independent, based on the observation only. In other words, the BSS framework is intended to estimate the mixing matrix, the hidden sources $S^{(i)}$, or both from the observation alone. As pointed out in many works such as [1, 4], underdetermined mixtures cannot be linearly inverted owing to the bound on the rank of the mixing matrix, which makes it difficult to estimate the underlying sources even if the mixing matrix is known. Therefore, the problem of extracting sources from an observation following the underdetermined mixture model is nontrivial [4]. Recently, many researchers have proposed methods to solve BSS for underdetermined mixtures [1, 4, 5]. For example, De Lathauwer et al. [5] have proposed a BSS scheme based on multilinear analysis, and Hojen-Sorensen et al. [1] have proposed a statistical framework based on mean-field approaches to solve BSS for underdetermined mixtures.
Before BSS based on underdetermined ICA can be used to estimate the watermark from the watermarked audio, we need to verify the following: 1) the watermarked audio is an underdetermined mixture of independent sources, and 2) the underlying sources obey a non-Gaussian distribution. It can be observed from Eq. (1) that the AE model fits into an underdetermined linear mixture model; therefore, BSS for underdetermined mixtures can be used to estimate the embedded watermark given that the underlying latent sources ($w$ and $x$) satisfy the non-Gaussianity and independence constraints. This is a realistic assumption, as multimedia data can be modeled using a non-Gaussian distribution [6]. Therefore, if an embedded watermark obeys a non-Gaussian distribution, then BSS based on UICA can be used for watermark estimation from the watermarked signal [6].
III. WATERMARK DETECTION
This section provides an overview of the proposed blind watermark detection scheme from the received watermarked audio signal obtained by additive embedding. The proposed watermark detection scheme consists of two stages: 1) the watermark estimation stage, and 2) the watermark detection stage. It was mentioned earlier that the watermark estimation stage is further divided into two substages: the spectral removal stage and the source separation stage.
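As background for the source separation stage, the underdetermined mixture model of Section II-A can be illustrated numerically. The Python sketch below uses hypothetical helper names and synthetic stand-in signals (a Laplacian host and a binary watermark, neither taken from this paper). It shows two things: with $m = 1$ observation and $n_1 = 2$ sources, distinct source pairs map to the same observation, so the mixture cannot be linearly inverted; and the sample excess kurtosis offers a quick check of non-Gaussianity.

```python
import random


def mix(A, S):
    """Noise-free static linear mixture X = A S for one realization.

    A is an m x n1 matrix given as a list of rows; S is a length-n1 source vector.
    """
    return [sum(a * s for a, s in zip(row, S)) for row in A]


def excess_kurtosis(samples):
    """Sample excess kurtosis; close to 0 for Gaussian data."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2) - 3.0


# Additive embedding written as a 1 x 2 mixture: y = [1 1] [x, w]^T, so m = 1 < n1 = 2.
A = [[1.0, 1.0]]

# Two different source pairs produce the same observation.
obs_a = mix(A, [0.75, 0.25])
obs_b = mix(A, [0.5, 0.5])
print(obs_a, obs_b)

# Synthetic stand-ins (assumptions): Laplacian host, binary watermark, Gaussian reference.
rng = random.Random(2)
N = 20000
laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(N)]
binary = [rng.choice((-1.0, 1.0)) for _ in range(N)]
gaussian = [rng.gauss(0.0, 1.0) for _ in range(N)]

for name, data in (("laplacian", laplacian), ("binary", binary), ("gaussian", gaussian)):
    print(name, round(excess_kurtosis(data), 2))
```

A clearly nonzero excess kurtosis for both stand-in sources is consistent with the non-Gaussianity condition stated earlier, although kurtosis alone is not a complete test of that condition.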
> A. Watermark Estimation
The goal of the watermark embedder is that the embedded watermark should survive intentional and unintentional attacks, whereas the goal of the watermark detector is to detect the embedded watermark with very low false rates in the presence of an active adversary and signal manipulations. In the case of the AE model, low false rates are difficult to achieve owing to strong host interference. For detector performance analysis, existing correlation-based schemes model the audio signal as a white Gaussian channel. Recent results in the audio processing and compression communities, however, show that samples of real audio signals are highly correlated, which can be exploited to improve the detection performance by decorrelating the input audio before detection. The proposed detection scheme achieves this goal by applying whitening or decorrelation before watermark estimation. Simulation results presented in this paper show that whitening before watermark estimation using ICA improves the detection performance significantly. This improvement can be attributed to the fact that whitening increases the watermark-to-interference ratio and hence yields superior detection performance.
To remove the correlation in the audio signal, autoregressive modeling in the form of linear predictive coding (LPC) [7] can be used. The LPC method approximates the original audio signal, $x(n)$, as a linear combination of the past $p$ audio samples, such that
$$x(n) \approx \sum_{k=1}^{p} a_k x(n-k), \qquad (6)$$
where the coefficients $a_1, a_2, \dots, a_p$ are assumed to be constant over the selected audio segment. Rewriting Eq. (6) by including an error term $e(n)$ gives
$$x(n) = \sum_{k=1}^{p} a_k x(n-k) + e(n), \qquad (7)$$
where $e(n)$ is an excitation or residual signal of $x(n)$. Now using Eq. (7), Eq. (1) can be expressed as
$$y(n) = \sum_{k=1}^{p} a_k x(n-k) + e(n) + w(n). \qquad (8)$$
Likewise, the watermarked audio can also be expressed as
$$y(n) = \sum_{k=1}^{p} a_k y(n-k) + \tilde{e}(n), \qquad (9)$$
where $\tilde{e}(n)$ is the residual signal of the watermarked audio signal. We assume that, by the characteristics of linear predictive analysis, $\tilde{e}(n)$ has the characteristics of both $e(n)$ and $w(n)$. We can consider the linear combination of the past audio samples as the estimate $\hat{y}(n)$, defined as
$$\hat{y}(n) = \sum_{k=1}^{p} a_k y(n-k). \qquad (10)$$
Here the prediction error, $\tilde{e}(n)$, can be expressed as
$$\tilde{e}(n) = y(n) - \hat{y}(n) = y(n) - \sum_{k=1}^{p} a_k y(n-k). \qquad (11)$$
It can be observed from Eq. (11) that the residual $\tilde{e}(n)$, with the audio spectrum removed, has the characteristics of both the excitation signal of the original audio $x(n)$ and the watermark signal $w(n)$.
This method transforms the non-white watermarked audio signal into a whitened signal by removing the audio spectrum. Fig. 2 shows the empirical probability density function (pdf) of a small segment of the watermarked audio signal before LP filtering and of the residual or error signal of the watermarked audio signal after LP filtering. The empirical pdf of the watermarked audio signal is clearly not smooth and has large variations due to the voiced part. On the other hand, the empirical pdf of the residual signal has a smoother distribution and a smaller variance than the watermarked audio signal.
It is important to mention that the LPC stage also improves the WSR, which ultimately improves the source separation performance of the BSS used for watermark estimation. This is because the watermark sequence is i.i.d., so the decorrelation stage does not reduce its energy in the residual signal, whereas decorrelation does reduce the audio signal energy.
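The effect of the LP stage on the WSR can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the host is a synthetic second-order autoregressive process standing in for correlated audio, the watermark is a white binary sequence, the predictor order (10) is an arbitrary choice, and the LP coefficients are obtained with the autocorrelation method and the Levinson-Durbin recursion. The coefficients are estimated from the watermarked signal only, as a blind detector would require; the separate filtering of host and watermark is possible only in simulation and is used here purely to measure the WSR after filtering.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r(0), ..., r(max_lag)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns a_1..a_p with x(n) ~ sum_k a_k x(n-k)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, a):
    """Prediction error x(n) - sum_k a_k x(n-k), assuming zeros before the start."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(p, n))) for n in range(len(x))]


def power(x):
    return sum(v * v for v in x) / len(x)


# Synthetic stand-ins (assumptions, not data from the paper).
rng = random.Random(3)
N, order = 4096, 10
host = [0.0, 0.0]
for _ in range(N):
    host.append(1.6 * host[-1] - 0.8 * host[-2] + rng.gauss(0.0, 1.0))
host = host[2:]
watermark = [0.05 * rng.choice((-1.0, 1.0)) for _ in range(N)]
marked = [x + w for x, w in zip(host, watermark)]

a = levinson_durbin(autocorr(marked, order), order)  # estimated from y only
residual = lp_residual(marked, a)

# The filter is linear, so the residual splits into a host part and a watermark part.
host_res = lp_residual(host, a)
wm_res = lp_residual(watermark, a)
wsr_before = 10.0 * math.log10(power(watermark) / power(host))
wsr_after = 10.0 * math.log10(power(wm_res) / power(host_res))
print(f"WSR before LP filtering: {wsr_before:.1f} dB")
print(f"WSR after LP filtering:  {wsr_after:.1f} dB")
```

In this toy setting the prediction-error filter attenuates the strongly correlated host far more than it affects the white watermark, so the WSR of the residual is higher than that of the watermarked signal. The size of the gain depends on how predictable the host is and should not be read as a result for real audio.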
The residual signal is then used to estimate the hidden watermark using BSS based on UICA. For watermark estimation, the probabilistic ICA method based on mean-field approaches is used. Superior source separation performance is the only motivation behind using the probabilistic ICA presented in [1]. This is, however, not a limitation of the proposed scheme, as any BSS scheme based on UICA can be used for watermark estimation from the residual signal. The estimated sources are then correlated with the watermark, $w$, to determine the presence or the absence of the embedded watermark. Fig. 3 shows the block diagram of the proposed audio watermark detection scheme.
IV. EXPERIMENTAL RESULTS
The binary message to be embedded is first modulated by a key-dependent random sequence. The watermark is then spectrally shaped in the frequency domain according to a masking threshold estimated from the human auditory system (HAS) model of ISO/MPEG-1 Audio Layer III [8]. The motivation here is to design a weighting function that maximizes the energy of the embedded watermark subject to a required acceptable distortion. The resulting watermark is then added to the original audio signal in the frequency domain, which is then transformed to the time domain to obtain the watermarked audio. A schematic diagram of the audio watermark embedding scheme discussed above is shown in Fig. 4.
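The first embedding step, modulation of the message by a key-dependent random sequence, can be sketched in Python as follows. This is a hypothetical illustration and not the embedder used in this paper: the function names, the use of a seeded pseudo-random generator as the key mechanism, and the choice of 256 chips per bit are assumptions, and the psychoacoustic spectral shaping step is omitted entirely.

```python
import random


def spread_message(bits, key, chips_per_bit):
    """Modulate each message bit by a key-dependent +/-1 pseudo-random sequence."""
    rng = random.Random(key)
    chips = []
    for bit in bits:
        symbol = 1.0 if bit else -1.0
        chips.extend(symbol * rng.choice((-1.0, 1.0)) for _ in range(chips_per_bit))
    return chips


def despread(chips, key, chips_per_bit):
    """Recover the bits by correlating each block with the regenerated key sequence."""
    rng = random.Random(key)
    bits = []
    for start in range(0, len(chips), chips_per_bit):
        block = chips[start:start + chips_per_bit]
        corr = sum(c * rng.choice((-1.0, 1.0)) for c in block)
        bits.append(1 if corr > 0 else 0)
    return bits


# Hypothetical message and key; 4 bits x 256 chips gives a 1,024-point sequence.
message = [1, 0, 1, 1]
chips = spread_message(message, key=1234, chips_per_bit=256)
recovered = despread(chips, key=1234, chips_per_bit=256)
print(len(chips), recovered)
```

Without the key the chip sequence cannot be regenerated, so the correlation in `despread` carries no information about the message. This is the property that makes the sequence usable as a watermark.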
The simulation results presented in this section are based on the following system settings: 1) audio signals sampled at 44 kHz with 16-bit resolution are used as the host audio, 2) a 1,024-point watermark is embedded into four consecutive non-overlapping frames, and 3) the watermarked signal is first segmented into non-overlapping frames of 4,096 samples each, and each frame is further segmented into four non-overlapping subframes, which are then applied to the ICA block to estimate the embedded watermark after LPC filtering. For performance evaluation, the SQAM downloaded from [2] was used.
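The segmentation described in setting 3) can be sketched in Python as follows. The function name is hypothetical, and dropping a partial last frame is an assumption, since the handling of trailing samples is not specified here.

```python
def split_frames(signal, frame_len=4096, subframes=4):
    """Split a signal into non-overlapping frames, each cut into equal subframes.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    if frame_len % subframes:
        raise ValueError("frame_len must be divisible by subframes")
    sub_len = frame_len // subframes
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[i:i + sub_len] for i in range(0, frame_len, sub_len)])
    return frames


# A dummy 10,000-sample signal yields 2 full frames of 4 subframes with 1,024 samples each.
frames = split_frames(list(range(10000)))
print(len(frames), len(frames[0]), len(frames[0][0]))  # prints: 2 4 1024
```

Each 1,024-sample subframe has the same length as the 1,024-point watermark, so the four subframes of a frame can be passed to the estimation stage as separate segments.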
The robustness of the proposed watermark estimation scheme was evaluated for the following attack scenarios: 1) no adversary attack, 2) additive white Gaussian noise, 3) MP3 compression (128 kbps), and 4) bandpass filtering (2nd-order Butterworth filter with cutoff frequencies of 100 and 6,000 Hz).
The detection performance of the proposed estimation-correlation-based detector and of the existing schemes under these attacks is given in Fig. 5. It is observed from Fig. 5 that, for all four attack scenarios, the proposed detector outperforms the existing detectors. In addition, the detection performance of the watermark detectors under consideration for the SQAM database is given in Table 1. It can be observed from both Fig. 5 and Table 1 that the proposed detector performs significantly better than its counterparts. The improved detection performance of the proposed detector can be attributed to its better host signal interference cancellation capability.
V. CONCLUSIONS
In this paper, we described a new framework for estimation-correlation-based detection for additive embedding. The proposed blind detection method extracts the embedded watermark signal while suppressing the host signal interference at the detector. The proposed framework exploits the mutual independence and non-Gaussianity of the audio signal and the embedded watermark to estimate the embedded watermark using BSS based on UICA. Experimental results showed that the proposed detection scheme is robust to additive white Gaussian noise, MP3 compression, and bandpass filtering.


[Fig. 1.] Schematic diagram of the basic additive embedding and correlation-based detection framework.

[Fig. 2.] Empirical probability density functions before and after linear prediction filtering.

[Fig. 3.] Block diagram of the proposed watermark detection procedure. LPC: linear predictive coding, ICA: independent component analysis.

[Fig. 4.] Watermark embedding procedure. FFT: fast Fourier transform, IFFT: inverse FFT.

[Fig. 5.] Robustness performance: no attack (top left), additive white Gaussian noise attack (5% noise power, top right), MP3 compression attack (128 kbps, bottom left), and bandpass filtering attack (bottom right). ICA: independent component analysis, LPC: linear predictive coding.

[Table 1.] Correlation values depending on the detection method using the sound quality assessment material (SQAM) [2] database.