Audio Watermarking Using Independent Component Analysis

  • ABSTRACT

    This paper presents a blind watermark detection scheme for an additive watermark embedding model. The proposed estimation-correlation-based detector first estimates the embedded watermark by exploiting the non-Gaussianity of real-world audio signals and the mutual independence between the host signal and the embedded watermark; a correlation-based detector is then used to determine the presence or absence of the watermark. For watermark estimation, blind source separation (BSS) based on independent component analysis (ICA) is used. A low watermark-to-signal ratio (WSR) is one of the limitations of blind detection under the additive embedding model. The proposed detector therefore uses two-stage processing to improve the WSR at the blind detector: the first stage removes the audio spectrum from the watermarked audio signal using linear predictive (LP) filtering, and the second stage estimates the embedded watermark from the resulting LP residue using ICA-based BSS. Simulation results show that the proposed detector performs significantly better than existing estimation-correlation-based detection schemes.


  • KEYWORD

    Audio , Independent component analysis , Linear predictive coding , Watermark

  • I. INTRODUCTION

    Spread spectrum (SS)-based watermarking is one of the most representative methods of the blind additive embedding (AE) model; it relies on the theory of spread spectrum communication for information embedding and detection. In SS-based watermarking, the host signal acts as interference at the blind detector (the host signal, x, is not available during watermark detection). Because the host signal has much higher energy than the watermark, this host interference degrades detection performance at the blind detector (hereafter simply "the detector" unless otherwise stated). Superior detection performance is therefore a desirable feature of detectors for the blind AE model.

    The main motivation of this paper is to design a blind detector for SS-based watermarking. The performance of existing detectors for SS-based watermarking is bounded by host-signal interference at the detector. The proposed detector reduces this interference through an estimation-correlation-based detection framework consisting of two stages: 1) a watermark estimation stage, and 2) a watermark detection stage. The objective of the estimation stage is to produce an estimate of the embedded watermark with a higher watermark-to-signal ratio (WSR) than the watermarked audio itself. To accomplish this, blind source separation based on the underdetermined independent component analysis (UICA) framework (i.e., ICA with more sources than sensors) is used; in other words, we model blind watermark detection for AE as BSS for underdetermined mixtures. To further improve the WSR at the estimation stage, the watermarked audio is pre-processed with linear predictive (LP) filtering to remove the correlation in the audio signal. It has been shown that the received watermarked signal is an underdetermined linear mixture of underlying independent sources obeying non-Gaussian distributions; therefore, BSS based on the UICA framework can be used for watermark estimation [1].

    A correlation-based similarity measure is then applied to the estimated watermark to decide whether the embedded watermark is present or absent. The performance of the proposed detection scheme is evaluated using the sound quality assessment material (SQAM) dataset [2]. Simulation results on SQAM show that the proposed scheme performs significantly better than existing estimation-correlation-based detection schemes [2] based on median filtering and Wiener filtering.

    II. MOTIVATION

    The majority of existing SS-based watermarking schemes [3] use an AE model to insert the watermark into the host audio. Mathematically, the SS-based watermark embedding process can be expressed as

    $$y(n) = x(n) + w(n) \qquad (1)$$

    where y(n) is the watermarked audio signal in the marking space, x(n) the original (host) signal in the marking space, and w(n) the watermark signal. It is reasonable to assume that x(n) and w(n) are zero-mean, independent and identically distributed (i.i.d.) random variables with variances $\sigma_x^2$ and $\sigma_w^2$, respectively. It is further assumed that x and w are mutually independent. In the data hiding literature, the embedding model of Eq. (1) is referred to as blind AE, because the embedder ignores the host signal during watermark embedding.
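    A minimal NumPy sketch of the blind additive model of Eq. (1) follows. The Laplacian host and the scaled ±1 watermark are toy assumptions chosen only to mimic a heavy-tailed audio signal and a key-dependent sequence; the helper names `embed_additive` and `wsr_db` are hypothetical.

```python
import numpy as np

def embed_additive(x, w):
    """Blind additive embedding, Eq. (1): y(n) = x(n) + w(n); x is never inspected."""
    return np.asarray(x, dtype=float) + np.asarray(w, dtype=float)

def wsr_db(x, w):
    """Watermark-to-signal ratio in dB: 10 log10(sigma_w^2 / sigma_x^2)."""
    return 10.0 * np.log10(np.var(w) / np.var(x))

rng = np.random.default_rng(0)
N = 4096
x = rng.laplace(scale=1.0, size=N)          # toy host, variance 2 * scale^2 = 2
w = 0.1 * rng.choice([-1.0, 1.0], size=N)   # toy watermark, variance about 0.01
y = embed_additive(x, w)
print(f"WSR = {wsr_db(x, w):.1f} dB")       # roughly -23 dB for these settings
```

    Even this toy setting places the watermark more than 20 dB below the host, which is the interference problem the blind detector has to overcome.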

    Adversarial attacks or distortions due to signal manipulation, v, can be modeled as additive channel distortion. The watermarked audio signal subjected to such attacks or channel distortions,

    $$\tilde{y}(n) = y(n) + v(n) = x(n) + w(n) + v(n), \qquad (2)$$

    is then processed at the detector to decide whether the embedded watermark is present or absent. The basic additive embedding and correlation-based detection framework is shown in Fig. 1.

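    A correlation-based detector is commonly used to detect the presence or absence of the embedded watermark. The decision statistic, d, obtained at the detector by correlating the received watermarked audio with the watermark sequence is given as

    $$d = \frac{1}{N}\sum_{n=1}^{N} \tilde{y}(n)\,w(n) = d_w + d_x + d_v, \qquad (3)$$

    where $d_x = \frac{1}{N}\sum_{n=1}^{N} x(n)\,w(n)$ and $d_v = \frac{1}{N}\sum_{n=1}^{N} v(n)\,w(n)$ are interference terms,

    $$d_w = \frac{1}{N}\sum_{n=1}^{N} w^2(n) \approx \sigma_w^2$$

    is the energy of the watermark, $E_x\{d_x\} = 0$, and $E_v\{d_v\} = 0$, where $E_x\{\cdot\}$ denotes the expectation over the random variable x.

    The detection performance of the correlation-based detector depends on the decision threshold used. Assume that the detection threshold is set to $T = d_w/2$. In this case, the probability of a false negative, $P_{fn}$ (the watermark is present, but the detector decides otherwise), equals the probability of a false positive, $P_{fp}$ (the watermark is absent, but the detector decides otherwise). Under a Gaussian approximation of d, that is,

    $$P_{fn} = P_{fp} = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{\frac{N\sigma_w^2}{8\left(\sigma_x^2 + \sigma_v^2\right)}}\right), \qquad (4)$$

    where erfc(·) is the complementary error function and $\sigma_v^2$ is the variance of the attack distortion v.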
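    The sketch below evaluates the Gaussian-approximation error probability for the threshold $T = d_w/2$ and checks it against a Monte Carlo simulation of the correlation detector. It assumes a Gaussian host, a ±σ_w watermark (so that d_w = σ_w² exactly), and no attack (σ_v² = 0); the function names are hypothetical.

```python
import math
import numpy as np

def correlation_statistic(y_received, w):
    """d = (1/N) sum_n y~(n) w(n): correlate the received signal with the key."""
    return np.dot(y_received, w) / len(w)

def error_probability(N, var_w, var_x, var_v=0.0):
    """Pfn = Pfp for T = sigma_w^2 / 2 under the Gaussian approximation of d."""
    return 0.5 * math.erfc(math.sqrt(N * var_w / (8.0 * (var_x + var_v))))

def simulate_pfn(N, var_w, var_x, trials=2000, seed=0):
    """Monte Carlo false-negative rate: Gaussian host, +/-sigma_w watermark."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, math.sqrt(var_x), size=(trials, N))
    w = math.sqrt(var_w) * rng.choice([-1.0, 1.0], size=(trials, N))
    d = np.array([correlation_statistic(xi + wi, wi) for xi, wi in zip(x, w)])
    return float(np.mean(d < var_w / 2.0))   # miss: d falls below T

p_theory = error_probability(N=1024, var_w=0.01, var_x=1.0)
p_sim = simulate_pfn(N=1024, var_w=0.01, var_x=1.0)
print(f"theory {p_theory:.3f}, simulation {p_sim:.3f}")  # both roughly 0.05
```

    The error probability depends only on $N\sigma_w^2/(\sigma_x^2 + \sigma_v^2)$, i.e., on the sequence length multiplied by the WSR, which is why raising the effective WSR at the detector is the focus of the proposed scheme.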

    In summary, the detection performance of a blind detector for additive watermarking schemes is inherently bounded by host-signal interference at the detector. The motivation behind this paper is to design a detector for AE with improved detection performance. To this end, the proposed detector uses ICA theory, posing watermark estimation as BSS from an underdetermined mixture of independent sources. The fundamentals of ICA are briefly outlined in the following subsection, followed by the details of the proposed ICA-based detector.

      >  A. Independent Component Analysis

    ICA is a statistical framework for estimating the underlying hidden factors or components of multivariate statistical data. In the ICA model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown [4]. Moreover, these hidden variables are assumed to be non-Gaussian and mutually independent. The linear, static ICA generative model considered in this paper is given as

    $$X(i) = A\,S(i) + V(i), \qquad i = 1, \dots, N, \qquad (5)$$

    where X(i) represents N realizations of an m-dimensional random vector known as the observation vector, S(i) are the realizations of the n1-dimensional source vector (the hidden variables), $A \in \mathbb{R}^{m \times n_1}$ is an unknown matrix referred to as the mixing matrix, and V(i) represents realizations of the noise, which is independent of the underlying sources S(i).

    Mixtures in which the number of observations (the dimensionality m of the observation vector X(i)) is smaller than the number of sources (the dimensionality n1 of the source vector S(i)), i.e., m < n1, are referred to as underdetermined mixtures. In this paper, we use such mixtures to model the blind watermark estimation problem using BSS based on the ICA framework. BSS using the ICA model seeks a linear representation of the data in which the n1 components of the source vector S(i) are statistically independent, using the observations only. In other words, BSS aims to estimate the mixing matrix, the hidden sources S(i), or both from the observations alone. As pointed out in [1, 4], underdetermined mixtures cannot be linearly inverted: the rank of A is at most m < n1, so A has no left inverse, and the sources are difficult to recover even when the mixing matrix is known. Extracting the sources from an underdetermined mixture is therefore nontrivial [4]. Several methods have recently been proposed to solve BSS for underdetermined mixtures [1, 4, 5]. For example, De Lathauwer et al. [5] proposed a BSS scheme based on multilinear analysis, and Hojen-Sorensen et al. [1] proposed a statistical framework based on mean-field approaches.
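    The rank argument can be illustrated with the AE model itself: one observation and two sources. In this NumPy sketch (toy Laplacian host and ±0.3 watermark, both assumptions), the mixing matrix is known, yet the minimum-norm pseudo-inverse `numpy.linalg.pinv` cannot separate the sources; it simply assigns y/2 to both estimates. More generally, any per-sample linear estimate of w from a single observation is a scaled copy of y and is therefore dominated by the host.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5000
x = rng.laplace(size=N)                      # host-like source (toy assumption)
w = 0.3 * rng.choice([-1.0, 1.0], size=N)    # watermark-like source, variance 0.09
S = np.vstack([x, w])                        # n1 = 2 sources
A = np.array([[1.0, 1.0]])                   # m = 1 sensor: y = x + w, Eq. (1)
X = A @ S                                    # observations, shape (1, N)

rank = np.linalg.matrix_rank(A)              # 1 < n1, so A has no left inverse
S_hat = np.linalg.pinv(A) @ X                # minimum-norm linear estimate, shape (2, N)
mse = np.mean((S_hat - S) ** 2, axis=1)      # per-source reconstruction error
print(rank, np.round(mse, 2))                # watermark error far exceeds var(w)
```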

    Before BSS based on underdetermined ICA can be used to estimate the watermark from the watermarked audio, two conditions must be verified: 1) the watermarked audio is an underdetermined mixture of independent sources, and 2) the underlying sources obey non-Gaussian distributions. Eq. (1) shows that the AE model fits an underdetermined linear mixture model (one observation, y, formed from two sources, x and w); therefore, BSS for underdetermined mixtures can be used to estimate the embedded watermark, provided that the underlying latent sources (x and w) satisfy the non-Gaussianity and independence constraints. This is a realistic assumption, as multimedia data can be modeled using non-Gaussian distributions [6]. Therefore, if the embedded watermark also obeys a non-Gaussian distribution, BSS based on UICA can be used to estimate the watermark from the watermarked signal [6].
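    Non-Gaussianity is commonly checked through the excess kurtosis, which is zero for a Gaussian variable. The sketch below uses a hypothetical helper and toy signals rather than real audio: a super-Gaussian Laplacian source (a common toy model for audio samples) and a sub-Gaussian binary ±1 sequence (typical of spread-spectrum watermarks). Both are non-Gaussian in the sense required by ICA.

```python
import numpy as np

def excess_kurtosis(s):
    """Sample excess kurtosis E[(s - mean)^4] / var^2 - 3 (zero for a Gaussian)."""
    z = np.asarray(s, dtype=float)
    z = z - z.mean()
    return float(np.mean(z ** 4) / np.mean(z ** 2) ** 2 - 3.0)

rng = np.random.default_rng(3)
n = 100_000
k_gauss = excess_kurtosis(rng.standard_normal(n))            # near 0
k_laplace = excess_kurtosis(rng.laplace(size=n))             # near +3: super-Gaussian
k_binary = excess_kurtosis(rng.choice([-1.0, 1.0], size=n))  # near -2: sub-Gaussian
print(round(k_gauss, 2), round(k_laplace, 2), round(k_binary, 2))
```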

    III. WATERMARK DETECTION

    This section provides an overview of the proposed scheme for blind watermark detection from a received audio signal watermarked by additive embedding. The scheme consists of two stages: 1) watermark estimation, and 2) watermark detection. As outlined earlier, the watermark estimation stage is further divided into two sub-stages: spectrum removal (LP filtering) and source separation.

      >  A. Watermark Estimation

    The goal of the watermark embedder is for the embedded watermark to survive intentional and unintentional attacks, whereas the goal of the watermark detector is to detect the embedded watermark with very low false-positive and false-negative rates in the presence of an active adversary and signal manipulations. For the AE model, low error rates are difficult to achieve because of strong host interference. For detector performance analysis, existing correlation-based schemes model the audio signal as a white Gaussian channel. Results from the audio processing and compression communities, however, show that samples of real audio signals are highly correlated; this correlation can be exploited to improve detection performance by de-correlating the input audio before detection. The proposed scheme does so by applying whitening (de-correlation) before watermark estimation. The simulation results presented in this paper show that whitening before ICA-based watermark estimation improves detection performance significantly. This improvement can be attributed to the fact that whitening increases the watermark-to-interference ratio.

    To remove the correlation in the audio signal, autoregressive modeling in the form of linear predictive coding (LPC) [7] can be used. LPC approximates the original audio signal, x(n), as a linear combination of the past p audio samples:

    $$x(n) \approx \sum_{k=1}^{p} a_k\, x(n-k), \qquad (6)$$

    where the coefficients a1, a2, …, ap are assumed to be constant over the selected audio segment. Rewriting Eq. (6) with an error term e(n) gives

    $$x(n) = \sum_{k=1}^{p} a_k\, x(n-k) + e(n), \qquad (7)$$

    where e(n) is the excitation, or residual, signal of x(n). Using Eq. (7), Eq. (1) can be expressed as

    $$y(n) = \sum_{k=1}^{p} a_k\, x(n-k) + e(n) + w(n). \qquad (8)$$

    Likewise, the watermarked audio can be expressed as

    $$y(n) = \sum_{k=1}^{p} a_k\, y(n-k) + \hat{e}(n), \qquad (9)$$

    where $\hat{e}(n)$ is the residual signal of the watermarked audio signal. By the properties of linear predictive analysis, we expect $\hat{e}(n)$ to have the characteristics of both e(n) and w(n). The linear combination of past watermarked audio samples can be regarded as the estimate $\hat{y}(n)$, defined as

    $$\hat{y}(n) = \sum_{k=1}^{p} a_k\, y(n-k). \qquad (10)$$

    The prediction error, $\hat{e}(n)$, can then be expressed as

    $$\hat{e}(n) = y(n) - \hat{y}(n) = e(n) + w(n) - \sum_{k=1}^{p} a_k\, w(n-k). \qquad (11)$$

    It can be observed from Eq. (11) that the residual $\hat{e}(n)$, from which the audio spectrum has been removed, has the characteristics of both the excitation signal of the original audio x(n) and the watermark signal w(n).
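    The LP analysis can be sketched in NumPy with the autocorrelation method, solving the normal equations R a = r directly rather than with the Levinson-Durbin recursion. The AR(2) host and its coefficients (1.6, -0.8) are toy assumptions, and `lpc_coefficients` / `prediction_error` are hypothetical helper names. The final assertion checks Eq. (11) numerically: applying the prediction-error filter to the watermarked signal yields the host excitation plus a filtered copy of the watermark.

```python
import numpy as np

def lpc_coefficients(s, p):
    """Autocorrelation-method LPC: solve R a = r for a_1..a_p of Eq. (6)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    r = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)]) / N
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def prediction_error(s, a):
    """s(n) - sum_k a_k s(n-k), with s(n) = 0 for n < 0 (Eq. (7) rearranged)."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    for k, ak in enumerate(a, start=1):
        out[k:] -= ak * s[:-k]
    return out

# Toy host: AR(2) process x(n) = 1.6 x(n-1) - 0.8 x(n-2) + u(n) (assumption)
rng = np.random.default_rng(4)
N = 20000
u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):
    x[n] = u[n] + (1.6 * x[n - 1] if n >= 1 else 0.0) - (0.8 * x[n - 2] if n >= 2 else 0.0)
w = 0.1 * rng.choice([-1.0, 1.0], size=N)   # i.i.d. watermark
y = x + w                                    # Eq. (1)

a = lpc_coefficients(x, p=2)     # close to [1.6, -0.8]
e = prediction_error(x, a)       # excitation/residual e(n) of the host
e_hat = prediction_error(y, a)   # residual of the watermarked signal
# Eq. (11): e_hat(n) = e(n) + w(n) - sum_k a_k w(n-k), which holds by linearity
assert np.allclose(e_hat, e + prediction_error(w, a))
```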

    This method transforms the non-white watermarked audio signal into a whitened signal by removing the audio spectrum. Fig. 2 shows the empirical probability density function (pdf) of a short segment of the watermarked audio signal before LP filtering, together with that of the corresponding residual (error) signal after LP filtering. The empirical pdf of the watermarked audio signal is clearly not smooth and shows large variations due to the voiced part, whereas the empirical pdf of the residual signal is smoother and has a smaller variance.

    The LPC stage also improves the WSR, which in turn improves the source separation performance of the BSS used for watermark estimation. This is because the watermark sequence is i.i.d., so de-correlation does not reduce its energy in the residual signal (by Eq. (11), the filtered watermark $w(n) - \sum_{k} a_k w(n-k)$ has variance $\sigma_w^2\,(1 + \sum_{k} a_k^2) \ge \sigma_w^2$), whereas de-correlation does reduce the energy of the audio signal.
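    To make the WSR argument concrete, the sketch below applies the prediction-error filter to the host and watermark components separately. For simplicity it assumes that the true coefficients of a toy AR(2) host are known; in practice they would be estimated from the received signal. The helper names are hypothetical.

```python
import numpy as np

def prediction_error(s, a):
    """s(n) - sum_k a_k s(n-k), with s(n) = 0 for n < 0."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    for k, ak in enumerate(a, start=1):
        out[k:] -= ak * s[:-k]
    return out

def wsr_db(host, mark):
    """10 log10(var(mark) / var(host))."""
    return 10.0 * np.log10(np.var(mark) / np.var(host))

rng = np.random.default_rng(6)
N = 8192
a = np.array([1.6, -0.8])                      # assumed (known) AR(2) host model
u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):                             # x(n) = 1.6 x(n-1) - 0.8 x(n-2) + u(n)
    x[n] = u[n] + (a[0] * x[n - 1] if n >= 1 else 0.0) + (a[1] * x[n - 2] if n >= 2 else 0.0)
w = 0.3 * rng.choice([-1.0, 1.0], size=N)      # i.i.d. watermark, variance 0.09

wsr_before = wsr_db(x, w)
wsr_after = wsr_db(prediction_error(x, a), prediction_error(w, a))
print(f"WSR before LP: {wsr_before:.1f} dB, after LP: {wsr_after:.1f} dB")
```

    With these settings the host variance drops from roughly 13 to about 1, while the filtered watermark variance grows to about $0.09\,(1 + 1.6^2 + 0.8^2) \approx 0.38$, so the WSR improves by more than 15 dB.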

    The residual signal is then used to estimate the hidden watermark using BSS based on UICA. For watermark estimation, the probabilistic ICA method based on mean-field approaches [1] is used. It was chosen solely for its superior source separation performance; this is not a limitation of the proposed scheme, as any UICA-based BSS method can be used to estimate the watermark from the residual signal. The estimated sources are then correlated with the watermark, w, to determine whether the embedded watermark is present. Fig. 3 shows the block diagram of the proposed audio watermark detection scheme.
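    The benefit of whitening for the final correlation step can be sketched without the ICA stage, which is omitted here because the mean-field ICA of [1] does not fit in a short example. The sketch correlates either the raw received signal or its LP residual with the true key and with 200 wrong keys, and reports how many standard deviations the true-key correlation lies above the wrong-key correlations. The AR(2) host, the LPC order p = 10, and all helper names are assumptions.

```python
import numpy as np

def lpc_coefficients(s, p):
    """Autocorrelation-method LPC (normal equations R a = r)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    r = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)]) / N
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def prediction_error(s, a):
    """s(n) - sum_k a_k s(n-k), with s(n) = 0 for n < 0."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    for k, ak in enumerate(a, start=1):
        out[k:] -= ak * s[:-k]
    return out

def detection_margin(signal, key, wrong_keys):
    """(true-key correlation - mean wrong-key correlation) / std of wrong-key correlations."""
    d_true = np.dot(signal, key) / len(key)
    d_wrong = wrong_keys @ signal / len(key)
    return float((d_true - d_wrong.mean()) / d_wrong.std())

rng = np.random.default_rng(7)
N, K = 8192, 200
u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):                         # toy AR(2) host (assumption)
    x[n] = u[n] + (1.6 * x[n - 1] if n >= 1 else 0.0) - (0.8 * x[n - 2] if n >= 2 else 0.0)
key = 0.3 * rng.choice([-1.0, 1.0], size=N)
wrong_keys = 0.3 * rng.choice([-1.0, 1.0], size=(K, N))
y = x + key                                # watermarked signal, no attack

a = lpc_coefficients(y, p=10)              # the detector only sees y
margin_raw = detection_margin(y, key, wrong_keys)
margin_lp = detection_margin(prediction_error(y, a), key, wrong_keys)
print(f"raw: {margin_raw:.1f} sigma, LP residual: {margin_lp:.1f} sigma")
```

    Because the residual keeps the watermark term while shrinking the host term, the true-key correlation stands out much more clearly after LP filtering, even before any source separation is applied.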

    IV. EXPERIMENTAL RESULTS

    The binary message to be embedded is first modulated by a key-dependent random sequence. The resulting watermark is then spectrally shaped in the frequency domain according to a masking threshold estimated with the ISO/MPEG-1 Audio Layer III psychoacoustic model of the human auditory system (HAS) [8]. The aim is to design a weighting function that maximizes the energy of the embedded watermark subject to an acceptable level of distortion. The shaped watermark is then added to the original audio signal in the frequency domain, and the result is transformed back to the time domain to obtain the watermarked audio. A schematic diagram of this embedding scheme is shown in Fig. 4.
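    The embedding chain of Fig. 4 can be sketched as follows. The MPEG-1 psychoacoustic masking threshold is not implemented here: a crude fraction of the host magnitude spectrum stands in for it, so the function name, the 5% scaling, and the one-chip-per-frequency-bin layout are all illustrative assumptions.

```python
import numpy as np

def embed_frame(x_frame, bit, key, weights):
    """Spread one message bit with a keyed +/-1 sequence, shape it per FFT bin,
    add it to the host spectrum, and return to the time domain (hypothetical)."""
    x_frame = np.asarray(x_frame, dtype=float)
    N = len(x_frame)
    pn = np.random.default_rng(key).choice([-1.0, 1.0], size=N // 2 + 1)  # key-dependent sequence
    W = (1.0 if bit else -1.0) * pn * weights      # spectrally shaped watermark
    Y = np.fft.rfft(x_frame) + W                   # additive embedding in the frequency domain
    return np.fft.irfft(Y, n=N), np.fft.irfft(W, n=N)

rng = np.random.default_rng(8)
N = 1024
x_frame = rng.standard_normal(N)                   # toy host frame
weights = 0.05 * np.abs(np.fft.rfft(x_frame))      # stand-in for a masking threshold
y_frame, w_time = embed_frame(x_frame, bit=1, key=1234, weights=weights)
```

    Because the FFT is linear, adding W in the frequency domain is equivalent to adding its inverse transform in the time domain, so the watermarked frame still follows the additive model of Eq. (1).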

    The simulation results presented in this section are based on the following system settings: 1) audio signals sampled at 44 kHz with 16-bit resolution are used as the host audio; 2) a 1,024-point watermark is embedded into four consecutive non-overlapping frames; and 3) at the detector, the watermarked signal is segmented into non-overlapping frames of 4,096 samples each, and each frame is further segmented into four non-overlapping sub-frames, which are LPC-filtered and then applied to the ICA block to estimate the embedded watermark. The SQAM dataset [2] was used for performance evaluation.
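    The detector-side frame layout in setting 3) can be expressed as a reshape followed by per-sub-frame LP filtering. This sketch stops where the ICA block would begin; the LPC order p = 10 and the helper names are assumptions, not settings reported in this paper.

```python
import numpy as np

def lp_residual(s, p=10):
    """Autocorrelation-method LPC fit on s, then prediction-error filtering of s."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    r = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)]) / N
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])
    e = s.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * s[:-k]
    return e

def detector_sub_frames(y, frame_len=4096, n_sub=4, p=10):
    """Split y into non-overlapping 4,096-sample frames, each into four 1,024-sample
    sub-frames, and LP-filter every sub-frame (input to the omitted ICA stage)."""
    y = np.asarray(y, dtype=float)
    n_frames = len(y) // frame_len                 # trailing partial frame is dropped
    sub = y[: n_frames * frame_len].reshape(n_frames, n_sub, frame_len // n_sub)
    return np.array([[lp_residual(s, p) for s in frame] for frame in sub])

rng = np.random.default_rng(9)
y = rng.standard_normal(3 * 4096 + 100)            # toy received signal
residuals = detector_sub_frames(y)
print(residuals.shape)                             # (3, 4, 1024)
```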

    The robustness of the proposed watermark estimation scheme was evaluated under the following attack scenarios: 1) no attack, 2) additive white Gaussian noise, 3) MP3 compression (128 kbps), and 4) bandpass filtering (second-order Butterworth filter with cutoff frequencies of 100 and 6,000 Hz).

    The detection performance of the proposed estimation-correlation-based detector and of the existing schemes under these attacks is shown in Fig. 5. Fig. 5 shows that the proposed detector outperforms the existing detectors in all four attack scenarios. In addition, the detection performance of the detectors under consideration on the SQAM database is given in Table 1. Both Fig. 5 and Table 1 show that the proposed detector performs significantly better than its counterparts. This improvement can be attributed to the better host-signal interference cancellation of the proposed detector.

    V. CONCLUSIONS

    In this paper, we described a new estimation-correlation-based detection framework for additive embedding. The proposed blind detection method extracts the embedded watermark signal while suppressing host-signal interference at the detector. The framework exploits the mutual independence and non-Gaussianity of the audio signal and the embedded watermark to estimate the embedded watermark using UICA-based BSS. Experimental results showed that the proposed detection scheme is robust to additive noise, MP3 compression, and bandpass filtering, and that it outperforms existing estimation-correlation-based detectors.

  • 1. Hojen-Sorensen P., Winther O., Hansen L. K. 2002 "Mean-field approaches to independent component analysis," [Neural Computation] Vol.14 P.889-918
  • 2. Sound Quality Assessment Material (SQAM)
  • 3. Cox I. J., Miller M. L., Bloom J. A. 2001 Digital Watermarking
  • 4. Hyvarinen A., Karhunen J., Oja E. 2001 Independent Component Analysis
  • 5. De Lathauwer L., De Moor B., Vandewalle J. 2000 "An algebraic ICA algorithm for 3 sources and 2 sensors," [Proceedings of the European Signal Processing Conference] P.461-465
  • 6. Malik H., Khokhar A., Ansari R. 2005 "Improved watermark detector for spread-spectrum based watermarking using independent component analysis," [Proceedings of the 5th ACM Workshop on Digital Rights Management] P.102-111
  • 7. Atal B. S., Hanauer S. L. 1971 "Speech analysis and synthesis by linear prediction of the speech wave," [Journal of the Acoustical Society of America] Vol.50 P.637-655
  • 8. ISO/IEC 11172-3 1993 Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio
  • [Fig. 1.] Schematic diagram of the basic additive embedding and correlation-based detection framework.
  • [Fig. 2.] Empirical probability density functions before and after linear prediction filtering.
  • [Fig. 3.] Block diagram of the proposed watermark detection procedure. LPC: linear predictive coding, ICA: independent component analysis.
  • [Fig. 4.] Watermark embedding procedure. FFT: fast Fourier transform, IFFT: inverse FFT.
  • [Fig. 5.] Robustness performance: no attack (top-left), additive white Gaussian noise attack (5% noise power, top-right), MP3 compression attack (128 kbps, bottom-left), and bandpass filtering attack (bottom-right). ICA: independent component analysis, LPC: linear predictive coding.
  • [Table 1.] Correlation values for each detection method on the sound quality assessment material (SQAM) [2] database.