Multiple Description Coding Using Directional Discrete Cosine Transform

  • ABSTRACT

    Delivering high-quality video over a wide area network with a large number of users poses great challenges for a video communication system. To ensure video quality, multiple description coding has recently attracted considerable attention as a way of encoding and delivering visual information over wireless networks. In this paper, we propose a new, efficient multiple description coding (MDC) technique. Quincunx lattice sub-sampling is used to generate multiple descriptions of an image, and a directional discrete cosine transform (DCT) is applied to each sub-sampled quincunx lattice to create the MDC representation. On the decoder side, the image is decoded from the received descriptions. If all the descriptions arrive successfully, the image is reconstructed by combining them. However, if only one side description is received, the missing samples are estimated by interpolation. The experimental results show that the directional DCT achieves a better coding gain and energy packing efficiency than the conventional DCT with re-alignment.


  • KEYWORD

    Directional discrete cosine transform, Image coding, Multiple description coding

  • I. INTRODUCTION

    Due to network congestion and delay sensitivity, video transmission over a lossy network is always a great challenge. Multiple description coding (MDC) [1], illustrated in Fig. 1, is an attractive approach to solving this problem. It can efficiently combat packet loss without any retransmission, thus satisfying the demand of real-time services and relieving network congestion.

    MDC encodes the source message into several bit streams (descriptions) carrying different information, which can then be transmitted over multiple channels [2]. In MDC’s simplest form, two parallel channels are assumed to connect the source with the destination. If only one channel works, the description it carries can be decoded on its own and still guarantees a minimum fidelity in the reconstruction at the receiver [3]. However, when both channels work, the descriptions from the two channels can be combined to yield a relatively high-fidelity reconstruction.

    Numerous MDC techniques have been proposed in recent years, such as the multiple description scalar quantization (MDSQ) proposed in [2]. In MDSQ, two descriptions are created by two coarse quantizers, each ensuring an acceptable distortion when only one of them is received.

    These two coarse quantizers can be combined to produce a finer quantizer if two descriptions are received. Further, various types of coding techniques such as subband coding and wavelet coding have also implemented MDC [4-7].
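
    The idea of two coarse quantizers that refine each other can be illustrated with a deliberately simplified sketch. It assumes two uniform quantizers with the same step size, the second one shifted by half a step; this staggered arrangement is only an illustration and is not the index-assignment design of the cited MDSQ work. The step size and all function names are hypothetical.

```python
import math

STEP = 2.0  # coarse quantizer step size (illustrative value, an assumption)

def encode(x, step=STEP):
    """Return one coarse index per description."""
    i1 = math.floor(x / step)        # quantizer 1: cells [k*step, (k+1)*step)
    i2 = math.floor(x / step + 0.5)  # quantizer 2: same cells shifted by step/2
    return i1, i2

def side_decode_1(i1, step=STEP):
    """Reconstruct from description 1 only (center of its cell)."""
    return (i1 + 0.5) * step

def side_decode_2(i2, step=STEP):
    """Reconstruct from description 2 only (center of its cell)."""
    return i2 * step

def central_decode(i1, i2, step=STEP):
    """Reconstruct from both indices: center of the intersection of the two cells."""
    lo = max(i1 * step, (i2 - 0.5) * step)
    hi = min((i1 + 1) * step, (i2 + 0.5) * step)
    return 0.5 * (lo + hi)

x = 3.3
i1, i2 = encode(x)
print(side_decode_1(i1), side_decode_2(i2), central_decode(i1, i2))
```

    With either index alone the error is bounded by half a step, whereas the two indices together confine the sample to an interval of half a step, so the central error is bounded by a quarter of a step.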

    In this paper, we revisit the MDC scheme based on pixel-domain sub-sampling. In particular, we focus on the quincunx sub-sampling lattice. Instead of applying a horizontal or vertical re-alignment to form regular square blocks, we retain the quincunx lattice and apply the directional discrete cosine transform (DDCT). Both a theoretical analysis and simulation tests are used to confirm that the DDCT achieves better coding efficiency than the traditional DCT with horizontal or vertical re-alignment.

    In Section II, we briefly introduce the traditional pixel-domain sub-lattice approach to MDC. The proposed directionally sampled discrete cosine transform (DS-DCT) for the quincunx sub-sampling lattice is presented in Section III, where we also explain how to handle the boundary blocks that remain after the DS-DCT. In Section IV, we describe the experimental setup and present some simulation results. Finally, conclusions are presented in Section V.

    II. SUB-SAMPLING ON MDC

    In this section, we discuss the sub-lattice technique used in the proposed method. Given the source image I, which is typically a subset of Z², the signal samples are partitioned into two subsets as follows:

    image

    There are two different ways to partition the image into two parts. Fig. 2(a) shows orthogonal sub-sampling, and Fig. 2(b) illustrates quincunx sub-sampling. The proposed method uses the descriptions generated by the second scheme, quincunx sub-sampling. One of its major advantages is the increased correlation between samples: the two descriptions follow a checkerboard pattern, so the Euclidean distance between two neighboring samples within a description is always √2. After the splitting process, each description is converted to the transform domain.
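
    A minimal sketch of the quincunx split is given below. It assumes that the first description holds the pixels whose row and column indices have an even sum and the second holds the rest; this parity convention, the use of NaN to mark missing pixels, and the function names are assumptions made for illustration.

```python
import numpy as np

def quincunx_masks(shape):
    """Return boolean masks for the two quincunx (checkerboard) subsets."""
    rows, cols = np.indices(shape)
    even = (rows + cols) % 2 == 0  # assumed convention: description 1 = even parity
    return even, ~even

def split(image):
    """Split an image into two descriptions; pixels not carried are set to NaN."""
    m1, m2 = quincunx_masks(image.shape)
    img = image.astype(float)
    return np.where(m1, img, np.nan), np.where(m2, img, np.nan)

def merge(d1, d2):
    """Interleave the two lattices again (exact when nothing was quantized)."""
    return np.where(np.isnan(d1), d2, d1)

image = np.arange(16).reshape(4, 4)
d1, d2 = split(image)
restored = merge(d1, d2)
```

    Within one description, the nearest samples of a pixel at (i, j) lie at (i ± 1, j ± 1), which is where the distance of √2 comes from.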

    III. DIRECTIONAL COSINE TRANSFORM

    The DCT and the discrete wavelet transform used in image compression are implemented as separable one-dimensional (1D) transforms applied along the rows and columns of an image.

    The conventional N × N 2D DCT is implemented separably by two N-point 1D transforms. Let B(i, j) and C, both of size N × N, be the image block and the transform matrix, respectively. Then, the corresponding block of transformed coefficients B(u, v) can be expressed as follows:

    image

    where

    image
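
    As a hedged illustration of the separable transform, the sketch below assumes that C is the standard orthonormal DCT-II matrix and that the 2D transform is computed as C B Cᵀ. The exact normalization used by the authors is not recoverable from the text, and the function names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (assumed form of the transform matrix C)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return c

def dct2(block):
    """Separable 2D DCT of a square block: columns first, then rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    """Inverse of dct2; C is orthogonal, so its inverse is its transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

flat = np.ones((8, 8))
coeffs = dct2(flat)  # a constant block puts all of its energy in the DC term
```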

    Naturally, the conventional 2D DCT is a good choice for image blocks in which vertical and/or horizontal edges dominate. However, it is less effective when applied to an image block in which edges along other directions dominate. The major shortcoming of the separable transform is that it cannot sparsely represent the anisotropic edges in an image. In order to obtain a better representation of edges in all directions, the given image block is transformed on the basis of the directional DCT, as illustrated in Fig. 3.

    In the proposed method, there are five directional modes in total. One of them is the vertical prediction (mode 0), and the remaining four are labeled diagonal down-right (mode 1), diagonal down-left (mode 2), vertical-right (mode 3), and horizontal-down (mode 4), as shown in Fig. 4(a), (b), (c), (d), and (e), respectively.

    On the encoder side, the input image is first analyzed block by block to decide the transform direction of each block. In the first step, a 1D DCT is performed in each block along the selected direction, which takes the place of the vertical transform of the conventional DCT. In the second step, a horizontal 1D DCT is applied.
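
    The two-step structure can be sketched for a single mode. The example below assumes the diagonal down-right direction and one common construction of a directional DCT: variable-length 1D DCTs along each diagonal, followed by 1D DCTs across the coefficients that share the same index. The scan order, the handling of the other modes, and the mode decision are not specified in the text, so everything here, including the function names, is an assumption rather than the authors' implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def ddct_down_right(block):
    """Directional DCT sketch for the diagonal down-right direction.

    Returns a list of 1D arrays; entry k holds the second-step coefficients
    computed from the k-th first-step coefficient of every diagonal.
    """
    n = block.shape[0]
    # Step 1: 1D DCT along every down-right diagonal (length n - |d|).
    diag_coeffs = [
        dct_matrix(n - abs(d)) @ np.diagonal(block, offset=d)
        for d in range(-(n - 1), n)
    ]
    # Step 2: 1D DCT across the diagonals, one coefficient index at a time.
    out = []
    for k in range(n):
        row = np.array([c[k] for c in diag_coeffs if len(c) > k])
        out.append(dct_matrix(len(row)) @ row)
    return out

# A block that is constant along each down-right diagonal, i.e. a pure diagonal pattern.
i, j = np.indices((8, 8))
diag_block = np.cos(0.7 * (j - i))
diag_out = ddct_down_right(diag_block)
```

    For such a block, every first-step DCT produces only a DC term, so all coefficients outside the first group are zero. This is the energy compaction that a directional transform is meant to provide for edges that are neither horizontal nor vertical.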

    On the decoder side, when only one side description is received, the main task of the decoder is to interpolate the missing sub-image from the received sub-image. Although the proposed method involves the use of directional data, all pixels in a partitioned block share a common direction, and the lost description is estimated from the four connected neighbors by using the conventional bilinear interpolation method. Since there are numerous interpolation algorithms for preserving the original texture content of an image, the quality of the reconstructed image can be enhanced by selecting a more appropriate interpolation scheme.
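
    The side-decoder step can be sketched as follows, assuming that the received description is given together with a boolean mask of its quincunx positions and that each missing pixel is replaced by the mean of its available four-connected neighbors. The border handling (averaging only the neighbors that exist) and the function name are assumptions.

```python
import numpy as np

def interpolate_missing(received, mask):
    """Fill pixels where mask is False with the mean of their 4-connected neighbors.

    On a quincunx lattice, every 4-connected neighbor of a missing pixel
    belongs to the received description, so all of them can be used.
    """
    vals = np.where(mask, received, 0.0).astype(float)
    pv = np.pad(vals, 1, mode="constant")               # zero border
    pc = np.pad(mask.astype(float), 1, mode="constant")
    nb_sum = pv[:-2, 1:-1] + pv[2:, 1:-1] + pv[1:-1, :-2] + pv[1:-1, 2:]
    nb_cnt = pc[:-2, 1:-1] + pc[2:, 1:-1] + pc[1:-1, :-2] + pc[1:-1, 2:]
    estimate = nb_sum / np.maximum(nb_cnt, 1.0)         # avoid division by zero
    return np.where(mask, vals, estimate)

rows, cols = np.indices((6, 6))
ramp = (rows + 2 * cols).astype(float)  # smooth test image
mask = (rows + cols) % 2 == 0           # positions carried by the received description
filled = interpolate_missing(np.where(mask, ramp, 0.0), mask)
```

    On a linear ramp, the interior pixels are recovered exactly, since the mean of the four neighbors of a point on a plane equals the value at that point.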

    Similarly, when two descriptions are simultaneously available at the decoder, a straightforward method is to decode the two descriptions simultaneously and then merge the two sub-images. Since each side description is compressed using quantization, any decoded pixel value from one description is only an approximation of the original.

    IV. EXPERIMENTAL RESULTS

    Several experiments were conducted, and their results are presented in this section in order to evaluate the performance of the proposed image MDC scheme. The implementation of the new MDC scheme is integrated into JPEG coding, with the directional transform replacing the original 2D separable rectilinear discrete cosine transform.

    Naturally, the JPEG MDC scheme is used as a benchmark for performance comparisons in terms of the peak signal-to-noise ratio (PSNR), where the input image is first split into two descriptions by quincunx lattice sub-sampling and then coded individually by JPEG coding.
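
    For reference, the PSNR and the rate can be computed as in the sketch below. It assumes 8-bit images with a peak value of 255 and a rate normalized by the number of pixels of the full-resolution image; the function names are illustrative.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def bits_per_pixel(total_bits, image_shape):
    """Rate in bits per pixel with respect to the full-resolution image."""
    height, width = image_shape
    return total_bits / (height * width)

original = np.zeros((4, 4))
noisy = original + 1.0  # every pixel is off by one gray level, so the MSE is 1
value = psnr(original, noisy)
```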

    To make a fair comparison, we also adopt the proposed texture-oriented interpolation and data fusion algorithms for the central decoding of JPEG MDC.

    Two JPEG test images (Lena and Barbara) having a resolution of 512 × 512 are used in our experiments. They are split into two descriptions, each of which is a quincunx lattice. Each description is compressed using JPEG coding and the proposed scheme.

    We evaluate the PSNR performance of a side decoder when only one description is received. As proposed, the full-resolution images are reconstructed from the received side description by the texture orientation interpolation method. For the sake of comparison, we also compute the PSNR results of the widely used linear interpolation method when applied to the received quincunx image, as shown in Fig. 5. Since the two descriptions are balanced in our experiments, it suffices to list the PSNR values for description 1 of the proposed MDC scheme. The PSNR values shown in this figure are calculated over all samples, including both the decoded and the interpolated pixels. The rates are also calculated over all samples of the full-resolution image. One can observe that at low rates (e.g., 0.125 bpp), the two interpolation methods perform roughly the same, with only a small advantage for the texture orientation method. This is due to the lack of high-frequency components in the received side description at a low rate.

    V. CONCLUSIONS

    In this paper, we proposed a new MDC scheme using the directional DCT. The input image was directly split into two descriptions in the pixel domain using quincunx lattice sub-sampling. Using the DDCT, image content oriented in different directions was represented efficiently. The experimental results confirmed that the proposed directional MDC scheme could outperform the JPEG MDC scheme by up to 0.9 dB in PSNR in the cases of both side decoding and central decoding.

  • 1. V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Transactions on Information Theory, vol. 39, pp. 821-834, 1993.
  • 2. V. K. Goyal, “Multiple description coding: compression meets the network,” IEEE Signal Processing Magazine, vol. 18, pp. 74-93, 2001.
  • 3. V. A. Vaishampayan, N. J. A. Sloane, and S. D. Servetto, “Multiple description vector quantization with lattice codebook: design and analysis,” IEEE Transactions on Information Theory, vol. 47, pp. 1718-1734, 2001.
  • 4. N. Franchi, M. Fumagalli, R. Lancini, and S. Tubaro, “Multiple description video coding for scalable and robust transmission over IP,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, pp. 321-334, 2005.
  • 5. S. D. Servetto, K. Ramchandran, V. A. Vaishampayan, and K. Nahrstedt, “Multiple description wavelet based image coding,” IEEE Transactions on Image Processing, vol. 9, pp. 813-826, 2000.
  • 6. S. S. Channappayya, J. Lee, R. W. Heath, and A. C. Bovik, “Frame based multiple description image coding in the wavelet domain,” in Proceedings of the IEEE International Conference on Image Processing, pp. 920-923, 2005.
  • 7. P. Kauff and K. Schuur, “Shape-adaptive DCT with block-based DC separation and ΔDC correction,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 237-242, 1998.
  • [Fig. 1] Multiple description (MD) coding with two descriptions and three decoders.
  • [Fig. 2.] Two different pixel-domain sub-sampling lattices for multiple description coding: (a) orthogonal sub-sampling and (b) quincunx sub-sampling.
  • [Fig. 3.] Exemplified elementary matrix operation: (a) non-directional and (b) directional. The circles denote pixels, and the squares represent half-pixels.
  • [Fig. 4.] Five direction modes: (a) vertical prediction, (b) diagonal down-right, (c) diagonal down-left, (d) vertical-right, and (e) horizontal-down. The circles denote pixels, and the dashed lines represent direction lines.
  • [Fig. 5.] Experimental results. The horizontal axis is the rate in bits per pixel and the vertical axis is the peak signal-to-noise ratio (PSNR). PSNR of the interpolated image received at side decoder 1 for the (a) Lena, (b) Barbara, and (c) Boat images. MDC: multiple description coding.