In H.264/AVC, the first frame of a group of pictures (GOP) is encoded in intra mode, which generates a large number of bits. The number of bits spent on the I-frame affects the quality of the following frames of the GOP, since they are encoded using the bits that remain from those allocated to the GOP. In addition, the first frame serves as the reference for the inter mode encoding of the following frames. Thus, the initial quantization parameter (QP) affects the following frames as well as the first frame. In this paper, an adaptive peak signal-to-noise ratio (PSNR)-based initial QP determination algorithm is presented. In the proposed algorithm, a novel linear model is established based on the observed relation between the initial QPs and the PSNRs of frames. Using the linear model and the PSNR results of the encoded GOPs, the proposed algorithm estimates the optimal initial QP, which maximizes the PSNR of the current GOP. Experimental results show that the proposed algorithm predicts the optimal initial QP accurately and thus achieves better PSNR performance than the existing algorithm.
Recently, the H.264/AVC standard, which was jointly developed by the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG), has been widely used in many video coding applications. H.264/AVC outperforms previous coding standards and has many outstanding features, such as various intra/inter prediction modes, multiple reference frames, rate-distortion optimization, and variable block sizes [1]. However, the H.264/AVC standard does not address the issue of maintaining a constant bit rate (CBR) through the network channel. Hence, a rate control algorithm must be implemented in the video encoder so that the coded video sequence can be transmitted without abrupt variations of the bit rate over time under limited channel bandwidth [2].
Usually, rate control aims to achieve good perceptual quality under a given transmission bit rate constraint. That is, rate control regulates the number of coded bits by adjusting the quantization parameter (QP) while maximizing the video presentation quality. To achieve this, a rate-quantization (R-Q) model is often employed, which expresses the number of coded bits in terms of the QP and other parameters such as the mean absolute difference (MAD) of a residual macroblock (MB) and the percentage of zero quantized coefficients [3]. Unfortunately, using parameters such as the MAD for R-Q modeling causes a chicken-and-egg dilemma: the Lagrangian mode decision employed in H.264 requires the QP to be available before mode decision, whereas rate control (RC) cannot access statistics such as the MAD, which it needs in order to determine the QP, until mode decision has finished [4]. Li et al. [3] proposed in JVT-G012 an adaptive rate control framework for H.264/AVC, in which a single-pass rate control method based on the quadratic R-Q model is used and a linear model for MAD prediction is employed to solve the above dilemma.
Recently, many rate control algorithms have been proposed for H.264/AVC to improve JVT-G012, but most of them focus only on P-frame coding. How the I-frame of a group of pictures (GOP) is encoded, however, is also a very important factor influencing the RC performance. Usually, the I-frame and the first P-frame of a GOP are encoded using a predetermined QP, which is called the initial QP. In many RC algorithms, the initial QP of the first GOP is determined solely from the bits per pixel (BPP), as in JVT-G012. From the second GOP, the initial QP for an I-frame depends on the average QP of the P-frames in the previous GOP. The potential problem of this scheme is that, given a bit budget for encoding the current I-frame, it is difficult to estimate the QP accurately because the characteristics of the current GOP are not considered [4-6]. It is nevertheless quite important to control the quality of the I-frame to a suitable level for a fixed target output bit rate. A high-quality I-frame consumes a larger share of the bits allocated to a GOP, which degrades the video quality of the P- and B-frames in the same GOP due to frame skipping and buffer overflow. On the other hand, a low-quality I-frame also degrades the video quality because the I-frame is used as the reference for encoding the P- and B-frames. Usually, given the same BPP, a large initial QP is desirable for video sequences with complex spatial details or high motion, whereas a small initial QP is advantageous for video sequences with simple spatial content or low motion. Thus, the initial QP should be determined by considering the BPP as well as the content of the video sequence [4].
In this paper, an adaptive peak signal-to-noise ratio (PSNR)-based initial QP determination algorithm is proposed. By considering the characteristics of the content, the proposed algorithm estimates the initial QP for a GOP more accurately than the conventional methods. Experimental results show that the proposed algorithm outperforms the existing method for H.264/AVC rate control.
The rest of this paper is organized as follows. Section II presents the existing rate control algorithm for the initial QP in H.264 reference software. The development of the proposed method of the adaptive initial QP determination is discussed in Section III. Section IV demonstrates the experimental results for performance comparison. Finally, a conclusion is drawn in Section V.
A rate control framework for H.264/AVC was proposed in JVT-G012 [3] and recently modified in JVT-W057 [7]. The algorithm creates a stream that satisfies the available bandwidth provided by a channel and is also compliant with a hypothetical reference decoder (HRD). It consists of three consecutive components: the GOP level rate control, the frame level rate control, and the basic unit level rate control. Among them, the GOP level rate control includes the calculation of the total number of bits for a GOP and the determination of the initial QP for the GOP. This paper focuses on the determination of the initial QP in the GOP level rate control.
An initial QP
For the other GOPs, the initial QPs are calculated as follows:
where
where
Fig. 1 shows the PSNR results of the QCIF Akiyo sequence when the GOP size is 30, the frame rate is 30 fps, and the bit rate is 60 kbps. The JVT algorithm determines the first initial QP according to Eq. (1), so the first initial QP is set to 40. For comparison, the PSNR results obtained with a first initial QP of 20 are also shown. For the Akiyo sequence, a first initial QP of 40 is too large, so the quality of the I-frame is poor. The poor quality of the I-frame of the first GOP degrades the quality of the following GOPs as well as that of the first GOP. On the other hand, when the first initial QP is 20, the quality of the I-frame is much higher than in the previous case, and the overall quality of the GOPs is also better than that of the JVT algorithm. Fig. 2 shows the average QP of each frame of the sequence. From the second GOP, the initial QP is calculated by Eqs. (2) and (3), so the maximum difference between the initial QPs of two successive GOPs is 2. That is, the initial QP varies gradually, changing by a value in the range of -2 to 2 from one GOP to the next. Therefore, if the first initial QP is set too large or too small, the quality degradation propagates to the following GOPs.
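The choice of 40 for the first initial QP in the setting above can be checked with a short calculation. The following is a minimal sketch, not the reference implementation: it assumes the QCIF frame size of 176x144 pixels, and it assumes the BPP thresholds (0.15, 0.45, 0.9) and QP levels (40, 30, 20, 10) commonly quoted from JVT-G012 for QCIF/CIF, since Eq. (1) is not reproduced in this text. The function names are illustrative only.

```python
def bits_per_pixel(bit_rate_bps, frame_rate_fps, width, height):
    """Average number of bits available per pixel at the target bit rate."""
    return bit_rate_bps / (frame_rate_fps * width * height)


def first_initial_qp(bpp, thresholds=(0.15, 0.45, 0.9)):
    """BPP-based lookup for the first initial QP.

    The thresholds and QP levels are ASSUMED from JVT-G012 (QCIF/CIF);
    they are not taken from this paper's Eq. (1).
    """
    l1, l2, l3 = thresholds
    if bpp <= l1:
        return 40
    if bpp <= l2:
        return 30
    if bpp <= l3:
        return 20
    return 10


# Setting of Fig. 1: QCIF (assumed 176x144), 30 fps, 60 kbps.
bpp = bits_per_pixel(60_000, 30, 176, 144)
print(round(bpp, 4), first_initial_qp(bpp))  # prints: 0.0789 40
```

Under these assumptions the BPP is about 0.079, which falls into the lowest bracket and therefore yields 40 regardless of how simple the Akiyo content is.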
The selection of the QP based on Eqs. (1), (2), and (3) has been adopted in the implementation of the H.264/AVC reference model. However, in order to enhance the overall performance of H.264, a more efficient rate control scheme is needed. The details of the proposed rate control scheme, which improves on the existing method, are described in the next section.
This paper focuses on the determination of the initial QP in the GOP level rate control. In addition, rate control for real-time applications is considered, so the frame structure is assumed to be “IPPP…” without B-frames.
In the JVT rate control scheme, the QP for an I-frame depends on the average QP of the P-frames in the previous GOP, as shown in Eq. (2). This initialization scheme is simple and adapts to the available channel bandwidth, but the initial QP converges to the optimal value very slowly. Also, it does not consider the characteristics of each video sequence. A more efficient rate control scheme has to find the optimal value more quickly. In addition, it has to take into consideration the properties of each video sequence, such as the frame complexity and motion characteristics. However, the algorithm becomes more complex as the number of parameters increases, and a complicated algorithm cannot be used for real-time applications. The proposed algorithm uses only the PSNR properties of a GOP, so it is simple and can be used in real-time applications.
Various test sequences have been encoded using different initial QPs in H.264/AVC, and the PSNR characteristics of the GOPs have been studied. As the initial QP decreases, the PSNR of the I-frame improves but that of the P-frames is degraded. This is because the I-frame consumes so many bits that not enough bits are left for the P-frames, which are encoded using the remaining bits. Let
where
The PSNR of a GOP varies with the initial QP. When the initial QP is small, the overall PSNR of the GOP is low because of frame skipping and buffer overflow. As the initial QP increases, the overall PSNR also increases. Let
Let
Using the modified linear model, the proposed scheme determines the initial QP of the
In the proposed scheme, the first GOP of a sequence is encoded by the existing method, and from the second GOP, the initial QPs are determined by Eq. (6). However, there are two parameters
where
In Eq. (6),
The characteristics of
Table 1. Performance comparison of the GOP level rate control algorithm in JVT-W057 and the proposed rate control algorithm in terms of average PSNR at bit rates of 60, 80, and 100 kbps.
92% of the I-frame PSNR. For simplicity,
Numerous experiments have been conducted to evaluate the performance of the proposed rate control algorithm, which has been implemented with the latest version of the JVT reference software, JM18.3 using a baseline profile. The results achieved here are compared with those achieved using the JVT-W057 rate control algorithm adopted by JM18.3.
The same encoding parameters are used for both algorithms to ensure a fair comparison. The following test conditions are used: the GOP structure is “IPPPP…” with a GOP size of 30; the motion vector search range and the number of reference frames for motion estimation are set to 16 and 2, respectively; and fast full search motion estimation and rate-distortion optimization are enabled. The simulation was conducted with the first 180 frames of three QCIF test sequences: Akiyo, Carphone, and Foreman. To ensure that the rate control parameters are equivalent, the size of the basic unit for the basic unit level rate control is fixed at 1 macroblock.
Since the major issue for video coding is the quality of the video at the given target bit rate, the average PSNR value of each GOP is calculated and listed in Table 1 in order to provide an objective evaluation of the video quality. In Table 1, IQP denotes the initial QP. The proposed scheme uses the JVT algorithm for the first GOP, so the PSNR results of the first GOPs are not included in Table 1. The proposed scheme shows better video quality than the rate control of the JVT algorithm in terms of the average PSNR values.
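The per-GOP averaging described above can be sketched as follows. This is an illustrative sketch only: it assumes 8-bit video (peak value 255), it assumes that the average PSNR of a GOP is the arithmetic mean of the per-frame PSNR values, and the function names are hypothetical rather than taken from the reference software.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (8-bit peak assumed)."""
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def gop_average_psnr(frame_psnrs, gop_size=30, skip_first_gop=True):
    """Arithmetic mean of the per-frame PSNR values for each GOP.

    skip_first_gop mirrors the exclusion of the first GOP, which is
    encoded by the JVT algorithm in both schemes.
    """
    gops = [frame_psnrs[i:i + gop_size]
            for i in range(0, len(frame_psnrs), gop_size)]
    if skip_first_gop:
        gops = gops[1:]
    return [sum(g) / len(g) for g in gops]


# 180 frames with a GOP size of 30 give 6 GOPs, 5 of which are reported.
dummy_psnrs = [35.0] * 180  # placeholder values, not measured results
print(len(gop_average_psnr(dummy_psnrs)))  # prints: 5
```

The placeholder values only demonstrate the grouping and are not measurements.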
The frame-by-frame PSNR results of the three sequences are shown in Fig. 5, where the proposed scheme obtains better results than the JVT algorithm. Under the conditions of these simulations, the initial QP of the first GOP is set to 40 by Eq. (1), but this value is larger than the optimal value, which maximizes the average PSNR value of a GOP. In the JVT algorithm, the initial QP can decrease by at most 2 per GOP, so it takes several GOPs to reach the optimal value. In contrast, the proposed scheme finds the optimal value more quickly than the JVT algorithm.
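The effect of the per-GOP limit on convergence can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions: it models only the ±2 limit on the change of the initial QP between successive GOPs and assumes the full step is taken every time, so it gives a lower bound on the number of GOPs needed and not the behavior of Eqs. (2) and (3). The target value of 20 is a hypothetical one borrowed from the comparison in Fig. 1, not a measured optimum.

```python
def gops_to_reach(start_qp, target_qp, max_step=2):
    """Fewest GOP transitions needed when the initial QP may change
    by at most max_step per GOP (best case: full step every time)."""
    qp = start_qp
    count = 0
    while qp != target_qp:
        # Clamp the desired change to the allowed range [-max_step, max_step].
        delta = max(-max_step, min(max_step, target_qp - qp))
        qp += delta
        count += 1
    return count


# From the BPP-based value of 40 to a hypothetical target of 20.
print(gops_to_reach(40, 20))  # prints: 10
```

Even in this best case, ten GOP transitions are needed, which is consistent with the slow convergence observed for the JVT algorithm.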
The proposed scheme can also be applied when a scene change occurs, because the initial QP calculation by Eq. (1) is used for the first GOP after the scene change as well as for the first GOP of a sequence. After a scene change, the proposed scheme can improve the visual quality by finding the optimal initial QP more quickly.
In this paper, an adaptive PSNR-based initial QP determination algorithm for H.264/AVC is proposed. The proposed algorithm takes the characteristics of each video sequence into consideration by using the linear relation between the initial QP and the PSNR ratio, so it can estimate the optimal initial QP more precisely than the existing method. Experimental results show that the proposed scheme achieves better video quality than JVT-W057. In the case of the Akiyo sequence, the proposed algorithm improves the average PSNR of the GOPs by up to about 2 dB.