Fast Motion Estimation Based on a Modified Median Operation for Efficient Video Compression

Kim Jongho

doi:10.6109/jicce.2014.12.1.053

OA학술지
Journal of information and communication convergence engineering

Fast Motion Estimation Based on a Modified Median Operation for Efficient Video Compression

DOI : 10.6109/jicce.2014.12.1.053
Author: Kim Jongho
Organization: Kim Jongho
Publish: Journal of information and communication convergence engineering Volume 12, Issue1, p53~59, 31 March 2014

ABSTRACT

Motion estimation is a core part of most video compression systems since it directly affects the output video quality and the encoding time. The full search (FS) technique gives the highest visual quality but has the problem of a significant computational load. To solve this problem, we present in this paper a modified median (MMED) operation and advanced search strategies for fast motion estimation. The proposed MMED operation includes a temporally co-located motion vector (MV) to select an appropriate initial candidate. Moreover, we introduce a search procedure that reduces the number of thresholds and simplifies the early termination conditions for the determination of a final MV. The experimental results show that the proposed approach achieves substantial speedup compared with the conventional methods including the motion vector field adaptive search technique (MVFAST) and predictive MVFAST (PMVFAST). The proposed algorithm also improves the PSNR values by increasing the correlation between the MVs, compared with the FS method.

KEYWORD

Diamond Search , Fast motion estimation , Median predictor , MVFAST , PMVFAST

본문

Collapse all

I. INTRODUCTION

The block matching algorithm (BMA) is an essential part of many video coding standards, such as MPEG-1, 2, 4, and H.26x, to exploit the temporal correlation and reduce the redundancy that exists between frames of video sequences [1, 2]. In BMA, the current frame is partitioned into nonoverlapped blocks, and the best match for these blocks is found inside the search range of a reference frame with a predefined distortion criterion. The best matching block is then used as a predictor for the block in the current frame, where the displacement between the two blocks defines a motion vector (MV) of the current block. Thus, the encoder sends the MV and a residual block, which is the difference between the current block and the predictor, instead of the current block itself. This can significantly reduce the number of bits involved in video coding procedures. The most common distortion measure is the mean absolute error or mean absolute difference, or equivalently, the sum of the absolute difference (SAD), which requires no multiplication and exhibits a performance similar to that of the mean squared error [3, 4]. Let f_t and f_t-1 be a current frame and a reference frame, respectively. Then, for an M × N block located at (x, y) in the current frame, the SAD is given by

[]

where d_x and d_y denote the displacements in the x- and y-direction, respectively.

The full search (FS) technique selects the MV, which points to the best matching block that has a minimum SAD, by exhaustive search at all possible points within the search range. Since the calculation of SAD at all possible locations is computationally intensive, it is difficult to directly use FS for various applications, such as digital TV, Internet streaming, and mobile video, in spite of its simple implementation. For this reason, many fast search algorithms have been suggested in an effort to reduce the computational complexity by searching only a subset of the possible search points, such as two-dimensional (2D) log search [5], three-step search (TSS) [6], new three-step search (NTSS) [7], and diamond search (DS) strategies [8, 9]. However, these methods usually incur a significant loss in the PSNR, although they can make the search faster by calculating the SAD less frequently. Moreover, these algorithms are not exactly optimal in consideration of the bits required to encode each MV, since they find MVs just by minimizing the SAD irrespective of the distribution of MVs.

In particular, the motion vector field adaptive search technique (MVFAST) and the predictive MVFAST (PMVFAST) have been introduced in order to improve the DS technique without sacrificing PSNR [10-12]. The MVFAST considers the MVs of the spatially adjacent blocks and the PMVFAST includes the MVs of the spatially and temporally adjacent blocks as possible candidates. The priority of the initial search point in PMVFAST is given to results of the median operation for the candidates, whereas priority in DS is given to a (0, 0) motion vector, MV₀. In addition, the early termination by a thresholding technique in the PMVFAST further reduces the computational complexity of the search procedures. These techniques of PMVFAST allow the video encoders to generate fewer bits for MVs with a negligible PSNR loss as well as significantly reduce the computations for search procedures, since the relatively high correlation between the MVs can be used effectively.

In this paper, we propose a modified median (MMED) operation and effective early termination conditions for the fast motion estimation. Since the MV of a temporally colocated block is more correlated with the final MV in many cases, our approach uses the MVs of four adjacent blocks, which include the left, top, and top-right blocks of the current block and a temporally co-located block in a reference frame, whereas PMVFAST uses the MVs of three spatially adjacent blocks. To select an appropriate candidate in the initial search procedure, we introduce an MMED operator that can decrease the number of search points, and this reduces the overall computational load. Since the initial search point is chosen more appropriately, thresholds and early termination conditions can be simplified, and consequently, we achieve faster motion estimation without sacrificing PSNR compared with the case of PMVFAST. The rest of this paper is organized as follows: Section II presents a review of MVFAST and PMVFAST. In Section III, the proposed fast motion estimation technique is discussed in detail. Experimental results and conclusions are given in Sections IV and V, respectively.

II. REVIEW OF MVFAST AND PMVFAST

Most of the fast motion estimation algorithms first choose several candidates and then, calculate SAD at the candidate points. After the selection of the candidate with the minimum SAD, more search points around the selected candidate are tested. If the best candidate is not found in this procedure, the current candidate becomes the final MV. Therefore, the efficiency of the fast search algorithms depends on the determination of the candidates. In TSS, the number of search points does not change significantly according to the number of moving objects in the video sequences. On the other hand, in the DS that is biased to MV₀, better performance can be observed when the video sequences contain a few moving objects. MVFAST and PMVFAST also use DS patterns for fast motion estimation that is adaptive to the characteristics of video sequences. We will now briefly describe MVFAST and PMVFAST, which improve the search speed significantly with a negligible PSNR loss, since the proposed method is based on them.

  > A. Search Procedures of MVFAST

MVFAST introduced in [10, 12] improves the speed-up of the DS algorithm significantly by initially considering a small set of candidates, which includes MV0 and the MVs of three spatially adjacent blocks (the left, top, and top-right blocks). Unlike the DS that uses only the large diamond search pattern (LDSP) shown in Fig. 1(a), MVFAST considers the small diamond search pattern (SDSP) shown in Fig. 1(b).

[Fig. 1.] Diamond search patterns: (a) large diamond search pattern (LDSP) and (b) small diamond search pattern (SDSP).

Both the diamond patterns (LDSP and SDSP) keep on moving until the minimum SAD is found at the center. If the center point is selected using SDSP, the search procedure is terminated at this point. If the center point is selected using LDSP, the search procedure is carried out again using SDSP around the center of LDSP.

In MVFAST, the selection between SDSP and LDSP depends on the motion activity that the current block will most likely have. This motion activity is characterized by the magnitude of the largest MV among the three adjacent MVs, i.e., MV_left, MV_top, and MV_top-right, as shown in Fig. 2.

[Fig. 2.] Motion vectors of spatially adjacent blocks. MV: motion vector.

The motion activity is described as low when it is below a threshold L₁, medium when it is between L₁ and a second threshold L₂, and high when it is larger than L₂. If the motion activity is considered low, SDSP is applied with MV₀ as the center. If the motion activity is medium, MV₀ is selected again as the center and LDSP is applied at this time instead of SDSP. This is the only case in which LDSP is selected, whereas the DS algorithm always chooses LDSP. After testing with LDSP, we apply SDSP again at the point with the smallest SAD as the center. Finally, if the motion activity is characterized as high, the center is determined with the position of the MV that yields the minimum SAD among MV₀ and the spatially adjacent MVs, and then, SDSP is applied at this point. Moreover, in order to increase the search speed, MVFAST can introduce an early termination step, which is performed initially in the algorithm, with a fixed threshold (e.g., T = 512) at the MV₀ position. When the SAD at the MV₀ position is lower than T, MVFAST is stopped without examining any other locations.

  > B. Search Procedures of PMVFAST

Unlike MVFAST whose search priority is given to MV_0, the priority of PMVFAST is given to the median of three spatially adjacent MVs [11, 12]. If SAD at the point obtained by the median operation is sufficiently small (e.g., T = 256), the search is terminated early after examining just one point. Otherwise, another termination condition is considered as follows: If the MV selected by the median operation is equal to the MV of a temporally co-located block (MV_t-1) and SAD at this point is smaller than SAD_t-1, which is SAD between the block in f_t-1 and the block pointed by MV_t-1 in f_t-2, the search is stopped. When the algorithm is not terminated after checking the median operation, the final MV is searched by examining the five candidates, MV₀, MV_left, MV_top, MV_top-right, and MV_t-1. After the selection of the best candidate from among them, a threshold T₁ is applied; that is, the search is stopped if SAD calculated by the best candidate is smaller than T₁, which is selected adaptively as the smallest SAD among the SADs of the left, top, and top-right blocks. Otherwise, the best candidate is used as the center of the DS pattern. Furthermore, if the best candidate and MV_t-1 are identical and SAD at the best candidate is smaller than SAD_t-1, the search is terminated.

In PMVFAST, LDSP is selected less frequently to reduce the total number of search points. To determine which DS pattern should be used, another threshold T₂, defined as T₁ + 256 is applied. LDSP is chosen when T₂ is larger than 1536 and the result of the median operation is MV₀, whereas SDSP is used in all other cases. When the smallest SAD is obtained at the center of the pattern, the algorithm is terminated. Additionally, PMVFAST has another early termination condition as follows: If all of the MVs of three spatially adjacent blocks are identical and MV_t-1 is equal to the result of the median operation, the candidates (three spatially adjacent MVs and MV₀) are highly correlated with each other. In this case, the selected search pattern is applied just once. Fig. 3 shows an example of searching the best matching block where the square shapes represent the points that have the smallest SAD in each step.

[Fig. 3.] An example of the search procedure for the best matching block in predictive motion vector field adaptive search technique. MV: motion vector.

III. PROPOSED MOTION ESTIMATION

We introduce a MMED operation to predict the appropriate starting search point, which significantly affects the performance of a fast motion estimation technique. Furthermore, the proposed algorithm reduces the number of thresholds and early termination conditions, and it consequently prevents the degradation in PSNR.

  > A. Modified Median Operation

PMVFAST feeds three spatially adjacent MVs to the median operator in order to determine the initial search point. However, our investigation presents that a temporally co-located MV, MV_t-1 shown in Fig. 4, can be a good candidate for the starting search point, since it is highly correlated with the final MV

[Fig. 4.] Temporally co-located motion vector, where ft and ft-1 represent the current frame and the reference frame, respectively. MV: motion vector.

Table 1 describes the average distances in pixel values between each candidate MV and the final MV as obtained by using PMVFAST. We can infer from the table that MV_t-1 is more highly correlated to the final MV than the others in most of the test sequences. Therefore, we include MV_t-1 in the median operation to determine the starting search point. Let (p_x, p_y) be a starting search point; then, the MMED operation is given by

[Table 1.] Average distances in pixels between each candidate and the final MV in the case of PMVFAST

Average distances in pixels between each candidate and the final MV in the case of PMVFAST

[]

where i ∈ {left, top, top-right, t-1}, i.e., denotes the x-component of the starting search point selected among the MVs of the left, top, top-right, and temporally co-located blocks.

The proposed MMED operation can deal with evennumbered elements, whereas the conventional median operation used in PMVFAST can process only odd-numbered elements. A median operation eliminates the uncorrelated values more effectively than an average operation. On the other hand, the proposed MMED operation reduces the prediction error by averaging the spatially and temporally adjacent MVs and excluding the maximum and minimum values among them, which are considered to be less correlated to the final MV.

The starting search point for the current block at the typical position is determined as follows: When the current block is located at the origin (i.e., top-leftmost position) of the frame, MV_t-1 is chosen directly as the predicted MV, since it is the only available candidate MV. When the current block lies in the first row except at the origin of the frame, two candidates, MV_left and MV_t-1, are available. Since the proposed MMED operator cannot deal with two elements, MV₀ is added in this case. When the current block lies in the first column, except at the origin of the frame, or in the rightmost column, three candidates are available (MV_top, MV_top-right, and MV_t-1 for the first column, or MV_left, MV_top, and MV_t-1 for the rightmost column) and the starting search point is selected by the median rule instead of (2).

After the starting search point is determined, the early termination condition is applied. That is, if SAD at the point from (2) is less than T (e.g., 256), the search procedure is terminated. Furthermore, when the point from (2) is equal to MV_t-1 and SAD at this point is smaller than SAD_t-1, the search procedure is also stopped. Table 2, which describes the frequency of the final MV determined after only one point is examined, presents that the MV from the proposed MMED operation is selected as the final MV more frequently than in the case of PMVFAST. This implies that the proposed algorithm can select the best matching block faster with a lower PSNR loss than PMVFAST.

[Table 2.] The number of final MVs after the examination of only one point

The number of final MVs after the examination of only one point

  > B. Determination of the Final Motion Vector

PMVFAST examines five candidates including MV₀ after the selection of the starting search point by the median operation. In fact, as shown in Table 3, MV₀ is rarely selected as the final MV when all the other candidates are not equal to MV₀. This implies that there is at least one MV₀ among the spatially and temporally adjacent MVs when MV₀ is selected as the final MV in most cases. Thus, MV₀ does not need to be explicitly included in the search procedure, since it is possibly included in the spatially and temporally adjacent MVs.

[Table 3.] Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0

Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0

Consequently, the proposed algorithm includes just four candidates of MV_left, MV_top, MV_top-right, and MV_t-1 in the search procedure for the final MV. Since MV₀ is not tested, the total number of search points is reduced and the correlation between MVs can increase. After an examination of these four candidates, the point that has the smallest SAD is selected as the center of SDSP. This further reduces the number of search points, since the proposed algorithm does not use LDSP. When the MVs are more correlated with each other in a video frame, the number of bits needed to encode each MV is reduced, since the video encoders usually encode the difference between the current MV and the adjacent MVs. We apply an early termination condition to the selected MV in order to further increase the correlation between the MVs as follows: A threshold T₁, which is limited between 512 and 1024, is set to the smallest SAD among the three spatially adjacent MVs (MV_left, MV_top, and MV_top-right). If SAD at the selected MV is smaller than T₁, the search procedure is stopped. When the selected MV is equal to MV_t-1 and SAD at this point is smaller than SAD_t-1, the search is also terminated. For the other cases, the proposed search procedures apply SDSP to find the smallest SAD. In this case, the thresholds and early termination conditions are no longer used.

IV. EXPERIMENTAL RESULTS

The performance of the proposed method was evaluated on the basis of [13] by using the MPEG-4 verification model encoder [14]. Test sequences of QCIF (176 × 144), CIF (352 × 288), and SIF (352 × 240) were coded as IPPP; that is, all frames were inter-coded and only the first frame was intra-coded. TM5 was adopted as the rate control algorithm. Thresholds for motion activity in MVFAST, L₁ and L₂ were set to 1 and 2, respectively, and the threshold for early termination in PMVFAST and the proposed method, T was set to 256. The maximum displacements of the search range were set to ±16, ±32, and ±48 pixels, respectively, in both the horizontal and the vertical direction.

Table 4 summarizes the performance of the proposed algorithm compared with MVFAST and PMVFAST in terms of both the PSNR and the speed-up ratio on average. The speed-up ratio is obtained by comparing the number of search points of each method with that of FS. Detailed results of the proposed approach and conventional methods for each experimental condition are described in Table 5.

[Table 4.] Results of the proposed algorithm in terms of average PSNR and speed-up ratio

Results of the proposed algorithm in terms of average PSNR and speed-up ratio

It can be inferred from Tables 4 and 5 that the proposed algorithm significantly reduces the number of search points compared with those of conventional methods and exhibits similar or higher PSNRs than those of the FS. When the correlation between adjacent MVs increases, the video encoder can reduce the number of bits needed to encode each MV, and thus, the extra bits can be assigned for increasing PSNR. Fig. 5 shows the distribution of MV fields for the Foreman sequence from the FS and the proposed method. As shown in Fig. 5, the final MVs of the proposed algorithm are distributed more uniformly than those of the FS.

[Fig. 5.] Distribution of motion vector fields of the Foreman sequence for each search method: (a) full search and (b) proposed algorithm.

The proposed scheme is superior to PMVFAST, which is a widely used technique for fast motion estimation, in terms of both the speed-up ratio and the PSNR, as shown in Table 5. The performance of PMVFAST can be limited, since many thresholds without considering the characteristics of MVs possibly search for inappropriate points. The proposed algorithm searches faster than conventional algorithms and its average PSNRs are higher than those of the FS.

[Table 5.] Experimental results of the proposed algorithm compared with the conventional methods

Experimental results of the proposed algorithm compared with the conventional methods

V. CONCLUSIONS

This paper presents a fast motion estimation method based on a MMED operation and advanced strategies for the selection of the search points. The proposed MMED operation includes the temporally co-located MV for an initial candidate, which is more correlated with the final MV, as observed in our experiments. Moreover, we reduce the number of thresholds and simplify the early termination conditions. With these improvements, the proposed approach achieves a faster search than MVFAST and PMVFAST. The proposed algorithm also improves PSNRs compared with those of the FS, since the MVs obtained from our approach are more correlated with each other and the bit-savings in coding them can be used for improvements in PSNR. Based on the better performance in terms of fast search and PSNR, the proposed algorithm can be employed in many applications, such as digital TV, Internet streaming, and mobile video by simply modifying the existing algorithms.

참고문헌

1. Ghanbari M. 2003 “Principles of video compression,” in Standard Codecs: Image Compression to Advanced Video Coding P.25-61
2. Richardson I. E. 2003 H.264 and MPEG-4 Video Compression
3. Kim J. S., Kim J. G. 2011 “A performance comparison of blockbased matching cost evaluation models for FRUC techniques,” [International Journal of Information and Communication Engineering] Vol.9 P.671-675
4. Sorwar G., Murshed M., Dooley L. 2004 “Fast block-based true motion estimation using distance dependent thresholds,” [Journal of Research and Practice in Information Technology] Vol.36 P.157-169
5. Jain J. R., Jain A. K. 1981 “Displacement measurement and its application in interframe image coding,” [IEEE Transactions on Communications] Vol.29 P.1799-1808
6. Koga T., Hirano K., Iijima Y., Ishiguro T. 1981 “Motioncompensated interframe coding for video conferencing,” [in Proceedings of the National Telecommunications Conference] P.G5.3.1-G5.3.5
7. Li R., Zeng B., Liou M. L. 1994 “A new three-step search algorithm for block motion estimation,” [IEEE Transactions on Circuits and Systems for Video Technology] Vol.4 P.438-442
8. Zhu S., Ma K. K. 2000 “A new diamond search algorithm for fast block matching motion estimation,” [IEEE Transactions on Image Processing] Vol.9 P.287-290
9. Tham J. Y., Ranganath S., Ranganath M., Kassim A. A. 1998 “A novel unrestricted center-based diamond search algorithm for block motion estimation,” [IEEE Transactions on Circuits and Systems for Video Technology] Vol.8 P.369-377
10. Hosur P. I., Ma K. K. 1999 “Motion vector field adaptive fast motion estimation,” [in Proceedings of the 2nd International Conference on Information, Communications and Signal Processing] P.7-10
11. Tourapis A. M., Au O. C., Liou M. L. 2001 “Predictive motion vector field adaptive search technique (PMVFAST): enhancing block-based motion estimation,” [in Proceedings of the Visual Communication and Image Processing] P.883-892
12. 2001 “Optimized visual reference software,” Text of 14496-7 PDTR, ISO/IEC N4057
13. 1999 “Experimental conditions for evaluating encoder motion estimation algorithms,” ISO/IEC JTC1/SC29/WG11 N3141
14. 2001 “The MPEG-4 video verification model v.18.0,” ISO/IEC JTC1/SC29/WG11 N3908

이미지 / 테이블

[ Fig. 1. ] Diamond search patterns: (a) large diamond search pattern (LDSP) and (b) small diamond search pattern (SDSP).
[ Fig. 2. ] Motion vectors of spatially adjacent blocks. MV: motion vector.
[ Fig. 3. ] An example of the search procedure for the best matching block in predictive motion vector field adaptive search technique. MV: motion vector.
[ Fig. 4. ] Temporally co-located motion vector, where ft and ft-1 represent the current frame and the reference frame, respectively. MV: motion vector.
[ Table 1. ] Average distances in pixels between each candidate and the final MV in the case of PMVFAST
[ Table 2. ] The number of final MVs after the examination of only one point
[ Table 3. ] Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0
[ Table 4. ] Results of the proposed algorithm in terms of average PSNR and speed-up ratio
[ Fig. 5. ] Distribution of motion vector fields of the Foreman sequence for each search method: (a) full search and (b) proposed algorithm.
[ Table 5. ] Experimental results of the proposed algorithm compared with the conventional methods