The block matching algorithm (BMA) is an essential part of many video coding standards, such as MPEG-1, 2, 4, and H.26x, to exploit the temporal correlation and reduce the redundancy that exists between frames of video sequences [1, 2]. In BMA, the current frame is partitioned into nonoverlapped blocks, and the best match for these blocks is found inside the search range of a reference frame with a predefined distortion criterion. The best matching block is then used as a predictor for the block in the current frame, where the displacement between the two blocks defines a motion vector (MV) of the current block. Thus, the encoder sends the MV and a residual block, which is the difference between the current block and the predictor, instead of the current block itself. This can significantly reduce the number of bits involved in video coding procedures. The most common distortion measure is the mean absolute error or mean absolute difference, or equivalently, the sum of the absolute difference (SAD), which requires no multiplication and exhibits a performance similar to that of the mean squared error [3, 4]. Let
where
The full search (FS) technique selects the MV, which points to the best matching block that has a minimum SAD, by exhaustive search at all possible points within the search range. Since the calculation of SAD at all possible locations is computationally intensive, it is difficult to directly use FS for various applications, such as digital TV, Internet streaming, and mobile video, in spite of its simple implementation. For this reason, many fast search algorithms have been suggested in an effort to reduce the computational complexity by searching only a subset of the possible search points, such as two-dimensional (2D) log search [5], three-step search (TSS) [6], new three-step search (NTSS) [7], and diamond search (DS) strategies [8, 9]. However, these methods usually incur a significant loss in the PSNR, although they can make the search faster by calculating the SAD less frequently. Moreover, these algorithms are not exactly optimal in consideration of the bits required to encode each MV, since they find MVs just by minimizing the SAD irrespective of the distribution of MVs.
In particular, the motion vector field adaptive search technique (MVFAST) and the predictive MVFAST (PMVFAST) have been introduced in order to improve the DS technique without sacrificing PSNR [10-12]. The MVFAST considers the MVs of the spatially adjacent blocks and the PMVFAST includes the MVs of the spatially and temporally adjacent blocks as possible candidates. The priority of the initial search point in PMVFAST is given to results of the median operation for the candidates, whereas priority in DS is given to a (0, 0) motion vector, MV_{0}. In addition, the early termination by a thresholding technique in the PMVFAST further reduces the computational complexity of the search procedures. These techniques of PMVFAST allow the video encoders to generate fewer bits for MVs with a negligible PSNR loss as well as significantly reduce the computations for search procedures, since the relatively high correlation between the MVs can be used effectively.
In this paper, we propose a modified median (MMED) operation and effective early termination conditions for the fast motion estimation. Since the MV of a temporally colocated block is more correlated with the final MV in many cases, our approach uses the MVs of four adjacent blocks, which include the left, top, and top-right blocks of the current block and a temporally co-located block in a reference frame, whereas PMVFAST uses the MVs of three spatially adjacent blocks. To select an appropriate candidate in the initial search procedure, we introduce an MMED operator that can decrease the number of search points, and this reduces the overall computational load. Since the initial search point is chosen more appropriately, thresholds and early termination conditions can be simplified, and consequently, we achieve faster motion estimation without sacrificing PSNR compared with the case of PMVFAST. The rest of this paper is organized as follows: Section II presents a review of MVFAST and PMVFAST. In Section III, the proposed fast motion estimation technique is discussed in detail. Experimental results and conclusions are given in Sections IV and V, respectively.
Most of the fast motion estimation algorithms first choose several candidates and then, calculate SAD at the candidate points. After the selection of the candidate with the minimum SAD, more search points around the selected candidate are tested. If the best candidate is not found in this procedure, the current candidate becomes the final MV. Therefore, the efficiency of the fast search algorithms depends on the determination of the candidates. In TSS, the number of search points does not change significantly according to the number of moving objects in the video sequences. On the other hand, in the DS that is biased to MV_{0}, better performance can be observed when the video sequences contain a few moving objects. MVFAST and PMVFAST also use DS patterns for fast motion estimation that is adaptive to the characteristics of video sequences. We will now briefly describe MVFAST and PMVFAST, which improve the search speed significantly with a negligible PSNR loss, since the proposed method is based on them.
MVFAST introduced in [10, 12] improves the speed-up of the DS algorithm significantly by initially considering a small set of candidates, which includes MV0 and the MVs of three spatially adjacent blocks (the left, top, and top-right blocks). Unlike the DS that uses only the large diamond search pattern (LDSP) shown in Fig. 1(a), MVFAST considers the small diamond search pattern (SDSP) shown in Fig. 1(b).
Both the diamond patterns (LDSP and SDSP) keep on moving until the minimum SAD is found at the center. If the center point is selected using SDSP, the search procedure is terminated at this point. If the center point is selected using LDSP, the search procedure is carried out again using SDSP around the center of LDSP.
In MVFAST, the selection between SDSP and LDSP depends on the motion activity that the current block will most likely have. This motion activity is characterized by the magnitude of the largest MV among the three adjacent MVs, i.e., MV_{left}, MV_{top}, and MV_{top-right}, as shown in Fig. 2.
The motion activity is described as
Unlike MVFAST whose search priority is given to MV_{0,} the priority of PMVFAST is given to the median of three spatially adjacent MVs [11, 12]. If SAD at the point obtained by the median operation is sufficiently small (e.g.,
In PMVFAST, LDSP is selected less frequently to reduce the total number of search points. To determine which DS pattern should be used, another threshold
We introduce a MMED operation to predict the appropriate starting search point, which significantly affects the performance of a fast motion estimation technique. Furthermore, the proposed algorithm reduces the number of thresholds and early termination conditions, and it consequently prevents the degradation in PSNR.
PMVFAST feeds three spatially adjacent MVs to the median operator in order to determine the initial search point. However, our investigation presents that a temporally co-located MV, MV_{t-1} shown in Fig. 4, can be a good candidate for the starting search point, since it is highly correlated with the final MV
Table 1 describes the average distances in pixel values between each candidate MV and the final MV as obtained by using PMVFAST. We can infer from the table that MV_{t-1} is more highly correlated to the final MV than the others in most of the test sequences. Therefore, we include MV_{t-1} in the median operation to determine the starting search point. Let (
Average distances in pixels between each candidate and the final MV in the case of PMVFAST
where
The proposed MMED operation can deal with evennumbered elements, whereas the conventional median operation used in PMVFAST can process only odd-numbered elements. A median operation eliminates the uncorrelated values more effectively than an average operation. On the other hand, the proposed MMED operation reduces the prediction error by averaging the spatially and temporally adjacent MVs and excluding the maximum and minimum values among them, which are considered to be less correlated to the final MV.
The starting search point for the current block at the typical position is determined as follows: When the current block is located at the origin (i.e., top-leftmost position) of the frame, MV_{t-1} is chosen directly as the predicted MV, since it is the only available candidate MV. When the current block lies in the first row except at the origin of the frame, two candidates, MV_{left} and MV_{t-1}, are available. Since the proposed MMED operator cannot deal with two elements, MV_{0} is added in this case. When the current block lies in the first column, except at the origin of the frame, or in the rightmost column, three candidates are available (MV_{top}, MV_{top-right}, and MV_{t-1} for the first column, or MV_{left}, MV_{top}, and MV_{t-1} for the rightmost column) and the starting search point is selected by the median rule instead of (2).
After the starting search point is determined, the early termination condition is applied. That is, if SAD at the point from (2) is less than
[Table 2.] The number of final MVs after the examination of only one point
The number of final MVs after the examination of only one point
PMVFAST examines five candidates including MV_{0} after the selection of the starting search point by the median operation. In fact, as shown in Table 3, MV_{0} is rarely selected as the final MV when all the other candidates are not equal to MV_{0}. This implies that there is at least one MV_{0} among the spatially and temporally adjacent MVs when MV_{0} is selected as the final MV in most cases. Thus, MV_{0} does not need to be explicitly included in the search procedure, since it is possibly included in the spatially and temporally adjacent MVs.
Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0
Consequently, the proposed algorithm includes just four candidates of MV_{left}, MV_{top}, MV_{top-right}, and MV_{t-1 } in the search procedure for the final MV. Since MV_{0} is not tested, the total number of search points is reduced and the correlation between MVs can increase. After an examination of these four candidates, the point that has the smallest SAD is selected as the center of SDSP. This further reduces the number of search points, since the proposed algorithm does not use LDSP. When the MVs are more correlated with each other in a video frame, the number of bits needed to encode each MV is reduced, since the video encoders usually encode the difference between the current MV and the adjacent MVs. We apply an early termination condition to the selected MV in order to further increase the correlation between the MVs as follows: A threshold
The performance of the proposed method was evaluated on the basis of [13] by using the MPEG-4 verification model encoder [14]. Test sequences of QCIF (176 × 144), CIF (352 × 288), and SIF (352 × 240) were coded as IPPP; that is, all frames were inter-coded and only the first frame was intra-coded. TM5 was adopted as the rate control algorithm. Thresholds for motion activity in MVFAST,
Table 4 summarizes the performance of the proposed algorithm compared with MVFAST and PMVFAST in terms of both the PSNR and the speed-up ratio on average. The speed-up ratio is obtained by comparing the number of search points of each method with that of FS. Detailed results of the proposed approach and conventional methods for each experimental condition are described in Table 5.
[Table 4.] Results of the proposed algorithm in terms of average PSNR and speed-up ratio
Results of the proposed algorithm in terms of average PSNR and speed-up ratio
It can be inferred from Tables 4 and 5 that the proposed algorithm significantly reduces the number of search points compared with those of conventional methods and exhibits similar or higher PSNRs than those of the FS. When the correlation between adjacent MVs increases, the video encoder can reduce the number of bits needed to encode each MV, and thus, the extra bits can be assigned for increasing PSNR. Fig. 5 shows the distribution of MV fields for the
The proposed scheme is superior to PMVFAST, which is a widely used technique for fast motion estimation, in terms of both the speed-up ratio and the PSNR, as shown in Table 5. The performance of PMVFAST can be limited, since many thresholds without considering the characteristics of MVs possibly search for inappropriate points. The proposed algorithm searches faster than conventional algorithms and its average PSNRs are higher than those of the FS.
[Table 5.] Experimental results of the proposed algorithm compared with the conventional methods
Experimental results of the proposed algorithm compared with the conventional methods
This paper presents a fast motion estimation method based on a MMED operation and advanced strategies for the selection of the search points. The proposed MMED operation includes the temporally co-located MV for an initial candidate, which is more correlated with the final MV, as observed in our experiments. Moreover, we reduce the number of thresholds and simplify the early termination conditions. With these improvements, the proposed approach achieves a faster search than MVFAST and PMVFAST. The proposed algorithm also improves PSNRs compared with those of the FS, since the MVs obtained from our approach are more correlated with each other and the bit-savings in coding them can be used for improvements in PSNR. Based on the better performance in terms of fast search and PSNR, the proposed algorithm can be employed in many applications, such as digital TV, Internet streaming, and mobile video by simply modifying the existing algorithms.