Fast Motion Estimation Based on a Modified Median Operation for Efficient Video Compression
 Author: Kim Jongho
 Organization: Kim Jongho
 Publish: Journal of information and communication convergence engineering Volume 12, Issue1, p53~59, 31 March 2014

ABSTRACT
Motion estimation is a core part of most video compression systems since it directly affects the output video quality and the encoding time. The full search (FS) technique gives the highest visual quality but has the problem of a significant computational load. To solve this problem, we present in this paper a modified median (MMED) operation and advanced search strategies for fast motion estimation. The proposed MMED operation includes a temporally colocated motion vector (MV) to select an appropriate initial candidate. Moreover, we introduce a search procedure that reduces the number of thresholds and simplifies the early termination conditions for the determination of a final MV. The experimental results show that the proposed approach achieves substantial speedup compared with the conventional methods including the motion vector field adaptive search technique (MVFAST) and predictive MVFAST (PMVFAST). The proposed algorithm also improves the PSNR values by increasing the correlation between the MVs, compared with the FS method.

KEYWORD
Diamond Search , Fast motion estimation , Median predictor , MVFAST , PMVFAST

I. INTRODUCTION
The block matching algorithm (BMA) is an essential part of many video coding standards, such as MPEG1, 2, 4, and H.26x, to exploit the temporal correlation and reduce the redundancy that exists between frames of video sequences [1, 2]. In BMA, the current frame is partitioned into nonoverlapped blocks, and the best match for these blocks is found inside the search range of a reference frame with a predefined distortion criterion. The best matching block is then used as a predictor for the block in the current frame, where the displacement between the two blocks defines a motion vector (MV) of the current block. Thus, the encoder sends the MV and a residual block, which is the difference between the current block and the predictor, instead of the current block itself. This can significantly reduce the number of bits involved in video coding procedures. The most common distortion measure is the mean absolute error or mean absolute difference, or equivalently, the sum of the absolute difference (SAD), which requires no multiplication and exhibits a performance similar to that of the mean squared error [3, 4]. Let
f_{t} andf_{t1} be a current frame and a reference frame, respectively. Then, for an M × N block located at (x, y ) in the current frame, the SAD is given bywhere
d_{x} andd_{y} denote the displacements in thex  andy direction, respectively.The full search (FS) technique selects the MV, which points to the best matching block that has a minimum SAD, by exhaustive search at all possible points within the search range. Since the calculation of SAD at all possible locations is computationally intensive, it is difficult to directly use FS for various applications, such as digital TV, Internet streaming, and mobile video, in spite of its simple implementation. For this reason, many fast search algorithms have been suggested in an effort to reduce the computational complexity by searching only a subset of the possible search points, such as twodimensional (2D) log search [5], threestep search (TSS) [6], new threestep search (NTSS) [7], and diamond search (DS) strategies [8, 9]. However, these methods usually incur a significant loss in the PSNR, although they can make the search faster by calculating the SAD less frequently. Moreover, these algorithms are not exactly optimal in consideration of the bits required to encode each MV, since they find MVs just by minimizing the SAD irrespective of the distribution of MVs.
In particular, the motion vector field adaptive search technique (MVFAST) and the predictive MVFAST (PMVFAST) have been introduced in order to improve the DS technique without sacrificing PSNR [1012]. The MVFAST considers the MVs of the spatially adjacent blocks and the PMVFAST includes the MVs of the spatially and temporally adjacent blocks as possible candidates. The priority of the initial search point in PMVFAST is given to results of the median operation for the candidates, whereas priority in DS is given to a (0, 0) motion vector, MV_{0}. In addition, the early termination by a thresholding technique in the PMVFAST further reduces the computational complexity of the search procedures. These techniques of PMVFAST allow the video encoders to generate fewer bits for MVs with a negligible PSNR loss as well as significantly reduce the computations for search procedures, since the relatively high correlation between the MVs can be used effectively.
In this paper, we propose a modified median (MMED) operation and effective early termination conditions for the fast motion estimation. Since the MV of a temporally colocated block is more correlated with the final MV in many cases, our approach uses the MVs of four adjacent blocks, which include the left, top, and topright blocks of the current block and a temporally colocated block in a reference frame, whereas PMVFAST uses the MVs of three spatially adjacent blocks. To select an appropriate candidate in the initial search procedure, we introduce an MMED operator that can decrease the number of search points, and this reduces the overall computational load. Since the initial search point is chosen more appropriately, thresholds and early termination conditions can be simplified, and consequently, we achieve faster motion estimation without sacrificing PSNR compared with the case of PMVFAST. The rest of this paper is organized as follows: Section II presents a review of MVFAST and PMVFAST. In Section III, the proposed fast motion estimation technique is discussed in detail. Experimental results and conclusions are given in Sections IV and V, respectively.
II. REVIEW OF MVFAST AND PMVFAST
Most of the fast motion estimation algorithms first choose several candidates and then, calculate SAD at the candidate points. After the selection of the candidate with the minimum SAD, more search points around the selected candidate are tested. If the best candidate is not found in this procedure, the current candidate becomes the final MV. Therefore, the efficiency of the fast search algorithms depends on the determination of the candidates. In TSS, the number of search points does not change significantly according to the number of moving objects in the video sequences. On the other hand, in the DS that is biased to MV_{0}, better performance can be observed when the video sequences contain a few moving objects. MVFAST and PMVFAST also use DS patterns for fast motion estimation that is adaptive to the characteristics of video sequences. We will now briefly describe MVFAST and PMVFAST, which improve the search speed significantly with a negligible PSNR loss, since the proposed method is based on them.
> A. Search Procedures of MVFAST
MVFAST introduced in [10, 12] improves the speedup of the DS algorithm significantly by initially considering a small set of candidates, which includes MV0 and the MVs of three spatially adjacent blocks (the left, top, and topright blocks). Unlike the DS that uses only the large diamond search pattern (LDSP) shown in Fig. 1(a), MVFAST considers the small diamond search pattern (SDSP) shown in Fig. 1(b).
Both the diamond patterns (LDSP and SDSP) keep on moving until the minimum SAD is found at the center. If the center point is selected using SDSP, the search procedure is terminated at this point. If the center point is selected using LDSP, the search procedure is carried out again using SDSP around the center of LDSP.
In MVFAST, the selection between SDSP and LDSP depends on the motion activity that the current block will most likely have. This motion activity is characterized by the magnitude of the largest MV among the three adjacent MVs, i.e., MV_{left}, MV_{top}, and MV_{topright}, as shown in Fig. 2.
The motion activity is described as
low when it is below a thresholdL _{1}, medium when it is betweenL _{1} and a second thresholdL _{2}, and high when it is larger thanL _{2}. If the motion activity is consideredlow , SDSP is applied with MV_{0} as the center. If the motion activity ismedium , MV_{0} is selected again as the center and LDSP is applied at this time instead of SDSP. This is the only case in which LDSP is selected, whereas the DS algorithm always chooses LDSP. After testing with LDSP, we apply SDSP again at the point with the smallest SAD as the center. Finally, if the motion activity is characterized ashigh , the center is determined with the position of the MV that yields the minimum SAD among MV_{0} and the spatially adjacent MVs, and then, SDSP is applied at this point. Moreover, in order to increase the search speed, MVFAST can introduce an early termination step, which is performed initially in the algorithm, with a fixed threshold (e.g.,T = 512) at the MV_{0} position. When the SAD at the MV_{0} position is lower thanT, MVFAST is stopped without examining any other locations.> B. Search Procedures of PMVFAST
Unlike MVFAST whose search priority is given to MV_{0,} the priority of PMVFAST is given to the median of three spatially adjacent MVs [11, 12]. If SAD at the point obtained by the median operation is sufficiently small (e.g.,
T = 256), the search is terminated early after examining just one point. Otherwise, another termination condition is considered as follows: If the MV selected by the median operation is equal to the MV of a temporally colocated block (MV_{t1}) and SAD at this point is smaller than SAD_{t1}, which is SAD between the block inf _{t1} and the block pointed by MV_{t1} inf _{t2}, the search is stopped. When the algorithm is not terminated after checking the median operation, the final MV is searched by examining the five candidates, MV_{0}, MV_{left}, MV_{top}, MV_{topright}, and MV_{t1}. After the selection of the best candidate from among them, a thresholdT _{1} is applied; that is, the search is stopped if SAD calculated by the best candidate is smaller thanT _{1}, which is selected adaptively as the smallest SAD among the SADs of the left, top, and topright blocks. Otherwise, the best candidate is used as the center of the DS pattern. Furthermore, if the best candidate and MV_{t1} are identical and SAD at the best candidate is smaller than SAD_{t1}, the search is terminated.In PMVFAST, LDSP is selected less frequently to reduce the total number of search points. To determine which DS pattern should be used, another threshold
T _{2}, defined asT _{1} + 256 is applied. LDSP is chosen whenT _{2} is larger than 1536 and the result of the median operation is MV_{0}, whereas SDSP is used in all other cases. When the smallest SAD is obtained at the center of the pattern, the algorithm is terminated. Additionally, PMVFAST has another early termination condition as follows: If all of the MVs of three spatially adjacent blocks are identical and MV_{t1} is equal to the result of the median operation, the candidates (three spatially adjacent MVs and MV_{0}) are highly correlated with each other. In this case, the selected search pattern is applied just once. Fig. 3 shows an example of searching the best matching block where the square shapes represent the points that have the smallest SAD in each step.III. PROPOSED MOTION ESTIMATION
We introduce a MMED operation to predict the appropriate starting search point, which significantly affects the performance of a fast motion estimation technique. Furthermore, the proposed algorithm reduces the number of thresholds and early termination conditions, and it consequently prevents the degradation in PSNR.
> A. Modified Median Operation
PMVFAST feeds three spatially adjacent MVs to the median operator in order to determine the initial search point. However, our investigation presents that a temporally colocated MV, MV_{t1} shown in Fig. 4, can be a good candidate for the starting search point, since it is highly correlated with the final MV
Table 1 describes the average distances in pixel values between each candidate MV and the final MV as obtained by using PMVFAST. We can infer from the table that MV_{t1} is more highly correlated to the final MV than the others in most of the test sequences. Therefore, we include MV_{t1} in the median operation to determine the starting search point. Let (
p_{x}, p_{y} ) be a starting search point; then, the MMED operation is given bywhere
i ∈ {left, top, topright,t 1}, i.e., denotes thex component of the starting search point selected among the MVs of the left, top, topright, and temporally colocated blocks.The proposed MMED operation can deal with evennumbered elements, whereas the conventional median operation used in PMVFAST can process only oddnumbered elements. A median operation eliminates the uncorrelated values more effectively than an average operation. On the other hand, the proposed MMED operation reduces the prediction error by averaging the spatially and temporally adjacent MVs and excluding the maximum and minimum values among them, which are considered to be less correlated to the final MV.
The starting search point for the current block at the typical position is determined as follows: When the current block is located at the origin (i.e., topleftmost position) of the frame, MV_{t1} is chosen directly as the predicted MV, since it is the only available candidate MV. When the current block lies in the first row except at the origin of the frame, two candidates, MV_{left} and MV_{t1}, are available. Since the proposed MMED operator cannot deal with two elements, MV_{0} is added in this case. When the current block lies in the first column, except at the origin of the frame, or in the rightmost column, three candidates are available (MV_{top}, MV_{topright}, and MV_{t1} for the first column, or MV_{left}, MV_{top}, and MV_{t1} for the rightmost column) and the starting search point is selected by the median rule instead of (2).
After the starting search point is determined, the early termination condition is applied. That is, if SAD at the point from (2) is less than
T (e.g., 256), the search procedure is terminated. Furthermore, when the point from (2) is equal to MV_{t1} and SAD at this point is smaller than SAD_{t1}, the search procedure is also stopped. Table 2, which describes the frequency of the final MV determined after only one point is examined, presents that the MV from the proposed MMED operation is selected as the final MV more frequently than in the case of PMVFAST. This implies that the proposed algorithm can select the best matching block faster with a lower PSNR loss than PMVFAST.> B. Determination of the Final Motion Vector
PMVFAST examines five candidates including MV_{0} after the selection of the starting search point by the median operation. In fact, as shown in Table 3, MV_{0} is rarely selected as the final MV when all the other candidates are not equal to MV_{0}. This implies that there is at least one MV_{0} among the spatially and temporally adjacent MVs when MV_{0} is selected as the final MV in most cases. Thus, MV_{0} does not need to be explicitly included in the search procedure, since it is possibly included in the spatially and temporally adjacent MVs.
Consequently, the proposed algorithm includes just four candidates of MV_{left}, MV_{top}, MV_{topright}, and MV_{t1 } in the search procedure for the final MV. Since MV_{0} is not tested, the total number of search points is reduced and the correlation between MVs can increase. After an examination of these four candidates, the point that has the smallest SAD is selected as the center of SDSP. This further reduces the number of search points, since the proposed algorithm does not use LDSP. When the MVs are more correlated with each other in a video frame, the number of bits needed to encode each MV is reduced, since the video encoders usually encode the difference between the current MV and the adjacent MVs. We apply an early termination condition to the selected MV in order to further increase the correlation between the MVs as follows: A threshold
T _{1}, which is limited between 512 and 1024, is set to the smallest SAD among the three spatially adjacent MVs (MV_{left}, MV_{top}, and MV_{topright}). If SAD at the selected MV is smaller thanT _{1}, the search procedure is stopped. When the selected MV is equal to MV_{t1} and SAD at this point is smaller than SAD_{t1}, the search is also terminated. For the other cases, the proposed search procedures apply SDSP to find the smallest SAD. In this case, the thresholds and early termination conditions are no longer used.IV. EXPERIMENTAL RESULTS
The performance of the proposed method was evaluated on the basis of [13] by using the MPEG4 verification model encoder [14]. Test sequences of QCIF (176 × 144), CIF (352 × 288), and SIF (352 × 240) were coded as IPPP; that is, all frames were intercoded and only the first frame was intracoded. TM5 was adopted as the rate control algorithm. Thresholds for motion activity in MVFAST,
L _{1} andL _{2} were set to 1 and 2, respectively, and the threshold for early termination in PMVFAST and the proposed method,T was set to 256. The maximum displacements of the search range were set to ±16, ±32, and ±48 pixels, respectively, in both the horizontal and the vertical direction.Table 4 summarizes the performance of the proposed algorithm compared with MVFAST and PMVFAST in terms of both the PSNR and the speedup ratio on average. The speedup ratio is obtained by comparing the number of search points of each method with that of FS. Detailed results of the proposed approach and conventional methods for each experimental condition are described in Table 5.
It can be inferred from Tables 4 and 5 that the proposed algorithm significantly reduces the number of search points compared with those of conventional methods and exhibits similar or higher PSNRs than those of the FS. When the correlation between adjacent MVs increases, the video encoder can reduce the number of bits needed to encode each MV, and thus, the extra bits can be assigned for increasing PSNR. Fig. 5 shows the distribution of MV fields for the
Foreman sequence from the FS and the proposed method. As shown in Fig. 5, the final MVs of the proposed algorithm are distributed more uniformly than those of the FS.The proposed scheme is superior to PMVFAST, which is a widely used technique for fast motion estimation, in terms of both the speedup ratio and the PSNR, as shown in Table 5. The performance of PMVFAST can be limited, since many thresholds without considering the characteristics of MVs possibly search for inappropriate points. The proposed algorithm searches faster than conventional algorithms and its average PSNRs are higher than those of the FS.
V. CONCLUSIONS
This paper presents a fast motion estimation method based on a MMED operation and advanced strategies for the selection of the search points. The proposed MMED operation includes the temporally colocated MV for an initial candidate, which is more correlated with the final MV, as observed in our experiments. Moreover, we reduce the number of thresholds and simplify the early termination conditions. With these improvements, the proposed approach achieves a faster search than MVFAST and PMVFAST. The proposed algorithm also improves PSNRs compared with those of the FS, since the MVs obtained from our approach are more correlated with each other and the bitsavings in coding them can be used for improvements in PSNR. Based on the better performance in terms of fast search and PSNR, the proposed algorithm can be employed in many applications, such as digital TV, Internet streaming, and mobile video by simply modifying the existing algorithms.

[Fig. 1.] Diamond search patterns: (a) large diamond search pattern (LDSP) and (b) small diamond search pattern (SDSP).

[Fig. 2.] Motion vectors of spatially adjacent blocks. MV: motion vector.

[Fig. 3.] An example of the search procedure for the best matching block in predictive motion vector field adaptive search technique. MV: motion vector.

[Fig. 4.] Temporally colocated motion vector, where ft and ft1 represent the current frame and the reference frame, respectively. MV: motion vector.

[Table 1.] Average distances in pixels between each candidate and the final MV in the case of PMVFAST

[Table 2.] The number of final MVs after the examination of only one point

[Table 3.] Frequency and ratio that MV0 is selected when all of the spatially and temporally adjacent MVs are not equal to MV0

[Table 4.] Results of the proposed algorithm in terms of average PSNR and speedup ratio

[Fig. 5.] Distribution of motion vector fields of the Foreman sequence for each search method: (a) full search and (b) proposed algorithm.

[Table 5.] Experimental results of the proposed algorithm compared with the conventional methods