A Modified Expansion-Contraction Method for Mobile Object Tracking in Video Surveillance: Indoor Environment

Kang Jin-Shig

doi:10.5391/IJFIS.2013.13.4.298

Open-access journal
International Journal of Fuzzy Logic and Intelligent Systems

Published: International Journal of Fuzzy Logic and Intelligent Systems, Volume 13, Issue 4, pp. 298-306, 25 Dec 2013

ABSTRACT


KEYWORD

Object tracking, Mobile object, Object window, Background image, Modified expansion-contraction algorithm


1. Introduction

In recent years, an increasing number of studies have investigated video surveillance and mobile object tracking algorithms. The application areas of object tracking include motion-based recognition of humans, automated surveillance for monitoring a scene to detect suspicious activities or unlikely events, and traffic monitoring for the real-time collection of traffic statistics to direct traffic flow.

To detect a mobile object, the target object should be separated from the background. This can be done by using the background subtraction method or the frame differencing method for adjacent frames. The method used for object tracking depends on the representation of the target object as a point, silhouette, etc. [1]. Typical object tracking methods are point-based tracking, kernel tracking and silhouette tracking [1, 2]. Recent years have witnessed the growing use of probabilistic approaches, such as the use of a probability distribution to represent the position and color distribution of an object, for object tracking [3].

Several multiple-object tracking algorithms, such as the Kalman filter [4], the particle filter [5-8], and mean shift [9, 10], are also available. Furthermore, a vector Kalman predictor [11] has been proposed for tracking objects; in that work, separate methods for occlusion and merging are applied to resolve ambiguous situations, and the states of the corresponding moving objects are searched using a spiral search technique prior to tracking. More recently, Czyzewski and Dalka [12] used a Kalman filter with an RGB color-based approach to measure the similarity between moving objects. Zhang et al. [13] presented a particle swarm optimization-based approach for multiple-object tracking based on histogram matching. Jiang et al. [14] suggested a linear programming approach, whereas Huang and Essa [15] presented an algorithm for tracking multiple objects through occlusions.

The basic expansion-contraction (E-C) algorithm was presented in previous papers [18, 19]. The problems discussed in those papers include changes in lighting conditions, failure to track fast-moving objects, and difficulty in separating adjacent objects.

In this paper, a modified E-C algorithm for multiple-object tracking is presented. Modifications are made to the method of expansion and contraction for an object window in order to separate the target object from the surrounding objects and the background. The proposed algorithm includes a method for avoiding occlusion of the target image. Finally, the validity of the proposed algorithm is verified through several experiments.

2. Problem Formulation and Definitions

   2.1 Summary of Some Definitions Proposed in [18, 19]

Several parameters defined in [18, 19], such as the object window, the object area, and the expansion and contraction parameter, are reintroduced in this paper. The binary image is denoted by I, and I_x and I_y are defined as

$$I_x(x) = \sum_{y} I(x, y), \qquad I_y(y) = \sum_{x} I(x, y) \tag{1}$$

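where I_x and I_y represent the distribution of non-zero pixels along the x-direction and y-direction, respectively. The object window is defined as the smallest image box that contains the target object. The object area, i.e., the number of non-zero pixels belonging to the target object, can be computed as

$$A = \sum_{x} I_x(x) = \sum_{y} I_y(y) \tag{2}$$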

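The center position (x_p, y_p) can be calculated as

$$x_p = \frac{\sum_{x} x \, I_x(x)}{\sum_{x} I_x(x)}, \qquad y_p = \frac{\sum_{y} y \, I_y(y)}{\sum_{y} I_y(y)} \tag{3}$$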

where x_p is the center of mass in the x-direction and y_p is the center of mass in the y-direction.
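As a minimal sketch (not the author's code), the NumPy snippet below computes I_x, I_y, the object area, and the center of mass from a binary image. It assumes the image is stored as a NumPy array indexed img[y, x] (row = y, column = x), so I_x is a column sum and I_y a row sum; for a 0/1 image these equal the diagonals of img.T @ img and img @ img.T in that storage order. The function names projections and area_and_center are hypothetical.

```python
import numpy as np


def projections(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (I_x, I_y) for a binary image stored as img[y, x]."""
    b = (img != 0).astype(np.int64)   # treat any non-zero pixel as an object pixel
    i_x = b.sum(axis=0)               # I_x: number of non-zero pixels in each column x
    i_y = b.sum(axis=1)               # I_y: number of non-zero pixels in each row y
    return i_x, i_y


def area_and_center(img: np.ndarray) -> tuple[int, float, float]:
    """Object area and center of mass (x_p, y_p), computed only from I_x and I_y."""
    i_x, i_y = projections(img)
    area = int(i_x.sum())             # identical to int(i_y.sum())
    if area == 0:
        raise ValueError("image contains no object pixels")
    x_p = float(np.dot(np.arange(i_x.size), i_x)) / area
    y_p = float(np.dot(np.arange(i_y.size), i_y)) / area
    return area, x_p, y_p


img = np.zeros((6, 8), dtype=np.uint8)
img[1:4, 2:5] = 1                     # 3x3 blob: rows y=1..3, columns x=2..4
print(area_and_center(img))           # (9, 3.0, 2.0)
```

Working on the two 1-D profiles instead of the full image keeps the cost of the center-of-mass computation proportional to the image width plus height once the profiles are built.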

When tracking an object in a video stream, the size of the target object changes with its distance from the camera, so the size of the object window must change with the size of the target object. To carry out this operation, the expansion and contraction parameter is defined as

$$EC_{par} = \frac{w \, h}{\sum_{x} I_x(x)} \tag{4}$$

where w and h are the width and height of the object window in pixels, so that EC_par is the ratio of the object window area to the target object area. Note that the object window must include the target object; hence EC_par must be greater than 1.
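The following sketch computes EC_par for a candidate object window. The window is given as half-open pixel ranges [x0, x1) and [y0, y1) over an image indexed img[y, x]; the function name ec_par and this window convention are assumptions for illustration. The object area is counted inside the window, which matches the definition only when the window fully contains the target, as the text requires.

```python
import numpy as np


def ec_par(img: np.ndarray, x0: int, x1: int, y0: int, y1: int) -> float:
    """EC_par = object-window area / object area, window = columns [x0, x1), rows [y0, y1)."""
    obj_area = int(np.count_nonzero(img[y0:y1, x0:x1]))
    if obj_area == 0:
        raise ValueError("window contains no object pixels")
    return (x1 - x0) * (y1 - y0) / obj_area


img = np.zeros((10, 10), dtype=np.uint8)
img[2:6, 3:5] = 1                     # 4 x 2 = 8 object pixels
print(ec_par(img, 3, 7, 2, 6))        # 4 x 4 window -> 16 / 8 = 2.0
```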

   2.2 Separable, Partially Separable, and Inseparable Objects

It is important to separate the target object from other objects to ensure that the resulting object window contains only the target object. Figure 1(a) shows a group of people walking together (left) and the corresponding I_x (top-right) and I_y (bottom-right). As the left image shows, it is difficult to isolate the encircled person completely using only a vertical or a horizontal strip. However, as the top-right plot shows, the encircled person can be separated as a vertical strip, i.e., the person is partially separable on the x-coordinate. By contrast, the woman indicated by the white arrow cannot be separated on either coordinate, because her object area is relatively small and is absorbed into another object's area when I_x and I_y are computed.

[Figure 1.] (a) Group of people walking together (left), corresponding I_x (top-right) and I_y (bottom-right). (b) The stripped image on the y-coordinate (top), corresponding I_x (center) and I_y (bottom).

Even in this case, the target object lies between 100 and 220 pixels on the y-axis, and the finally separated object window is shown in Figure 1(b). The corresponding I_x and I_y are shown in the center and bottom plots, respectively. As the center plot of Figure 1(b) shows, the target object window is separated well and contains the target object.

Let us now consider another example, in which the aim is to separate the encircled object in the top-left image of Figure 2(a). As the middle and bottom plots show, the target object is partially separable on the x-coordinate but inseparable on the y-coordinate. The middle plot shows that the target object lies between 120 and 160 pixels in the x-direction, so the image can be cut into a strip containing columns 120 to 160 and all pixels in the y-direction. The resulting strip is marked by the strip box in the top-right image of Figure 2(a). The next step is to recalculate I_x and I_y for this strip image. In Figure 2(b), the top-left image is the strip image, the top-right plot is its I_x, and the second-right plot is its I_y. The strip image shows some noise at its top that can no longer be separated on the x-coordinate. However, as the strip image and the second-right plot show, the target object can be separated from this noise on the y-axis.
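The two-pass separation described above (vertical strip from I_x, then a cut on the I_y of the strip) can be sketched as follows. The paper reads the ranges off the plotted profiles; here, as an assumption, the target's range is taken to be the contiguous run of above-threshold profile values that contains a seed coordinate, such as the predicted center of mass. The names run_containing and separate, the seed, and the threshold are hypothetical.

```python
import numpy as np


def run_containing(profile: np.ndarray, seed: int, thresh: int = 0) -> tuple[int, int]:
    """Half-open range [lo, hi) of the contiguous run with profile > thresh that contains seed."""
    active = profile > thresh
    if not active[seed]:
        raise ValueError("seed does not lie on an object run")
    lo = seed
    while lo > 0 and active[lo - 1]:
        lo -= 1
    hi = seed + 1
    while hi < profile.size and active[hi]:
        hi += 1
    return lo, hi


def separate(img: np.ndarray, seed_x: int, seed_y: int, thresh: int = 0):
    """Strip-then-cut separation: x-range from I_x of the image, y-range from I_y of the strip."""
    b = (img != 0).astype(np.int64)
    x0, x1 = run_containing(b.sum(axis=0), seed_x, thresh)     # vertical strip from I_x
    strip = b[:, x0:x1]
    y0, y1 = run_containing(strip.sum(axis=1), seed_y, thresh)  # I_y recomputed on the strip only
    return x0, x1, y0, y1


img = np.zeros((12, 10), dtype=np.uint8)
img[6:11, 3:6] = 1    # target: rows 6-10, columns 3-5
img[0:2, 4:6] = 1     # noise above the target, in the same columns
img[2:10, 7:9] = 1    # neighbouring object, separable from the target only along x
print(separate(img, seed_x=4, seed_y=8))   # (3, 6, 6, 11)
```

On the full image, I_y has no gap between the noise, the neighbour, and the target (rows 0 to 10 are all non-zero), so the target is inseparable on y; only after restricting to the strip does the gap at rows 2 to 5 appear.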

[Figure 2.] (a) The original image frame (top-left), binary image (top-right), I_x (middle), and I_y (bottom). (b) The strip image (top-left), the final object window (bottom-left), I_x (top-right) and I_y (second-right) for the strip image, and I_x (third-right) and I_y (bottom-right) for the final object window.

[Figure 3.] Overview of the system flow.

3. Modified E-C Method

This section describes the entire object tracking process: the overall system flow, an algorithm for updating the background image, the method for expanding and contracting the object window, and the process of selecting an object by color information.

   3.1 Object Tracking Procedure

The overall process of object tracking is shown in Figure 3. The first step in object tracking is initialization, which involves (1) computing the initial position of the target object; (2) selecting an extended initial object window; (3) selecting Δp₀ (Δx₀, Δy₀), the initial value of the variation of the center of mass point of the target object; and (4) computing the predicted center of mass position for the next frame.

After initialization, processing moves to the first frame. The second step is to extract sub-images from the background frame and the current frame. The sub-image window is centered at the predicted center of mass position and is three to four times larger than the previously selected object window. In the third step, the absolute difference between the two sub-images is calculated and converted into a binary image by a threshold operation. The fourth step calculates diag(I I^T) and diag(I^T I), contracts the extended object window, and extracts the target object; the area of the target object, the actual center of mass position p₁ (x₁, y₁), and the expansion and contraction parameter EC_par are also calculated in this step. In the final step, the predicted center of mass position for the next frame is computed, and processing moves to the next frame.
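Steps two and three (sub-image extraction, absolute difference, thresholding) might look like the sketch below. The scale factor of 3 is applied to both the width and the height of the object window, the threshold of 30 grey levels is an arbitrary placeholder, and color frames are reduced by taking the largest per-channel difference; all three choices, and the name difference_mask, are assumptions rather than values from the paper.

```python
import numpy as np


def difference_mask(background, frame, center, obj_w, obj_h, scale=3, thresh=30):
    """Binary mask of |frame - background| inside an extended window around the predicted center.

    Returns (mask, (x0, y0)), where (x0, y0) is the window's top-left corner in frame coordinates.
    """
    h, w = frame.shape[:2]
    cx, cy = int(round(center[0])), int(round(center[1]))
    half_w, half_h = (scale * obj_w) // 2, (scale * obj_h) // 2
    x0, x1 = max(0, cx - half_w), min(w, cx + half_w + 1)
    y0, y1 = max(0, cy - half_h), min(h, cy + half_h + 1)
    # Cast to a signed type so subtracting uint8 images cannot wrap around.
    bg = background[y0:y1, x0:x1].astype(np.int16)
    fr = frame[y0:y1, x0:x1].astype(np.int16)
    diff = np.abs(fr - bg)
    if diff.ndim == 3:                 # color input: keep the largest channel difference
        diff = diff.max(axis=2)
    return (diff > thresh).astype(np.uint8), (x0, y0)


background = np.zeros((40, 40), dtype=np.uint8)
frame = background.copy()
frame[18:22, 10:13] = 100              # a 4 x 3 "person" that is absent from the background
mask, origin = difference_mask(background, frame, center=(11.0, 19.0), obj_w=3, obj_h=4)
print(mask.shape, int(mask.sum()), origin)   # (13, 9) 12 (7, 13)
```

Only the extended window is differenced, so the per-frame cost depends on the object size rather than on the full frame size.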

The tracking process described above can be summarized in three key stages: prediction, operation, and update. In the prediction stage, the predicted center of mass position of each target object is computed from information obtained in previous frames, and an expanded object window, centered at the predicted center of mass and three to four times larger than the target object, is selected for each target. The primary role of the operation stage is to extract the target objects; this stage includes extracting the sub-images, converting them into a binary image, calculating I_x and I_y, and contracting the object window. If the target object must be separated from other objects, the separation process described in Section 2.2 is performed. In the update stage, the actual center of mass position and EC_par are computed for each target.

   3.2 Expansion and Contraction of Object Window

The center of mass position p_k (x_k, y_k) for the k^th frame is described by

where η_x and η_y are noise terms.

For the (k+1)^th frame, the predicted center of mass position is

Eqs. (8a) and (8b) are described, in terms of measured values, as follows:

For the case of multiple-target tracking, the predicted position of the j^th object is

The calculations used in this paper to predict the center of mass of a target object are very simple and adequate for target tracking in an indoor environment. A Kalman filter or a particle filter could also be used instead of Eq. (5).
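The prediction equations themselves are not legible in this copy of the text. As an assumption consistent with the initialization step (which selects an initial variation Δp₀), the sketch below uses a constant-velocity predictor, predicted p_{k+1} = p_k + (p_k - p_{k-1}), ignores the noise terms η_x and η_y, and falls back to Δp₀ when only one measurement exists. It handles several targets at once; predict_next is a hypothetical name, and this is not a reproduction of the paper's equations.

```python
import numpy as np


def predict_next(history, delta0):
    """Predict next-frame centers for all targets under a constant-velocity assumption.

    history: array-like (n_frames, n_targets, 2) of measured (x, y) centers, oldest first.
    delta0:  array-like (n_targets, 2) initial variation, used when only one frame is known.
    """
    p = np.asarray(history, dtype=float)
    if p.shape[0] >= 2:
        delta = p[-1] - p[-2]          # per-target displacement between the last two frames
    else:
        delta = np.asarray(delta0, dtype=float)
    return p[-1] + delta


history = [[[10, 20], [50, 60]],
           [[12, 21], [48, 63]]]
print(predict_next(history, delta0=[[0, 0], [0, 0]]).tolist())  # [[14.0, 22.0], [46.0, 66.0]]
```

A Kalman filter would replace the raw displacement with a filtered velocity estimate, which mainly helps when the measured centers are noisy.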

The expansion and contraction procedure, one of the main results of this paper, is shown in Figures 4 and 5. For comparison with other studies, all video material is taken from Context Aware Vision using Image-based Active Recognition (CAVIAR) [17]. Figure 4 shows how the object window is expanded and contracted. In this figure, the first image is the background image, and the remaining three images show a woman walking, sampled at 60-frame intervals. The top-right image in this figure shows the expansion and contraction procedure of an object window. The first step is to calculate p₀ (x₀, y₀) by reducing the initially selected object window and applying Eq. (3), as shown in the top-left and top-right images of Figure 5. Then, the predicted center of mass shown in the top-right image of Figure 4 is computed. In the current (k-th) frame, sub-images are extracted from the background image and the k-th frame, and the binary image shown in the mid-left image of Figure 5 is calculated. The object window is then obtained by contraction (white arrow), and the predicted center of mass for the next frame is computed.

[Figure 4.] Background image (top-left), base frame (top-right), k-th frame (bottom-left), and (k+1)-th frame (bottom-right), illustrating the expansion and contraction procedure for a person walking, shown at 60-frame intervals.

[Figure 5.] Expansion and contraction of object window (top-right), same operation on I_x (middle), and same operation on I_y (bottom).

Expanding and contracting the object window is simple because the operation is performed on the one-dimensional profiles I_x and I_y rather than on the image frame itself. These operations are shown in the middle and bottom images of Figure 5, respectively. The procedure has two steps: the object window is first expanded and contracted along the I_x axis and then along the I_y axis.
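A minimal sketch of the two-step contraction, assuming the binary mask of the extended window contains only the target (otherwise the separation of Section 2.2 applies first): the window is trimmed to the outermost columns where I_x exceeds a threshold, and then to the outermost rows where the I_y of that narrower window exceeds it. The names trim and contract and the threshold parameter are hypothetical.

```python
import numpy as np


def trim(profile: np.ndarray, thresh: int = 0) -> tuple[int, int]:
    """Half-open range between the first and last profile entries that exceed thresh."""
    idx = np.flatnonzero(profile > thresh)
    if idx.size == 0:
        raise ValueError("no profile entry exceeds the threshold")
    return int(idx[0]), int(idx[-1]) + 1


def contract(mask: np.ndarray, thresh: int = 0) -> tuple[int, int, int, int]:
    """Two-step contraction: along x using I_x, then along y using I_y of the x-trimmed window."""
    b = (mask != 0).astype(np.int64)
    x0, x1 = trim(b.sum(axis=0), thresh)              # step 1: contract on the I_x axis
    y0, y1 = trim(b[:, x0:x1].sum(axis=1), thresh)    # step 2: contract on the I_y axis
    return x0, x1, y0, y1


mask = np.zeros((13, 9), dtype=np.uint8)
mask[5:9, 3:6] = 1      # target inside the extended window
mask[0, 0] = 1          # an isolated noise pixel
print(contract(mask))             # (0, 6, 0, 9): the noise pixel stretches the window
print(contract(mask, thresh=1))   # (3, 6, 5, 9): columns/rows with a single pixel are ignored
```

A threshold above zero suppresses isolated noise, at the cost of also trimming genuinely thin parts of the target.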

The expansion and contraction parameter, EC_par, plays an important role in the contraction operation. Its value is always greater than 1: it equals 2 when the object occupies 50% of the object window area and is about 3.3 when the object occupies 30%. An EC_par close to 1 means that the object is too large relative to the object window, whereas a value of about 3 or 4 means that the object is very small relative to the window. It is therefore reasonable to keep EC_par around 2: when EC_par approaches 1, the object window must be expanded, and when it is much greater than 2, the object window must be contracted. To maintain the performance of the system, an EC_par of about 1.5 to 2 is appropriate.
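The text gives the desired EC_par range (about 1.5 to 2) but not an explicit resizing rule. One way to act on it, sketched here as an assumption: because EC_par is proportional to the window area, scaling both the window width and height by s multiplies EC_par by s², so s = sqrt(target / EC_par) moves it exactly to the nearest bound, provided the object stays inside the window. The function resize_factor and its default bounds are hypothetical.

```python
import math


def resize_factor(ec_par: float, low: float = 1.5, high: float = 2.0) -> float:
    """Factor applied to both window width and height to bring EC_par into [low, high].

    EC_par is proportional to the window area, so scaling each side by s multiplies
    EC_par by s**2; s = sqrt(target / ec_par) therefore hits the target exactly,
    provided the object area (the denominator) is unchanged by the resize.
    """
    if ec_par < low:
        target = low        # object nearly fills the window: expand
    elif ec_par > high:
        target = high       # object is small relative to the window: contract
    else:
        return 1.0          # already in the recommended range
    return math.sqrt(target / ec_par)


for ratio in (0.8, 0.5, 0.3):           # object area / window area
    ec = 1.0 / ratio                    # EC_par = 1 / ratio
    print(round(ec, 2), round(resize_factor(ec), 3))
```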

   3.3 Selection of Object by Color Information [15]

If an occlusion occurs, the color information of the target object just before and after the occlusion is very useful: tracking can continue successfully as long as the two objects do not have identical or similar colors. Studies on occlusion fall into two categories: those that use color and shape information [15] and those that use the motion information of the target object through a particle filter or Kalman filter algorithm [16, 20]. However, when both objects have identical or similar colors and similar shapes, tracking may fail. Such a scenario requires further investigation.

To address this problem, this paper uses both kinds of information, i.e., the color and shape of the target object and its velocity. Figure 6 shows frames during an occlusion ((a) and (b)), just before it (c), and just after it (d). The middle and bottom plots of (a) are I_x and I_y, respectively. The middle and bottom plots of (b), (c), and (d) are also I_x and I_y, respectively, but each is computed from the color (RGB) matrices. The bottom plots of (b), (c), and (d) show very similar patterns even though the positions of objects A and B are exchanged, whereas the middle plots differ in shape from one another. In addition, the two objects can be separated at about 275 pixels in (c) and about 265 pixels in (d). The separated objects can then be identified by their color distribution or shape.
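The paper combines color and shape cues with velocity but does not spell out the matching rule. As a hedged sketch of the color part only, the code below re-assigns identities after an occlusion by comparing normalized joint RGB histograms with the Bhattacharyya coefficient, a common similarity measure swapped in here for illustration. The names color_hist, bhattacharyya, and match_identities, and the 8-bin quantization, are hypothetical choices.

```python
from itertools import permutations

import numpy as np


def color_hist(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized joint RGB histogram of an (N, 3) uint8 array of object pixels."""
    q = pixels.astype(np.int64) * bins // 256          # quantize each channel to `bins` levels
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]  # flatten the (r, g, b) bin to one index
    h = np.bincount(idx, minlength=bins ** 3).astype(float)
    return h / h.sum()


def bhattacharyya(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return float(np.sum(np.sqrt(h1 * h2)))


def match_identities(before: list, after: list) -> list[int]:
    """perm[i] = index of the post-occlusion object assigned to pre-occlusion object i."""
    n = len(before)
    best = max(permutations(range(n)),
               key=lambda p: sum(bhattacharyya(before[i], after[p[i]]) for i in range(n)))
    return list(best)


red = np.tile(np.array([200, 30, 30], dtype=np.uint8), (50, 1))   # object A
blue = np.tile(np.array([30, 30, 200], dtype=np.uint8), (50, 1))  # object B
before = [color_hist(red), color_hist(blue)]
after = [color_hist(blue), color_hist(red)]    # A and B have exchanged positions
print(match_identities(before, after))          # [1, 0]
```

Brute-force search over permutations is only practical for a handful of objects. When the objects have identical colors, every assignment ties, which is exactly the failure case noted in the text, so velocity information would have to break the tie.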

[Figure 6.] When occlusion has occurred (a, b), just before (c) and after (d).

4. Experimental Results and Discussion

To verify the validity of the proposed algorithm, several experiments were performed using video sequences provided by CAVIAR [17]. The first experiment involved tracking one person walking from the bottom-right corner of the lobby toward the top-left corner. The second experiment involved tracking two people walking in opposite directions, one of whom was walking in a crowd. The last experiment involved tracking three people walking together and another person walking in the opposite direction.

   4.1 Scenario 1: Tracking One Person Walking in the Lobby

The first experiment involves tracking one person walking from the bottom-right corner of the lobby toward the top-left corner. The tracking results are shown in Figure 7, where the frames are sampled at 10-frame intervals and the calculated target position in each frame is marked by "*". As Figure 7 shows, the target is tracked successfully throughout the sequence.

[Figure 7.] Tracking result for one person. The frames are sampled at 10-frame intervals.

   4.2 Scenario 2: Tracking Two People Walking With Other People

The second scenario consists of tracking two people walking in opposite directions, one of whom is walking in a crowd. Figure 8 shows the tracking procedure at 13-frame intervals. In each image, the person walking upward is marked by a red cross, and the person walking downward from the center with a group of three people is marked by a yellow cross. As the second and fourth rows show, tracking remains accurate even when these two people come very close to each other.

[Figure 8.] Tracking result for two people walking in opposite directions.

   4.3 Scenario 3: Tracking Three People Walking With Other People

The third scenario is to track three people walking together and another person walking in the opposite direction. It is the same as Scenario 2, except that one more person is added to the targets. This scenario shows that the computational load increases compared with Scenario 2; however, the increase does not significantly affect the run time. The procedure is shown in Figure 9.

[Figure 9.] Tracking results for three people when four people are walking. Three people are walking together, but one is walking in the opposite direction.

5. Conclusion

This paper investigated multi-human tracking in an indoor environment and presented a modified E-C method. The proposed algorithm provides the advantages of the mean-shift algorithm as well as the useful properties of particle swarm optimization and filter-based algorithms for multi-object tracking. Several useful new variables were defined: the object window; the E-C parameter, i.e., the ratio of the object window area to the object area; I_x, the distribution of non-zero pixels in the horizontal (x) direction; and I_y, the distribution of non-zero pixels in the vertical (y) direction. The center of mass of a human object is computed using I_x and I_y. To show that the proposed tracking method can be applied efficiently to a variety of environments, several experiments were carried out, and, as described in the experimental section, the proposed method worked well in every scenario tested. Because its computational load is very low, the proposed method may also be useful in more complex environments. However, when two objects have identical or similar colors and similar shapes, tracking may fail; such a scenario requires further research.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

References

1. Yilmaz A., Javed O., Shah M. 2006 “Object tracking,” [ACM Computing Surveys] Vol.38
2. Comaniciu D., Ramesh V., Meer P. 2003 “Kernel-based object tracking,” [IEEE Transactions on Pattern Analysis and Machine Intelligence] Vol.25 P.564-577
3. Takala V., Pietikainen M. 2007 “Multi-object tracking using color, texture and motion,” [Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition]
4. Khan S. M., Shah M. 2009 “Tracking multiple occluding people by localizing on multiple scene planes,” [IEEE Transactions on Pattern Analysis and Machine Intelligence] Vol.31 P.505-519
5. Arulampalam M.S., Maskell S., Gordon N., Clapp T. 2002 “A tutorial on particle filters for online nonlinear non-Gaussian Bayesian tracking” [IEEE Transactions on Signal Processing] Vol.50 P.174-188
6. Hue C., Le Cadre J. P., Perez P. 2002 “Tracking multiple objects with particle filtering,” [IEEE Transactions on Aerospace and Electronic Systems] Vol.38 P.791-812
7. Maskell S., Gordon N. 2001 “A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking,” [IEE Target Tracking: Algorithms and Applications (Ref No. 2001/174)] P.2/1-2/15
8. Kwon J., Lee K.M., Park F.C. 2009 “Visual tracking via geometric particle filtering on the affine group with optimal importance functions,” [IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops] P.991-998
9. Comaniciu D., Meer P. 1999 “Mean shift analysis and applications,” [Proceedings of the 1999 7th IEEE International Conference on Computer Vision] P.1197-1203
10. Comaniciu D., Ramesh V. 2000 “Mean shift and optimal prediction for efficient object tracking,” [International Conference on Image Processing] P.70-73
11. Vigus S. A., Bull D. R., Canagarajah C. N. 2001 “Video object tracking using region split and merge and a Kalman filter tracking algorithm,” [Proceedings of the International Conference on Image Processing] P.650-653
12. Czyzewski A., Dalka P. 2008 “Examining Kalman Filters Applied to Tracking Objects in Motion,” [9th International Workshop on Image Analysis for Multimedia Interactive Services] P.175-178
13. Zhang X., Hu W., Maybank S., Li X., Zhu M. 2008 “Sequential particle swarm optimization for visual tracking,” [Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition]
14. Jiang H., Fels S., Little J. J. 2007 “A linear programming approach for multiple object tracking,” [Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition]
15. Huang Y., Essa I. 2005 “Tracking multiple objects through occlusions,” [Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition] P.1051-1058
16. Ko K. E., Park J. H., Park S. M., Kim J. Y., Sim K. B. 2012 “Occluded object motion estimation system based on particle filter with 3D reconstruction,” [International Journal of Fuzzy Logic and Intelligent Systems] Vol.12 P.60-65
17. “CAVIAR: Context Aware Vision using Image-based Active Recognition,”
18. Kang J. S. 2013 “A new mobile object tracking approach in video surveillance. Part I: Indoor environment,” [The 14th International Symposium on Advanced Intelligence Systems] P.1097-1102
19. Kim S. W., Kang J. S. 2013 “A new mobile object tracking approach in video surveillance. Part II: Outdoor environment,” [The 14th International Symposium on Advanced Intelligence Systems] P.1103-1108
20. Park S. M., Park J. H., Kim H. B., Sim K. B. 2011 “Specified object tracking problem in an environment of multiple moving objects,” [International Journal of Fuzzy Logic and Intelligent Systems] Vol.11 P.118-123
