Laser Spot Detection Using Robust Dictionary Construction and Update
 Author: Wang Zhihua, Piao Yongri, Jin Minglu
 Publish: Journal of information and communication convergence engineering Volume 13, Issue1, p42~49, 31 March 2015

ABSTRACT
In laser pointer interaction systems, laser spot detection is one of the most important technologies, and most of the challenges in this area are related to the varying backgrounds, and the realtime performance of the interaction system. In this paper, we present a robust dictionary construction and update algorithm based on a sparse model of background subtraction. In order to control dynamic backgrounds, first, we determine whether there is a change in the backgrounds; if this is true, the new background can be directly added to the dictionary configurations; otherwise, we run an online cumulative average on the backgrounds to update the dictionary. The proposed dictionary construction and update algorithm for laser spot detection, is robust to the varying backgrounds and noises, and can be implemented in real time. A large number of experimental results have confirmed the superior performance of the proposed method in terms of the detection error and realtime implementation.

KEYWORD
Background subtraction , Laser spot detection , Dictionary construction and update , Compressive sensing

I. INTRODUCTION
Recently, we have witnessed a growing interest in laser pointer interaction (LPI), which allows users to interact directly from a distance through a laser pointer. In laser pointerbased interaction systems, the captured laser spot is recognized and used for interactions by using various image processing techniques. The advantage of ensuring movement flexibility for users has led to the widespread use of this method for multimedia presentations [14], robot navigation [57], medical purposes [8], virtual reality systems [9,10], and smart houses [11].
Recently, Kim et al. [2] summarized three fundamental problems with LPI: laser spot detection, interaction function, and coordinate mapping. In [1113], the researchers focused on the development of a laser spot detection algorithm that directly influences the performance of LPI systems. The most difficult challenges of laser spot detection are strong light environments, realtime implementation, and dynamic backgrounds. For example, the background information always changes when the speaker turns the slides in practical presentation cases.
To overcome the above mentioned problems, two types of algorithms, namely target search (TS) and background subtraction (BGS), have been developed to detect a laser spot. The TS method directly searches the laser spot without considering the background. Shin et al. [12] simply searches for pixels with maximum intensities to detect the location of the laser spot. Chávez et al. [11] used a combination of template matching and fuzzy rulebased systems to improve the success rate of laser spot detection. Geys and Van Gool [13] determined the laser spot by using clusters along with the fact that a group effect is caused on laser spots by hand jitters. However, the TS method fails because of the strong light environment and the appearance change of the moving laser spot. On the other hand, BGS covers a set of methods that aim to distinguish between the foreground and the background areas by utilizing a background model. The traditional models used to represent background include statistical models, neural networks, estimation models, and some recent models including fuzzy models, subspace models, transform domain models, and sparse models [14]. Among them, sparse models have been successfully applied in compressive sensing [15]. Cevher et al. [16] considered background subtraction as a sparse approximation problem and provided different solutions based on convex optimization. Hence, the background is learned and adapted in a lowdimensional compressed representation, which is sufficient to determine spatial innovations. Huang et al. [17] proposed a new learning algorithm called dynamic group sparsity (DGS). The idea is that the nonzero coefficients in the sparse data are often not random but tend to be a cluster such as those in the case of foreground detection. However, the dictionary of backgrounds is constructed simply by using video frames that make this model sensitive to noise and background changes. In order to solve the problem of background changes and outliers in training samples, Zhao et al. [18] formulated background modeling as a dictionary learning problem. However, the learning process is time consuming and needs all the background information, which makes it difficult to apply in practice. Therefore, to solve the problem discussed in [18], we propose a novel robust algorithm for the construction and update of a dictionary for laser spot detection. Subsequently, the proposed model can control the varying backgrounds and the realtime performance.
The remainder of this paper is organized as follows: Section II briefly explains the proposed method of background modeling and foreground detection. In Section III, we show the experimental results in comparison with those of the existing methods, and some conclusions of the proposed method are presented in Section IV.
II. THE PROPOSED SYSTEM MODEL
Suppose that we have an image
Y of sizen _{1} ×n _{2} and we vectorize it into a column vectory of sizen × 1 (n =n _{1} ×n _{2}) by concatenating the individual column ofY in the order from first to last. We formulate the background subtraction as a linear decomposition problem, i.e., to find a background componenty_{B} and a foreground componenty_{F} that together constitute a given framey :where
y_{B} andy_{F} denote the column vectors of background a nd foreground, respectively.> A. Sparse Representation
Suppose that we have
K different backgroundsy _{B1},y _{B2},...,y_{BK} ∈R^{n} ; then, we can buildK configurations for dynamic backgrounds with each configuration standing for one background. Therefore, at a specific frame, the backgroundy_{B} can choose from one of these configurations. We define a new matrixD = [d _{1},d _{2},...,d_{K} ] as the concatenation of all the configurations; here,d_{i} denotes thei^{th} configuration. Then, we say that backgroundy_{B} has the linear representationy_{B} =d_{i}x_{i} , wherex_{i} denotes a coefficient representing the relationship betweeny_{B} andd_{i} . Thus, the background can be modeled as a sparse linear combination of atoms from a dictionaryD , each atom of which characterizes one of the configurations. Next, we rewritey_{B} in terms ofD as follows:where
x = [0,...,0,x_{i} ,0,...,0]^{T} denotes a sparse coefficient vector whose entries are ideally zeros except at positions associated withx_{i} .Zhao et al. [18] summarized two assumptions for this sparse model:
Assumption 1 . Backgroundy_{B} of a specific framey has a sparse representation over a dictionaryD .Assumption 2 . The candidate foregroundy_{F} of a frame is sparse after background subtraction.On the basis of these two assumptions, the BGS problem can be interpreted as follows: given a frame
y , find a decomposition that has the sparse coded backgroundy_{B} =D_{x} and the sparse foregroundy_{F} =y D_{x} :where ║
x ║_{0} denotes theℓ _{0}norm counting the number of nonzero elements ofx ,D indicates the dictionary capturing of all the background configurations, andλ represents the weighting parameter balancing between the two terms.Since Eq. (3) is an NPhard problem because of the nonconvexity of
ℓ _{0}norm, Zhao et al. [18] replacedℓ _{0}norm withℓ _{1}norm and obtained theℓ _{1}measured andℓ _{1}regularized convex optimization problem:Considering the LPI application, the foreground (laser spot) generally occupies a far smaller spatial area than the background. Therefore, we can simply treat the foreground as noises and obtain a Lasso problem:
This problem can be easily and rapidly solved using least angle regression (LARS) [19], and then, we can obtain the foreground using
> B. Dictionary Construction
To make the sparse model robust against dynamic backgrounds, the dictionary must be able to represent all the backgrounds. Huang et al. [17] assumed that background subtraction has already been performed on the first
K frames of the video sequences and letD = [y_{1},y_{2},...y_{K}] ∈ R^{n×K}. It is noteworthy that this method is sensitive to noise and cannot be used in practice. Zhao et al. [18] collected all background training samples and developed a robust dictionary learning approach to construct the dictionary:However, in LPI applications, we are unable to collect a sufficient number of training samples. For example, we are unable to capture a large number of backgrounds in a presentation application since we do not know the information of the next slide until the user gives the ‘PageDown’ or ‘PageUp’ command. Besides, solving this optimization problem is time consuming and the solution is difficult to implement in realtime.
Since the use of video sequences as a dictionary is sensitive to noise, we use information from multiple frames for ensuring robustness. Therefore, the strategy is to apply an exponentially decaying weight to run an online cumulative average on the backgrounds:
where
α denotes the decay rate often chosen as a tradeoff between stability and quick update andK represents the parameter that controls the number of backgrounds. The advantage of this approach apart from its simplicity is that it can suppress noise and solve the problem lowfrequency background changes to some extent. We assume that the background changes at a high frequency at the dictionary update stage but not the dictionary construction stage, which is often true in an LPI application.> C. Dictionary Update
The dictionary needs to update quickly in order to handle the occurrence of a new background. Huang et al. [17] set a time window to update the dictionary. For frame
t ,the dictionary is updated byD = [y_{tK},...,y_{t2},y_{t1}]. However, this method is still sensitive to noise, which makes the model unstable. Zhao et al. [18] updated the dictionaryD by solving the following optimization problem with the coefficients being updated and considered constant:Zhao et al. [18] assumed that the atoms in
D are independent of each other and thus, updated each of them separately. However, solving this optimization problem is still time consuming.Considering that when a new background occurs, the foreground
y_{F} solved by Eqs. (5) and (6) will not be a sparse result, we can figure out whether a new background occurs by setting a threshold for theℓ _{0}norm ofy_{F} . Whenever a new background occurs, we add the new background configuration into the dictionary; otherwise, we directly update the dictionary by using Eq. (8). This method can be formulated as follows:Where
Th can be set as the size of the laser spot.The proposed strategy is made sensitive to changing backgrounds by adding new background configurations, and robust against noise by using the online cumulative average of the backgrounds. The proposed dictionary construction and update algorithm is summarized in Table 1.
III. EXPERIMENTS AND RESULTS
To validate the ability of the proposed algorithm to handle the above mentioned highfrequency background changes and evaluate the algorithm’s realtime performance, in this section, we discuss two experiments of LPI. Through these experiments, we evaluated the performance of the proposed algorithm with the different parameters used in this algorithm, measured the detection error under dynamic backgrounds, and compared it with the running times of different algorithms as well.
> A. Laser PointerOperated Windows
A typical example of LPI in practice is the interactive demonstration of software with a computer whose screen content is sent to a video beamer by using a common laser pointer tracked by a video camera as an input device. Algorithms use the behavior of the laser spot to realize the functions of Button Press, Button Release, and Mouse Move. When Button Press is recognized, the corresponding file or dialog may show up, which leads to a background change immediately. We record three videos of the size 160×120, 320×240, and 640×480, respectively, to simulate this process on such a system.
In LPI, the laser spot cannot be static because of the hand jitter, thus instead of measuring the detection error compared with the ground truth, we validate it using the possibility of false detected frames as follows:
The performance of the proposed algorithm is compared with that of two algorithms representing stateoftheart sparse model approaches [17,18]. Notice that we use LARS [19] to solve Eq. (5) for all these methods in order to evaluate the dictionary construction and update approach. Fig. 1 illustrates some results of the abovementioned algorithms.
Image sequences having a size of 320×240 are used to test how the parameters
λ andα determine the detection performance. The detection errors of different parameter values are shown in Fig. 2. As we can see from Fig. 2, a larger weighting parameterλ is helpful for the detection since the sparsity of the background is the key assumption of the proposed algorithm. However, a considerably largeλ value increases the reconstruction error, which leads to relatively low performance. Thus, the value ofλ can be chosen from 5 to 10 in order to obtain good performance. The decay rateα is used against noises; a smallα value is sensitive to noises, and a large one cannot adapt to a low frequency of background changes.As can be observed in Fig. 2, a moderate
α value of 0.5 can lead to better performance. In our experiments, the weighting parameter was set atλ = 5 and the decay rate atα = 0.5 .As the other parameter values used in these tests, we select
K = 20 to build the dictionary andTh = 50 to control the sparsity of the laser spot. A standard PC with a 2.0GHz Intel CPU processor and 3 GB of memory is used in our experiments. As can be seen from Fig. 1, our algorithm can handle a situation that has dynamic backgrounds and is robust against noise. The final results of the detection error defined by Eq. (11) and the running time per frame are illustrated in Fig. 3. As can be observed, our algorithm achieves detection errors that are as low as those of the dictionary learning approach and consumes as little time as the using video images as dictionary method. Notice that the detection error of the using video images as dictionary method [17] is considerably higher than that of our algorithm, and that dictionary learning [18] consumes a considerably large amount of time and thus, cannot be implemented in real time.> B. Multimedia Presentation
In a presentation application, we can use the laser pointer to change slides and draw lines. It should be noted that highfrequency changes are caused when the user changes the slides. Further, each slide may be totally different from the others. For this application, we manually change the slides to obtain dynamic backgrounds and use the above mentioned algorithms for the detection of the laser spot. The final results are shown in Figs. 4 and 5.
From Figs. 4 and 5, we can see that the proposed algorithm can achieve a lower detection error with a low time cost, which is similar to the results of the laser pointeroperated windows method. Thus, the proposed algorithm is robust against different scenarios with dynamic backgrounds. From Table 2, we can see that the detection error when the image resolution 160×120 is the highest, while similarly low detection errors are obtained when the resolutions of 320×240 and 640×480 are used. However, the time cost of using the resolution of 640×480 is considerably higher than that of using the resolution of 320×240. Thus, we recommend the use of the 320×240 resolution in practice.
IV. CONCLUSION
In this paper, we focus on the laser spot detection algorithm and model it as a background subtraction problem. Further, we propose a robust dictionary construction and update algorithm based on the sparse model for laser spot detection. To test the performance of the proposed method, a large number of experiments are conducted from the perspectives of detection error and realtime performance. The experimental results confirm that the proposed method outperforms the existing methods with a lower detection error and better realtime performance when the background exhibits a high frequency of changes.
Finally, the proposed robust algorithm can also be applied to solve other practical problems, such as traffic monitoring [18] where the background switches among several configurations controlled by the status of traffic lights.

[]

[]

[]

[]

[]

[]

[]

[]

[]

[]

[Table 1.] Description of the proposed dictionary construction and update algorithm

[]

[Fig. 1.] Results on laser pointeroperated Windows. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [8]. (d) Proposed method.

[Fig. 2.] Detection error with different parameters λ and α.

[Fig. 3.] Results on laser pointeroperated Windows. (a) Detection errors. (b) Running time.

[Fig. 4.] Results of multimedia presentation. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [18]. (d) Proposed method.

[Fig. 5.] Results of multimedia presentation. (a) Detection errors. (b) Running time.

[Table 2.] Performance comparison of different image resolutions