Multi-Frame Face Classification with Decision-Level Fusion Based on Photon-Counting Linear Discriminant Analysis
 Author: Yeom Seokwon
 Published in: International Journal of Fuzzy Logic and Intelligent Systems, Vol. 14, No. 4, pp. 332-339, 25 Dec 2014

ABSTRACT
Face classification has wide applications in security and surveillance. However, this technique presents various challenges caused by pose, illumination, and expression changes. Face recognition with long-distance images involves additional challenges owing to focusing problems and motion blurring. Multiple frames under varying spatial or temporal settings can acquire additional information, which can be used to achieve improved classification performance. This study investigates the effectiveness of multi-frame decision-level fusion with photon-counting linear discriminant analysis. Multiple frames generate multiple scores for each class. The fusion process comprises three stages: score normalization, score validation, and score combination. Candidate scores are selected during the score validation process after the scores are normalized. The score validation process removes bad scores that can degrade the final output. The selected candidate scores are combined using one of the following fusion rules: maximum, averaging, and majority voting. Degraded facial images are employed to demonstrate the robustness of multi-frame decision-level fusion in harsh environments. Out-of-focus and motion-blurring point-spread functions are applied to the test images to simulate long-distance acquisition. Experimental results with three facial datasets indicate the efficiency of the proposed decision-level fusion scheme.

KEYWORD
Decision-level fusion, Data fusion, Face classification, Face recognition, Photon-counting, Linear discriminant analysis

1. Introduction
Face classification has many applications in security monitoring and intelligent surveillance, as well as robot vision, image and video retrieval, and human-machine interfaces [1-3]. However, it is challenging to classify a facial image acquired in an uncontrolled setting, such as one captured at a long distance. Unexpected blurring and noise may occur, in addition to conventional distortions caused by pose, illumination, and expression changes. To address these issues, various classifiers have been developed based on statistical analysis, including Fisher linear discriminant analysis (LDA) combined with principal component analysis (PCA) [4], often referred to as "Fisherfaces," as well as the "Eigenfaces" method, which uses only PCA [1]. Typically, the number of training images is much smaller than the number of pixels. Thus, Fisher LDA requires a dimensionality reduction such as PCA to avoid the singularity problem, often referred to as the "small sample size problem." However, photon-counting (PC) LDA does not suffer from the singularity problem associated with a small sample size [5]. PC-LDA was originally developed to train on grayscale images and classify a photon-limited image obtained under low illumination. However, it has been shown that PC-LDA is also suitable for classifying grayscale images obtained by a visible-light camera [6].
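The small-sample-size singularity mentioned above can be seen directly: with n training images of d pixels and n much smaller than d, the within-class scatter matrix has rank at most n − C and therefore cannot be inverted, which is what blocks plain Fisher LDA. A minimal numpy illustration (not from the paper; the sizes are arbitrary):

```python
import numpy as np

# 3 classes with 3 training images each (n = 9) and d = 100 "pixels":
# the within-class scatter S_w is a sum of rank-deficient outer products,
# so rank(S_w) <= n - C = 6, far below d, and S_w is singular.
rng = np.random.default_rng(1)
n, d, C = 9, 100, 3
X = rng.random((n, d))
labels = np.repeat(np.arange(C), n // C)

s_w = np.zeros((d, d))
for c in range(C):
    Xc = X[labels == c] - X[labels == c].mean(axis=0)  # center each class
    s_w += Xc.T @ Xc

rank = np.linalg.matrix_rank(s_w)
print(rank, d)  # rank is at most n - C = 6, while S_w is d x d
```

This is why Fisherfaces first projects onto a PCA subspace, whereas PC-LDA avoids the inversion problem through its Poisson variance term.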
Decision-level fusion is a high-level data fusion technique [7, 8]. It aims to increase classification accuracy by combining multiple outputs from multiple data sets. Compared to a single frame, multiple frames contain additional information acquired from varying spatial or temporal settings, as illustrated in Figure 1. Various fusion rules, such as the maximum, averaging, and majority-voting rules, have been studied in the literature [9, 10]. Bayesian estimation and Dempster-Shafer evidential reasoning are often adopted for decision-level fusion [11]. In [12], preliminary results are provided for multi-frame recognition with several data sets.
In this paper, multi-frame decision-level fusion with PC-LDA is discussed. Decision-level fusion involves three stages: score normalization, score validation, and score combination. After the scores are normalized, candidate scores are selected using a screening process (score validation). Subsequently, the scores representing the classes are combined to render a final decision using a fusion rule (score combination). The validation stage screens out "bad" scores that can degrade classification performance. The maximum, averaging, and majority-voting fusion rules are investigated in the experiments. Three facial image datasets (ORL, AR, and Yale) [13-15] are employed to verify the effectiveness of the proposed decision-level fusion scheme.
The remainder of the paper is organized as follows. PC-LDA is discussed in Section 2. Section 3 describes decision-level fusion. The experimental results are presented in Section 4. The conclusion follows in Section 5.
2. Photon-Counting LDA

This section briefly describes PC-LDA. PC-LDA realizes the Fisher criterion using the Poisson distribution, which characterizes the semiclassical photo-detection model [16]. A photon-counting vector y is a random feature vector corresponding to a normalized image vector x; thus, x and y have the same dimension d, the number of pixels. The i-th component y_i of y follows an independent Poisson distribution with parameter N_p x_i, that is, y_i ~ Poisson(N_p x_i). Here, x_i is the normalized intensity at pixel i such that

\sum_{i=1}^{d} x_i = 1,

and N_p indicates the total number of average photo-counts because the following equation is valid:

E\left[ \sum_{i=1}^{d} y_i \right] = N_p \sum_{i=1}^{d} x_i = N_p.

The between-class covariance measures the separation of the classes as

\Sigma_b = \frac{1}{C} \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^t, \quad (1)

where the class-conditional mean and the mean vector are \mu_j = N_p \mu_{x_j} and \mu = N_p \mu_x, respectively; j indicates a class, and the superscript t denotes the matrix transpose. The within-class covariance matrix measures the concentration of the members of the same class as

\Sigma_w = \frac{1}{C} \sum_{j=1}^{C} \left[ N_p^2 \Sigma_{x_j} + N_p \, \mathrm{diag}(\mu_{x_j}) \right], \quad (2)

where diag(·) denotes a diagonal matrix and \Sigma_{x_j} is the covariance of x within class j. Thus, the following Fisher criterion can be derived:

W_P = \arg\max_{W} \frac{|W^t \Sigma_b W|}{|W^t \Sigma_w W|}, \quad (3)

where the column vectors of W_P are equivalent to the eigenvectors of \Sigma_w^{-1} \Sigma_b corresponding to the nonzero eigenvalues. It is noted that \Sigma_w is nonsingular because of the nonzero components of \mu_x. The class decision can be made by maximizing a score function as follows:

\hat{j} = \arg\max_{j=1,\ldots,C} s_j(\mathbf{y}_u), \quad (4)

where C is the number of classes. The normalized correlation is adopted as the score function:

s_j(\mathbf{y}_u) = \frac{(W_P^t \mu_j)^t (W_P^t \mathbf{y}_u)}{\| W_P^t \mu_j \| \, \| W_P^t \mathbf{y}_u \|}. \quad (5)

The photon-counting vector y_u of an unlabeled object is required for the class decision, as depicted in Eq. (5). Alternatively, y_u can be estimated from the intensity image vector x_u. Because the minimum mean-squared error (MMSE) estimate is the conditional mean [17], a point estimate of y_{ui} becomes E(y_{ui} | x_{ui}) = N_p x_{ui}, where y_{ui} and x_{ui} are the i-th components of y_u and x_u, respectively. Thus, Eq. (5) is equivalent to the following score function:

s_j(\mathbf{x}_u) = \frac{(W_P^t \mu_j)^t (W_P^t N_p \mathbf{x}_u)}{\| W_P^t \mu_j \| \, \| W_P^t N_p \mathbf{x}_u \|}. \quad (6)

The mean-squared (MS) error is the same as the variance of y_{ui}, which is N_p x_{ui}. The MS error increases as N_p increases; however, PC-LDA converges to the Fisher LDA as N_p goes to infinity, since the Poisson variance term becomes negligible:

\lim_{N_p \to \infty} \frac{1}{N_p^2} \Sigma_w = \frac{1}{C} \sum_{j=1}^{C} \Sigma_{x_j}. \quad (7)

Two performance measures are calculated to evaluate the performance of the classifiers. One is the probability of correct decisions (P_D), and the other is the probability of false alarms (P_{FA}) [6]:

P_D = \frac{\text{number of correctly classified test images}}{\text{total number of test images}}, \quad (8)

P_{FA} = \frac{\text{number of false alarms}}{(C-1) \times \text{total number of test images}}. \quad (9)

3. Decision-Level Fusion
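Decision-level fusion operates on the per-frame class scores produced by the PC-LDA classifier of Section 2. As a concrete illustration, the training step (between- and within-class covariances, eigen-decomposition) and the normalized-correlation scoring can be sketched in numpy. This is a minimal reading of the equations, not the author's implementation; the function names and the default photo-count value are my own.

```python
import numpy as np

def pcl_da_train(X, labels, n_p=100.0):
    """Sketch of PC-LDA training on normalized image vectors.

    X: (n_samples, d) array whose rows each sum to 1 (normalized intensities x).
    Returns the projection matrix W_P and the class-conditional means mu_j of y.
    """
    classes = np.unique(labels)
    C, d = len(classes), X.shape[1]
    mu_xj = np.array([X[labels == c].mean(axis=0) for c in classes])
    mu_j = n_p * mu_xj                 # class-conditional means of y
    mu = mu_j.mean(axis=0)             # overall mean of y

    # between-class covariance: separation of the class means
    diff = mu_j - mu
    sigma_b = diff.T @ diff / C

    # within-class covariance with the Poisson variance term;
    # the diag(.) term keeps sigma_w nonsingular even when n_samples < d
    sigma_w = np.zeros((d, d))
    for c, m in zip(classes, mu_xj):
        Xc = X[labels == c] - m
        sigma_w += (n_p ** 2) * (Xc.T @ Xc) / len(Xc) + n_p * np.diag(m)
    sigma_w /= C

    # columns of W_P: eigenvectors of sigma_w^{-1} sigma_b for the
    # (at most C - 1) largest eigenvalues
    evals, evecs = np.linalg.eig(np.linalg.solve(sigma_w, sigma_b))
    order = np.argsort(-evals.real)[: C - 1]
    return evecs[:, order].real, mu_j

def pcl_da_score(w_p, mu_j, x_u, n_p=100.0):
    """Normalized-correlation scores using the MMSE estimate N_p * x_u of y_u."""
    proj_y = w_p.T @ (n_p * x_u)       # projected point estimate of y_u
    proj_mu = w_p.T @ mu_j.T           # projected class means, one column per class
    return proj_mu.T @ proj_y / (
        np.linalg.norm(proj_mu, axis=0) * np.linalg.norm(proj_y) + 1e-12)
```

The class decision is then `scores.argmax()`; each frame contributes one such score vector to the fusion stages described next.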
Decision-level fusion is composed of three stages: score normalization, score validation, and score combination; these are illustrated in Figure 2. The scores must be normalized if they are presented in different metric forms. The candidate scores are selected during the validation process. Finally, they are combined to create a new score using a fusion rule. For the score validation, a score set S_k is composed of the n_k scores selected from the output scores of frame k as follows:

S_k = \{ s_k(1), \ldots, s_k(n_k) \}, \quad k = 1, \ldots, K, \quad (10)

where K is the total number of frames. The score sets S_1, ..., S_K are then reassigned to new sets as follows:

\tilde{S}_j = \{ \tilde{s}_j(1), \ldots, \tilde{s}_j(m_j) \}, \quad j = 1, \ldots, C, \quad (11)

where m_j is the number of scores for class j from all K frames. Therefore,

\sum_{k=1}^{K} n_k = \sum_{j=1}^{C} m_j \quad \text{and} \quad \bigcup_{k=1}^{K} S_k = \bigcup_{j=1}^{C} \tilde{S}_j \quad (12)

hold between the sets S_k and \tilde{S}_j. The following three fusion rules are adopted to compute the final score for class j:

s_j = \max_{i=1,\ldots,m_j} \tilde{s}_j(i), \quad (13)

s_j = \frac{1}{m_j} \sum_{i=1}^{m_j} \tilde{s}_j(i), \quad (14)

s_j = m_j, \quad (15)

where Eqs. (13)-(15) represent the maximum, averaging, and majority-voting rules, respectively.
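The three stages can be sketched end-to-end in numpy. The normalization (min-max per frame) and the validation criterion (keep the top `n_candidates` scores of each frame) are hypothetical stand-ins chosen for illustration; the paper does not spell out its exact normalization and selection thresholds here.

```python
import numpy as np

def fuse_scores(frame_scores, n_candidates=3, rule="majority"):
    """Sketch of three-stage decision-level fusion.

    frame_scores: (K, C) array, one row of class scores per frame.
    n_candidates: scores kept per frame in the validation stage (assumed criterion).
    Returns the index of the winning class.
    """
    K, C = frame_scores.shape

    # 1) score normalization: min-max per frame (one common choice)
    lo = frame_scores.min(axis=1, keepdims=True)
    hi = frame_scores.max(axis=1, keepdims=True)
    norm = (frame_scores - lo) / (hi - lo + 1e-12)

    # 2) score validation: keep the best scores of each frame (the sets S_k),
    #    then regroup them by class (the sets with m_j members per class j)
    grouped = {j: [] for j in range(C)}
    for k in range(K):
        for j in np.argsort(-norm[k])[:n_candidates]:
            grouped[j].append(norm[k, j])

    # 3) score combination: Eqs. (13)-(15)
    fused = np.full(C, -np.inf)
    for j, s in grouped.items():
        if not s:
            continue                       # class j survived no frame's validation
        if rule == "max":
            fused[j] = max(s)
        elif rule == "average":
            fused[j] = sum(s) / len(s)
        elif rule == "majority":
            fused[j] = len(s)              # m_j: candidate count for class j
    return int(fused.argmax())
```

Note how the majority-voting rule ignores the score magnitudes entirely and counts only how many frames nominate each class, which matches its reported robustness to blurred inputs.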
4. Experimental and Simulation Results
This section describes two types of experiments. The first involves the verification of PCLDA with a single frame. In the second experiment, decisionlevel fusion is tested with artificially degraded test images.
4.1 Face Classification
Three facial image datasets were used for the performance evaluation: ORL [13], AR [14], and Yale [1]. The MATLAB format was utilized for the Yale database [15]. Figure 3 shows sample images of five classes from the three datasets. The datasets contain 40, 100, and 15 classes, respectively; these classes respectively contain 10, 26, and 11 images. The dataset image sizes are 92 × 112, 120 × 165, and 64 × 64 pixels, respectively. Each database was divided into three validation sets, as shown in Table 1. For the single-frame experiment, each validation set was trained and all other validation sets were tested. For example, when three images (image indexes 1-3) in set V_1 of the ORL dataset were trained, the other seven images (image indexes 4-10) were tested. Figure 4 represents five column vectors of the PC-LDA face, Fisherface, and Eigenface projection matrices, respectively, in the image scale; three images from set V_1 of the ORL dataset were trained to produce these results. As illustrated in the figures, the PC-LDA face presents the greatest structural diversity among the three classifiers, whereas the Eigenface method depends more on the intensity distribution than the other methods. Figure 5 shows the average probability of detection (P_D) and the average probability of false alarm (P_FA) when each validation set is trained and the other images are tested as single frames. The results are compared with the Fisherface and Eigenface methods.

4.2 Decision-Level Fusion
For the decision-level fusion experiment, the test images were blurred by out-of-focus and motion-blurring point-spread functions to simulate long-distance acquisition. Out-of-focus images were rendered by applying circular averaging with an 8-pixel radius. Heavy motion blurring was rendered by a filter approximating the linear motion of a camera over a distance of 20 pixels, at an angle of 45° in the counterclockwise direction [6]. Figure 6 shows sample test images from ORL after blur rendering.
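The two degradations can be reproduced approximately in Python. The kernels below mimic MATLAB's `fspecial('disk', 8)` and `fspecial('motion', 20, 45)` only roughly (scipy assumed; exact `fspecial` kernels use sub-pixel weighting at the edges, so values will differ slightly), and the random image stands in for a face image.

```python
import numpy as np
from scipy.signal import convolve2d

def disk_psf(radius=8):
    """Rough circular-averaging (out-of-focus) kernel of the given pixel radius."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(float)  # binary disk
    return k / k.sum()                                   # normalize to unit sum

def motion_psf(length=20, angle_deg=45.0):
    """Rough linear-motion kernel: `length` pixels along `angle_deg` (CCW)."""
    half = (length - 1) / 2.0
    t = np.linspace(-half, half, length)
    theta = np.deg2rad(angle_deg)
    xs = np.round(t * np.cos(theta)).astype(int)
    ys = np.round(-t * np.sin(theta)).astype(int)        # image rows grow downward
    r = max(abs(xs).max(), abs(ys).max())
    k = np.zeros((2 * r + 1, 2 * r + 1))
    k[ys + r, xs + r] = 1.0                              # trace the motion path
    return k / k.sum()

image = np.random.default_rng(0).random((64, 64))        # stand-in test image
blurred_focus = convolve2d(image, disk_psf(8), mode="same", boundary="symm")
blurred_motion = convolve2d(image, motion_psf(20, 45), mode="same", boundary="symm")
```

Both kernels sum to one, so blurring preserves the image's mean intensity before the vectors are renormalized for PC-LDA.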
It was assumed that one pair of test images in the validation set was obtained by multiple sensors; thus, the total number of frames (K) was set to two. For example, if the number of test images was seven in the single-frame experiment, the number of test pairs for the multi-frame fusion was 21 (= {}_{7}C_{2}). Figure 7 shows the average P_D and P_FA for the ORL, AR, and Yale datasets. The maximum rule produced the best results for the original images; however, the majority-voting rule produced the best results when the images were degraded by the blurring functions.

5. Conclusions
This study investigated the effectiveness of a decision-level fusion system with multi-frame facial images. Three decision-level fusion schemes were investigated, following the score normalization and validation processes. Two types of blurring point-spread functions were applied to the test images in order to simulate harsh conditions. The results indicated that the proposed data fusion scheme significantly improved the classification performance.

[Figure 1.] Configurations of varying (a) spatial setting, (b) temporal setting.

[Figure 2.] Block diagram showing decisionlevel fusion.

[Figure 3.] Sample images from (a) ORL, (b) AR, (c) Yale.

[Figure 4.] (a) Photon-counting linear discriminant analysis face, (b) Fisherface, (c) Eigenface.

[Figure 5.] Single-frame results of P_D and P_FA: (a) ORL, (b) AR, (c) Yale.

[Table 1.] Image index in validation sets

[Figure 6.] Sample test images from ORL: (a) original, (b) out-of-focus blurring, (c) motion blurring.

[Figure 7.] Decision-level fusion results of P_D and P_FA: (a) ORL, (b) AR, (c) Yale.