A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis

Ren Xiaojun; Sug HyonTai; Lee HoonJae

doi:10.6109/jicce.2015.13.2.105

Open-access journal
Journal of information and communication convergence engineering

Published in: Journal of information and communication convergence engineering, Volume 13, Issue 2, pp. 105-112, 30 June 2015

KEYWORDS

Data mining, DESM, Missing sensor data, STCAM

I. INTRODUCTION

With the rapid development of wireless communication, microelectronics, and embedded computing technologies, sensor networks are widely used in certain fields, such as the military, environment, and medicine. Therefore, nowadays, these networks have become a popular topic of research. In a wireless sensor network, sensors always communicate with the server and other sensors (e.g., for sending data or accepting data). However, in the process of communication, we can expect the transmitted sensor data to get lost or corrupted for many reasons, such as bad weather conditions, the sensor node’s communication ability, wireless signal strength, power outage at the sensor node, or a relatively high bit error rate of the wireless radio transmissions as compared with wired communications. In general, we can re-query data or discard data, but re-querying data is a naive alternative as it may induce a long wait or quicken the power exhaustion of the node, and most importantly, it does not guarantee having the original reading available. Discarding data is also a bad choice as it may lead to the loss of some interesting data. Therefore, it is essential to develop a technique for estimating missing data.

Data mining can produce knowledge from existing data, and this knowledge can be used for estimating the missing sensor data. However, the existing approaches to missing sensor data estimation do not achieve good results (as discussed in the following section). Therefore, in this paper, we propose a new estimation model based on spatial-temporal correlation analysis (STCAM). This model discovers intrinsic relationships among sensors and incorporates these relationships, together with the spatial-temporal relationship, into data estimation. Finally, STCAM is tested with the Intel lab dataset and with data from a traffic monitoring sensor application.

II. RELATED WORKS

In fact, the topic of missing data estimation belongs to the field of statistics, and many researchers have conducted a considerable amount of research on this topic by using methods such as mean substitution, linear regression, Bayesian estimation, expectation maximization, k-nearest neighbor, and neural networks [1,2]. However, because of the characteristics of a wireless sensor network, these techniques cannot provide a good estimation of the missing sensor data. To solve the problem of missing sensor data, many techniques have been proposed.

To avoid the problem of missing sensor data, many researchers have redesigned the sensor network architecture. NASA/JPL [3] is one of the most famous architectures. In NASA/JPL, if one sensor fails, its neighboring sensors compensate for the lost data by increasing their sampling rates. This implies that there must be a tight collaboration among sensors for a sensor to know that its neighboring sensor has failed. This increases the power consumption of every sensor even during its normal operation. Further, this approach does not address how sampling rates should be adjusted in order to guarantee good QoS. It is also possible that when some neighboring sensors fail, no sampling adjustment can potentially compensate for the missing values.

Some of the researchers used association rule mining to estimate the missing sensor data. Halatchev and Gruenwald [4] proposed the WARM algorithm. In this algorithm, if sensor node a fails, WARM will find its neighbor sensor node b and use b’s data to estimate a’s missing data. WARM makes use of the sliding window concept, where only the latest w rounds of data reports are stored and used for estimation. However, the algorithm has one limitation, which is its disregard of the temporal aspect since it views all data as equally important. Gruenwald et al. [5] proposed an improved algorithm called FARM, which uses association rule mining to discover intrinsic relationships among sensors and incorporates them into the data estimation while taking data freshness into consideration. However, WARM and FARM can only be used in the case of discrete data; because most of the sensor data are numeric, WARM and FARM cannot be used widely.

III. ESTIMATION MODEL BASED ON SPATIAL-TEMPORAL CORRELATION ANALYSIS

There are two types of missing sensor data, namely single missing data elements and continuous missing data; therefore, STCAM must be able to provide different solutions for different types of missing data according to the sensor’s spatial-temporal correlation. Before presenting the STCAM decision process, we discuss the problem description, the temporal correlation algorithm (TCA), the spatial correlation algorithm (SCA), and the spatial-temporal correlation algorithm (STCA).

A. Problem Description

STCAM uses a time-series form to represent the data collected by a sensor node $a_k$. The time-series form is as follows:

$$a_k = \{(T_1, V_{k1}), (T_2, V_{k2}), \ldots, (T_n, V_{kn})\}$$

where $T_1, \ldots, T_n \in R$ denote the sampling times and $V_{k1}, \ldots, V_{kn} \in R$ represent the sampling values of sensor node $a_k$ at times $T_1, \ldots, T_n$. Assuming that $V_{ki}$ denotes the missing sensor data and $\hat{V}_{ki}$ represents the estimated sensor data at time $T_i$, we can reduce the problem of estimating the missing sensor data to minimizing $|V_{ki} - \hat{V}_{ki}|$.

B. Temporal Correlation Algorithm

In some applications, monitored parameters such as temperature, humidity, and light intensity have a tight temporal correlation. Therefore, we can use temporal correlations to build the TCA model. In this subsection, we introduce two algorithms, namely the linear interpolation algorithm (TCA-LI) and the multiple regression algorithm (TCA-MR).

1) TCA-LI Algorithm

Linear interpolation is a computationally efficient method of curve fitting that uses linear polynomials. TCA-LI can be expressed by the following formula [6]:

$$\hat{V}_{ki} = V_{ku} + \frac{V_{kv} - V_{ku}}{T_v - T_u}\,(T_i - T_u)$$

where $T_u$ and $T_v$ denote the two nearest time points to $T_i$ with available data, and $T_u < T_i < T_v$; $\hat{V}_{ki}$ denotes the estimated sensor data at time $T_i$; and $V_{ku}$ and $V_{kv}$ represent the sampling data at times $T_u$ and $T_v$, respectively.

For a single missing data element, the TCA-LI algorithm gives a good estimate, but if the missing sensor data are continuous, its accuracy decreases, as shown in Fig. 1. Sensor V measures the temperature every 24 minutes. Assuming that the sample at $T_{1176}$ ($V_{1176} = 32.90$) is missing and that $T_{1152}$ ($V_{1152} = 33.10$) and $T_{1200}$ ($V_{1200} = 32.50$) are the two nearest time points to $T_{1176}$, we find that $\hat{V}_{1176} = 32.80$ is close to $V_{1176}$. However, assuming that all the data between $T_{1008}$ and $T_{1272}$ are missing, $T_{984}$ ($V_{984} = 29.80$) and $T_{1296}$ ($V_{1296} = 30.30$) become the two nearest time points to $T_{1176}$; therefore, $\hat{V}_{1176} \approx 30.11$, and the error $|V_{1176} - \hat{V}_{1176}| \approx 2.79$ is very large. Hence, the TCA-LI algorithm is used only for estimating single missing data elements.
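
The two cases in this example can be checked with a few lines of Python. This is a minimal sketch of plain linear interpolation, not the authors' Java implementation; the function name `tca_li` and its argument order are chosen here for illustration only.

```python
def tca_li(t_u, v_u, t_v, v_v, t_i):
    """Linearly interpolate the value at t_i from the nearest known
    samples (t_u, v_u) and (t_v, v_v), where t_u < t_i < t_v."""
    if not t_u < t_i < t_v:
        raise ValueError("t_i must lie strictly between t_u and t_v")
    return v_u + (v_v - v_u) * (t_i - t_u) / (t_v - t_u)


true_value = 32.90  # V_1176 from the example in the text

# Single missing element: the neighbours at T=1152 and T=1200 are known.
single = tca_li(1152, 33.10, 1200, 32.50, 1176)

# Continuous gap: the nearest known samples are at T=984 and T=1296.
gap = tca_li(984, 29.80, 1296, 30.30, 1176)

print(round(single, 2), round(abs(true_value - single), 2))
print(round(gap, 2), round(abs(true_value - gap), 2))
```

The wider the gap, the further the two anchor samples drift from the local shape of the curve, which is why the second estimate misses the afternoon temperature peak.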

[Fig. 1.] Temperature data collected by sensor V for one day.

2) TCA-MR Algorithm

The above section shows that TCA-LI is accurate for single missing sensor data elements but cannot provide good estimates for continuous missing sensor data. Therefore, in this section, we introduce the multiple regression algorithm (TCA-MR) to estimate the continuous missing sensor data of the TCA model. Assuming that $V_{ki}$ denotes the missing data of sensor node $a_k$ at time $T_i$, the problem of estimating $\hat{V}_{ki}$ can be solved by using the following multiple regression formula:

$$\hat{V}_{ki} = \beta_0 + \beta_1 V_{k(i-1)} + \beta_2 V_{k(i-2)} + \cdots + \beta_m V_{k(i-m)} \qquad (2)$$

where $V_{k(i-1)}, \ldots, V_{k(i-m)}$ are the $m$ preceding values of $a_k$ and $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$ denote the regression coefficients, which represent the contribution level of each term to $\hat{V}_{ki}$.

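To estimate $\hat{V}_{ki}$, we first use a training dataset to estimate the values of $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$. Let the training dataset be $\{V_{ki}, V_{k(i+1)}, V_{k(i+2)}, \ldots, V_{kj}\}$ with $j > i + 2m + 1$. From this dataset, we build $h$ linear equations ($h > m + 1$) that can be expressed as follows:

$$\begin{aligned}
V_{kj} &= \beta_0 + \beta_1 V_{k(j-1)} + \cdots + \beta_m V_{k(j-m)} \\
V_{k(j-1)} &= \beta_0 + \beta_1 V_{k(j-2)} + \cdots + \beta_m V_{k(j-m-1)} \\
&\;\;\vdots \\
V_{k(j-h+1)} &= \beta_0 + \beta_1 V_{k(j-h)} + \cdots + \beta_m V_{k(j-h-m+1)}
\end{aligned} \qquad (3)$$

Let

$$Y = \begin{bmatrix} V_{kj} \\ V_{k(j-1)} \\ \vdots \\ V_{k(j-h+1)} \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & V_{k(j-1)} & \cdots & V_{k(j-m)} \\ 1 & V_{k(j-2)} & \cdots & V_{k(j-m-1)} \\ \vdots & \vdots & & \vdots \\ 1 & V_{k(j-h)} & \cdots & V_{k(j-h-m+1)} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$$

Therefore, Eq. (3) can be rewritten as follows:

$$Y = X\beta$$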

The coefficient $\beta$ can be estimated by using the least-squares approach [7], which can be expressed as follows:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$

After we calculate the value of coefficient β, we can use Equation (2) to estimate the continuous missing sensor data of TCA.

C. SCA Algorithm

For continuous missing data and loose temporal correlation parameters, the TCA algorithm cannot provide a good estimation value for the missing data. However, the SCA algorithm can discover the spatial relationship between the sensor nodes and use the discovered spatial knowledge to estimate the missing data.

Assume that $V_{ki}$ denotes the missing data of sensor node $a_k$ at time $T_i$, that $\{a_1, a_2, \ldots, a_m\}$ represent the neighbors of $a_k$, and that $\{V_{1i}, V_{2i}, \ldots, V_{mi}\}$ represent the data values of $\{a_1, a_2, \ldots, a_m\}$ at time $T_i$. The problem of estimating $V_{ki}$ can be solved from $\{V_{1i}, V_{2i}, \ldots, V_{mi}\}$ by using multiple regression as follows:

$$\hat{V}_{ki} = \beta_0 + \beta_1 V_{1i} + \beta_2 V_{2i} + \cdots + \beta_m V_{mi}$$

where $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$ denote the regression coefficients, which represent the contribution level of each neighbor to $\hat{V}_{ki}$.

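To calculate $\hat{V}_{ki}$, SCA needs a dataset to estimate the values of $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$. Because there are $m + 1$ unknown coefficients, the dataset must contain at least $m + 1$ groups of readings $\{V_k, V_1, V_2, \ldots, V_m\}$ taken at the same time. With $h$ groups ($h > m + 1$), the linear equations can be expressed as follows:

$$\begin{aligned}
V_{k1} &= \beta_0 + \beta_1 V_{11} + \beta_2 V_{21} + \cdots + \beta_m V_{m1} \\
V_{k2} &= \beta_0 + \beta_1 V_{12} + \beta_2 V_{22} + \cdots + \beta_m V_{m2} \\
&\;\;\vdots \\
V_{kh} &= \beta_0 + \beta_1 V_{1h} + \beta_2 V_{2h} + \cdots + \beta_m V_{mh}
\end{aligned}$$

Let

$$Y = \begin{bmatrix} V_{k1} \\ V_{k2} \\ \vdots \\ V_{kh} \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & V_{11} & V_{21} & \cdots & V_{m1} \\ 1 & V_{12} & V_{22} & \cdots & V_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & V_{1h} & V_{2h} & \cdots & V_{mh} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$$

Therefore, these linear equations can be represented by using matrix algebra as follows:

$$Y = X\beta$$

Hence, we can calculate the value of coefficient $\beta$ as follows:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$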
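
TCA-MR and SCA both end in the same least-squares fit, so one Python sketch can illustrate them together. It is a minimal sketch under stated assumptions, not the authors' Java implementation: the function names and sample data are invented for illustration, the TCA-MR regressors are assumed to be the m preceding values of the same sensor, and feeding each estimate back into the series to fill a continuous gap is likewise an assumption.

```python
from math import pi, sin


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; use different training data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, targets):
    """Return beta = (X^T X)^-1 X^T Y, where each row of X is [1, x_1..x_m]."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * y for xi, y in zip(x, targets)) for a in range(p)]
    return solve_linear(xtx, xty)


def predict(beta, regressors):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_m * x_m."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], regressors))


def fit_tca_mr(history, m):
    """Assumed TCA-MR fit: regress each value on its m preceding values."""
    rows = [history[t - m:t][::-1] for t in range(m, len(history))]
    return least_squares(rows, history[m:])


def fill_gap(history, beta, gap_len):
    """Estimate gap_len consecutive values, feeding estimates back in."""
    m = len(beta) - 1
    series = list(history)
    for _ in range(gap_len):
        series.append(predict(beta, series[-m:][::-1]))
    return series[len(history):]


# SCA: readings of two hypothetical neighbours and of the target sensor.
neighbours = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
target = [1.0 + 2.0 * a - 0.5 * b for a, b in neighbours]
beta_s = least_squares(neighbours, target)
pred_s = predict(beta_s, (6.0, 4.0))

# TCA-MR: a synthetic temperature-like cycle, then a gap of five samples.
history = [30 + 3 * sin(2 * pi * t / 60) for t in range(40)]
truth = [30 + 3 * sin(2 * pi * t / 60) for t in range(40, 45)]
est = fill_gap(history, fit_tca_mr(history, 2), 5)
```

Solving the normal equations directly keeps the sketch dependency-free; for ill-conditioned training data, a QR-based or library least-squares routine would be the more robust choice.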

D. STCA Algorithm

The TCA algorithm is suited to data with a tight temporal correlation and to single missing data elements. The SCA algorithm is suited to data with a tight spatial correlation. However, when the temporal or spatial correlation is unknown, TCA or SCA may not give a good estimation value. To solve this problem, the STCA algorithm is proposed. This algorithm weights the temporal and spatial estimates; therefore, the STCA algorithm can be represented as follows:

$$\hat{V}_{ki} = W_S\,\hat{V}^{S}_{ki} + W_T\,\hat{V}^{T}_{ki}$$

where $W_S$ and $W_T$ denote the weights of $\hat{V}^{S}_{ki}$ and $\hat{V}^{T}_{ki}$, respectively; $\hat{V}^{S}_{ki}$ represents the result of the SCA algorithm, and $\hat{V}^{T}_{ki}$ denotes the result of the TCA algorithm.

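To obtain the optimum values of $W_S$ and $W_T$, STCA calculates the residual sum of squares (RSS) as follows:

$$RSS = \sum_{j=1}^{h} \left( W_S\,e^{S}_{j} + W_T\,e^{T}_{j} \right)^2$$

where $e^{S}_{j}$ and $e^{T}_{j}$ denote the estimation errors of the SCA result and the TCA result on the $j$-th selected dataset, respectively, and $h$ represents the number of selected datasets.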

Let

Therefore, finding the optimum values of W_S and W_T becomes a quadratic programming problem:

Therefore, we can use the least-squares approach [7] to obtain an optimal solution as follows:

where
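
For two weights, the minimization has a simple closed form if the weights are assumed to sum to one. The following Python sketch illustrates that case. It is not the authors' derivation: the constraint W_S + W_T = 1, the function names, and the sample error values are all assumptions made for illustration.

```python
def stca_weights(err_s, err_t):
    """Return (w_s, w_t) with w_s + w_t = 1 that minimise
    sum((w_s * e_s + w_t * e_t) ** 2) over the paired errors."""
    den = sum((es - et) ** 2 for es, et in zip(err_s, err_t))
    if den == 0:
        # Both estimators err identically, so every split gives the same RSS.
        return 0.5, 0.5
    w_s = sum(et * (et - es) for es, et in zip(err_s, err_t)) / den
    return w_s, 1.0 - w_s


def rss(err_s, err_t, w_s):
    """Residual sum of squares of the combined estimate for a given w_s."""
    return sum((w_s * es + (1.0 - w_s) * et) ** 2
               for es, et in zip(err_s, err_t))


def combine(v_s, v_t, w_s, w_t):
    """Weighted combination of the spatial and temporal estimates."""
    return w_s * v_s + w_t * v_t


# Hypothetical errors of the SCA and TCA estimates on h = 3 training points.
err_s = [0.2, -0.1, 0.3]
err_t = [-0.4, 0.5, -0.2]
w_s, w_t = stca_weights(err_s, err_t)
estimate = combine(41.0, 43.0, w_s, w_t)
```

The unconstrained optimum can fall outside the interval [0, 1] when one estimator is much worse than the other; clipping the weight, or falling back to the better single estimator, is a design choice the sketch leaves open.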

E. Correlation Analysis Algorithm

We use Pearson’s product-moment correlation coefficient $\rho$ to measure the correlation between the output variable and the input variable. The value of $\rho$ lies between −1 and +1 (inclusive), where +1 denotes a total positive correlation, 0 represents no correlation, and −1 indicates a total negative correlation. The formula for $\rho$ is as follows:

$$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} \qquad (12)$$

where $y$ denotes the output variable; $x$ represents the input variable; $\mu_x$ and $\mu_y$ denote the means of $x$ and $y$, respectively; and $\sigma_x$ and $\sigma_y$ indicate the standard deviations of $x$ and $y$, respectively.

Further, 0.5 < |ρ| ≤ 1 is regarded as a high correlation, 0.3 < |ρ| ≤ 0.5 as a medium correlation, and 0.0 < |ρ| ≤ 0.3 as a low correlation.
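
The coefficient and the three correlation bands can be sketched in Python as follows. The function names `pearson` and `correlation_level` are illustrative, and treating $\rho = 0$ as "low" is an assumption, since the bands above start strictly above zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation coefficient of two samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


def correlation_level(rho):
    """Map rho to the bands used in the text: high, medium, or low."""
    r = abs(rho)
    if r > 0.5:
        return "high"
    if r > 0.3:
        return "medium"
    return "low"  # assumption: rho == 0 is also reported as low


rho_up = pearson([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0])
rho_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(correlation_level(rho_up), correlation_level(rho_down))
```

The common factor 1/n cancels between the covariance and the two standard deviations, so the sketch works with raw sums.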

1) Temporal Correlation Analysis

To find out whether the sampling data of a sensor have a temporal relationship, we choose a training dataset for the analysis. Assume that $\{V_{ki}, V_{k(i+1)}, V_{k(i+2)}, \ldots, V_{k(j-1)}, V_{kj}\}$ is the training dataset of sensor node $a_k$, and let $V_{kj}, V_{k(j-1)}, \ldots, V_{k(j-h)}$ denote the sampling values of $a_k$ at time $T_h$. In the same way, we obtain the sub-datasets at $T_{h-1}, T_{h-2}, T_{h-3}, \ldots, T_1$, as shown in Table 1.

[Table 1.] Dataset at time Th, T(h-1),T(h-2),..., T1

Therefore, we can use Eq. (12) to analyze the relationship of the dataset of T_h and the dataset of another time (T_(h-1),T_(h-2),...,T₁). Now, we can define the temporal correlation as follows:

Definition 1: In a training dataset, if the sub-dataset of T_h is highly relevant to one or more sub-datasets of another time (0.5 < |ρ| ≤ 1), the dataset of the sensor node has a high temporal correlation. If the sub-dataset of T_h is only moderately relevant to one or more sub-datasets of another time (0.3 < |ρ| ≤ 0.5), the dataset of the sensor node has a medium temporal correlation.

2) Spatial Correlation Analysis

To determine whether the sampling data of a sensor have a spatial relationship, we also choose a training dataset for the analysis. Assuming that $a_{k+1}, a_{k+2}, \ldots, a_{k+i}$ are the nodes nearest to $a_k$, we obtain the values listed in Table 2.

[Table 2.] Dataset of ak, a(k+1),a(k+2),...,a(k+i) at different times

Definition 2: In a training dataset, if the sub-dataset of a_k is highly relevant to one or more other sensor node sub-datasets (0.5 < |ρ| ≤ 1), the dataset of the sensor node has a high spatial correlation. If the sub-dataset of a_k is only moderately relevant to one or more other sensor node sub-datasets (0.3 < |ρ| ≤ 0.5), the dataset of the sensor node has a medium spatial correlation.

F. Process of STCAM Decision

STCAM uses the following four algorithms: TCA-LI, TCA-MR, SCA, and STCA. If a sensor node has a tight temporal correlation, STCAM uses TCA-LI for a single missing data element and TCA-MR for continuous missing data. If the temporal correlation is not tight but the sensor node has a high spatial correlation, STCAM uses SCA to estimate the missing data. Otherwise, STCAM chooses the STCA algorithm. The process of STCAM decision making is shown in Fig. 2. From this figure, we can derive the applicable conditions of the four algorithms.

[Fig. 2.] Process of STCAM decision. STCAM: a model based on spatial-temporal correlation analysis, TCA: temporal correlation analysis, TCA-MR: multiple regression algorithm of TCA, TCA-LI: linear interpolation algorithm of TCA, SCA: spatial correlation analysis, STCA: spatial-temporal correlation analysis.

TCA-LI: The training dataset has a high temporal correlation, and the missing sensor data are a single element; or the training dataset has a medium temporal correlation and a low spatial correlation, and the missing sensor data are a single element.

TCA-MR: The training dataset has a high temporal correlation, and the missing sensor data are continuous; or the training dataset has a medium temporal correlation and a low spatial correlation, and the missing sensor data are continuous.

SCA: The training dataset has a high spatial correlation together with a medium or low temporal correlation; or the training dataset has a medium spatial correlation and a low temporal correlation.

STCA: The training dataset has a medium temporal correlation and a medium spatial correlation.

If the training dataset has a low spatial and temporal correlation, there is no matching algorithm for the estimation.
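
The selection conditions can be summarized as a small lookup in Python. This is a sketch of one reading of the conditions, not the authors' implementation: the function name `choose_algorithm` and the string labels are illustrative. It assumes that a high temporal correlation always selects a TCA variant, that SCA covers the case of a low temporal and a high spatial correlation (the case used for the traffic dataset in Section IV), and that a low temporal together with a low spatial correlation matches no algorithm.

```python
LEVELS = ("low", "medium", "high")


def choose_algorithm(temporal, spatial, continuous):
    """Pick an estimator from the temporal and spatial correlation levels
    ("low", "medium", "high") and the type of missing data.

    Returns "TCA-LI", "TCA-MR", "SCA", "STCA", or None when both
    correlations are low and no algorithm matches.
    """
    if temporal not in LEVELS or spatial not in LEVELS:
        raise ValueError("levels must be 'low', 'medium', or 'high'")
    # High temporal correlation, or medium temporal with low spatial.
    if temporal == "high" or (temporal == "medium" and spatial == "low"):
        return "TCA-MR" if continuous else "TCA-LI"
    # High spatial correlation, or medium spatial with low temporal.
    if spatial == "high" or (temporal == "low" and spatial == "medium"):
        return "SCA"
    if temporal == "medium" and spatial == "medium":
        return "STCA"
    return None  # low temporal and low spatial correlation


# The three experimental settings described in Section IV.
print(choose_algorithm("high", "low", continuous=False))
print(choose_algorithm("medium", "medium", continuous=True))
print(choose_algorithm("low", "high", continuous=True))
```

The spatial level passed for the first call is a placeholder: under this reading, a high temporal correlation selects a TCA variant regardless of the spatial level.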

IV. SIMULATION EXPERIMENTS

The estimation model proposed in this paper is simulated using Java and evaluated over the Intel lab dataset [8] and a traffic dataset of a city in China. The Intel lab dataset is a trace of readings from 54 sensor nodes deployed in the Intel Research Berkeley lab. These sensor nodes collected the light, humidity, temperature, and voltage readings once every 30 seconds. The traffic dataset is a trace of readings from 596 sensor nodes that are deployed on different roads.

To evaluate the accuracy and performance of STCAM, we choose DESM [9] for a comparison. The DESM algorithm is also an estimation approach based on the spatial-temporal correlation, and the result formula is as follows:

where $V_{k(i-1)}$ denotes the value of sensor node $a_k$ at time $i-1$, $V_{zi}$ represents the value of $a_z$ at time $i$, $\beta$ denotes the weight of $V_{zi}$, and $a_z$ is the node nearest to $a_k$, as chosen by DESM (for a detailed description of DESM, refer to [9]).

To evaluate the four abovementioned algorithms, we need to choose different datasets for testing.

   1) Comparison between TCA-LI/TCA-MR and DESM

By analyzing the temporal correlation, we know that the temperature dataset has a high temporal correlation; therefore, we use the temperature dataset of sensor 23 to test the accuracy and performance of TCA-LI/TCA-MR. First, we assume that the 121st, 131st, 141st, ..., 311th data elements are missing; under this condition, STCAM chooses the TCA-LI algorithm to estimate the missing sensor data. The experimental results are presented in Fig. 3 and Table 3.

[Fig. 3.] Comparison of experimental results of TCA-LI and DESM. TCA-LI: linear interpolation algorithm of temporal correlation analysis, DESM: data estimation using statistical model.

[Table 3.] Performance comparison of TCA-LI and DESM

Next, we assume that data 121–140 are missing; under this condition, STCAM chooses the TCA-MR algorithm to estimate the missing sensor data. The experimental results are presented in Fig. 4 and Table 4.

[Fig. 4.] Comparison of experimental results of TCA-MR and DESM. TCA-MR: multiple regression algorithm of temporal correlation analysis, DESM: data estimation using statistical model.

[Table 4.] Performance comparison of TCA-MR and DESM

According to Fig. 4, the accuracy of TCA-MR decreases with an increase in the amount of continuous missing sensor data. Therefore, there is a threshold. If the amount of missing sensor data is less than the threshold, TCA-MR exhibits good accuracy, and if the amount of missing sensor data is more than the threshold, TCA-MR is not suitable for estimating the missing sensor data with a high temporal correlation. According to Table 4, the performance of TCA-MR is significantly higher than that of DESM.

   2) Comparison between STCA and DESM

By analyzing the temporal and spatial correlation, we know that the humidity dataset has a medium temporal and spatial correlation. Under this condition, STCAM chooses the STCA algorithm to estimate the missing sensor data. We choose the humidity dataset of sensor 11 to test the accuracy and performance of STCA. Assuming that data 81–105 are missing, we obtain the experimental results shown in Fig. 5 and Table 5.

[Fig. 5.] Comparison of experimental results of STCA and DESM. STCA: spatial-temporal correlation analysis, DESM: data estimation using statistical model.

[Table 5.] Performance comparison of STCA and DESM

According to Fig. 5 and Table 5, STCA exhibits better accuracy than DESM, but its computational performance is lower than that of DESM.

   3) Comparison between SCA and DESM

By analyzing the temporal and spatial correlation, we find that the traffic dataset has a low temporal correlation and a high spatial correlation. Therefore, under this condition, STCAM chooses the SCA algorithm to estimate the missing sensor data. We suppose that data 121–144 of sensor a₆ are missing. The experimental results are shown in Fig. 6 and Table 6.

[Fig. 6.] Comparison of experimental results of SCA and DESM. SCA: spatial correlation analysis, DESM: data estimation using statistical model.

[Table 6.] Performance comparison of SCA and DESM

According to Fig. 6 and Table 6, SCA exhibits better accuracy than DESM, but its computational performance is lower than that of DESM.

V. CONCLUSION

In this paper, we propose a data estimation technique called STCAM, which discovers the correlation of the training dataset. Depending on this correlation and the type of missing sensor data, STCAM chooses the most suitable algorithm from TCA-LI, TCA-MR, SCA, and STCA to estimate the missing sensor data. From the simulation results, we conclude that STCAM estimates the missing sensor data with good accuracy, but it has a relatively low computational efficiency. Therefore, STCAM can only be deployed at the sink node or in the central server. Moreover, the simulation shows that the accuracy of TCA-MR decreases with an increase in the amount of continuous missing sensor data, which may influence the overall accuracy of STCAM; this paper does not provide an effective solution for this issue. Therefore, in the future, we will conduct further research to fill this gap.

REFERENCES

1. Pan L., Gao H., Gao H., Liu Y. 2014 “A spatial correlation based adaptive missing data estimation algorithm in wireless sensor networks,” [International Journal of Wireless Information Networks] Vol.21 P.280-289
2. Niu K., Zhao F., Qiao X. 2013 “A missing data imputation algorithm in wireless sensor network based on minimized similarity distortion,” [in Proceedings of the 6th International Symposium on Computational Intelligence and Design (ISCID)] P.235-238
3. Ramakrishnan S. 2003 “Sensing the world,” [Jasubhai Digital Media] Vol.10 P.26-28
4. Halatchev M., Gruenwald L. 2005 “Estimating missing values in related sensor data streams,” [in Proceedings of the 11th International Conference on Management of Data (COMAD)] P.83-94
5. Gruenwald L., Chok H., Aboukhamis M. 2007 “Using data mining to estimate missing sensor data,” [in Proceedings of 7th IEEE International Conference on Data Mining Workshops] P.207-212
6. Yarman B. S., Kilinc A., Aksen A. 2004 “Immitance data modelling via linear interpolation techniques: a classical circuit theory approach,” [International Journal of Circuit Theory and Applications] Vol.32 P.537-563
7. Kanamori T., Hido S., Sugiyama M. 2009 “A least-squares approach to direct importance estimation,” [Journal of Machine Learning Research] Vol.10 P.1391-1445
8. Madden S. Intel lab data [Internet]
9. Li Y., Ai C., Deshmukh W. P., Wu Y. 2008 “Data estimation in sensor networks using physical and statistical methodologies,” [in Proceedings of 28th International Conference on Distributed Computing Systems (ICDCS'08)] P.538-545
