Uncertainty for Privacy and 2Dimensional Range Query Distortion
 Author: Sioutas Spyros, Magkos Emmanouil, Karydis Ioannis, Verykios Vassilios S.
 Organization: Sioutas Spyros; Magkos Emmanouil; Karydis Ioannis; Verykios Vassilios S.
 Publish: Journal of Computing Science and Engineering Volume 5, Issue3, p210~222, 30 June 2011

ABSTRACT
In this work, we study the problem of privacypreservation data publishing in moving objects databases. In particular, the trajectory of a mobile user in a plane is no longer a polyline in a twodimensional space, instead it is a twodimensional surface of fixed width 2
A_{min} ,whereA_{min} defines the semidiameter of the minimum spatial circular extent that must replace the real location of the mobile user on the XYplane, in the anonymized (kNN) request. The desired anonymity is not achieved and the entire system becomes vulnerable to attackers, since a malicious attacker can observe that during the time, many of the neighbors’ ids change, except for a small number of users. Thus, we reinforce the privacy model by clustering the mobile users according to their motion patterns in (u, θ ) plane, where u andθ define the velocity measure and the motion direction (angle) respectively. In this case, the anonymized (kNN) request looks up neighbors, who belong to the same cluster with the mobile requester in (u, θ ) space: Thus, we know that the trajectory of the kanonymous mobile user is within this surface, but we do not know exactly where. We transform the surface’s boundary polylines to dual points and we focus on the information distortion introduced by this space translation. We develop a set of efficient spatiotemporal access methods and we experimentally measure the impact of information distortion by comparing the performance results of the same spatiotemporal range queries executed on the original database and on the anonymized one.

KEYWORD
Uncertainty , Privacy , Anonymity , Moving objects databases , Voronoi clustering

I. INTRODUCTION
Technological advances in sensors and wireless communications have enabled offering of high location accuracy in location tracking at a low cost [13]. Such accuracy gave rise to a series of locationbased applications that exploit positional data to offer highend locationbased services (LBS) to their subscribers[4]. Conversely, mobile users require the protection of their context information (e.g., location and/or identity information)against privacy adversaries (e.g., Bigbrother type threats,users profiling, unsolicited advertising) [58]. The privacy issue is amplified by the requirement in modern telematics and locationaware applications for realtime, continuous location updates and accurate location information (e.g., traffic monitoring, asset tracking, locationbased advertising, locationbased payments,routing directions) [911].
We consider a population of mobile users who are supported by some telecommunication infrastructure, owned by a telecom operator. Every user periodically transmits through his/her mobile device a location update to some traffic monitoring system residing in a untrusted Internetconnected LBS provider. A trusted server in the telecom operator plays the role of a proxy that mediates between the user and the LBS provider. Our setting involves an LBS provider that is requested to efficiently answer Range Queries (RQs) of mobile users moving on the plane. For example historical queries of the form:
SELECT Count (MobileUserIds)
FROM Moving_Object_Database (MOD)
WHERE spatialarea=MBR(‘Acropolis’) and
t_past <time <t_now or prediction queries of the form:
SELECT Count (MobileUserIds)
FROM Moving_Object_Database (MOD)
WHERE spatialarea=MBR(‘Acropolis’) and
t_now <time <t_ future that provide traffic monitoring services to mobile users. This type of query focuses on the problem of indexing mobile users in two dimensions and efficiently answering range queries over the users’ locations. This problem is motivated by a set of reallife applications, such as intelligent transportation systems, cellular communications, and meteorology monitoring.
Our privacy requirements, include:
? Location privacy:The protocol does not reveal the (exact)user’s location information to the LBS provider.
? Tracking protection (Unlinkability):The LBS provider is not able to link two or more successive user positions.
In our privacy threat model, we assume an attacker has access to the kanonymized (cloaked) location queries and to the algorithms used by the trusted server to support user privacy.We assume that the proxy server of the telecom operator is trusted not to collude with the LBS provider or other internal/external nonauthorized entities to expose the atomic location updates of system users.
Traditionally, the transformation of the exact user location to a spatiotemporal area is achieved by the kanonymity principle for relational data [12, 13]. This requires that each record in a given dataset is indistinguishable from at least
k ? 1 other records with respect to a certain set of identifying variables, collectively known as the “quasiidentifier”. The kanonymity principle requires that the spatiotemporal area that is generated by the trusted server from the exact location of the mobile user is such that the identity of the user cannot be disclosed with a probability that is larger than 1/k , amongk ? 1 other users. The kanonymity primitive is essential to protect the privacy of the users, starting from the point of request for a RQ service and continuing for as long as the requested service withstands completion.> A. High Level Description of Our Framework
As part of our framework, we deliver the type of
k trajectory privacy [14] that identifiesk ? 1 users that are close to the requester at the time of request and thus could have issued the request. This includes a minimum circular spatial area A_{min} around the requester, in which the participants of the anonymity set should be searched, so the user is adequately hidden. The proposed privacy framework relies on a user privacy profile that stores the necessary information related to his/her privacy requirements. This includes 1) the preferred value ofk (in kanonymity)for each requested RQ service, 2) the minimum circular spatial areaA_{min} , around the requester, in which the participants of the anonymity set should be searched, so that the user is adequately hidden. This threshold defines the minimum extent of the spatial area that must replace the real location of the user, in the anonymized request.The proposed framework deals with the event of failure in the provision of
k anonymity, in the case where the number of participants inside this minimum spatial area is less thank  1.In this case, the trusted server postpones the servicing of the user request for a small period. Then, if the anonymization process fails again, the requester is protected by blocking the servicing of the request.The proposed privacy framework can be reinforced by clustering the mobile users according to their motion patterns on the(
u, θ ) plane, whereu andθ define the velocity measure and the motion direction (angle) respectively. In this case, the anonymized(kNN) request lookups neighbors, who belong to the same cluster with the mobile requester in (u, θ ) space.Two basic approaches are used when trying to handle RQ services; those that deal with discrete and those that deal with continuous movements. In a discrete environment the problem of dealing with a set of moving objects can be considered to be equivalent to a sequence of database snapshots of the object positions/extents taken at time instants t_{1} < t_{2} < . . ., with each time instant denoting the moment where a change took place.From this viewpoint, the indexing problems in such environments can be dealt with by suitably extending indexing techniques from the area of temporal [15] or/and spatial databases[16]; in [17] it is elegantly exposed how these indexing techniques can be generalized to handle efficiently queries in a discrete spatiotemporal environment. A plethora of efficient data structures exists that consider continuous movements [1822].The common thrust behind these indexing structures lies in the idea of abstracting each object’s position as a continuous function f(t) of time and updating the database whenever the function parameters change; accordingly, an object is modeled as a pair consisting of its extent at a reference time (design parameter)and its motion vector. One categorization of the aforementioned structures is according to the family of the underlying access method used. In particular, there are approaches based either on Rtrees or on Quadtrees. Conversely, these structures can be also partitioned into 1) those that are based on geometric duality and represent the stored objects in the dual space [19,21] and 2) those that leave the original representation intact by indexing data in their native nd space [20, 22, 23]. The geometric duality transformation is a tool heavily used in the Computational Geometry literature that maps hyperplanes in Rn to points and viceversa. We implement a framework based on the proposed privacy model that uses the Spatial extensions of MySql 5.x to offer privacy in RQ services.
In this work, we study the problem of privacypreservation data publishing in moving objects databases. In particular, the trajectory of a mobile user in the plane is no longer a polyline in a twodimensional space, instead it is a twodimensional surface:we know that the trajectory of the mobile user is within
this surface, but we do not know exactly where. We transform the surface’s boundary polylines to dual points [19, 20] and we focus on the information distortion introduced by this space translation. We develop a set of efficient spatiotemporal access methods and, we experimentally measure the impact of information distortion by comparing the performance results of the same spatiotemporal range queries executed on the original database and on the anonymized one.
The remainder of this paper is organized as follows. Section II lays out the system model for our proposed methodology and gives a formal description of the problem. Section III discusses preliminary notions and results used throughout the paper. Section IV presents the method of transforming the trajectory polylines to twodimensional surfaces. Section V presents: 1) the duality transformation of the surface’s boundary polylines, 2) a simple observation, based on which we enforce the privacy model of our method, and 3) the information distortion introduced by this space translation. Section VI presents an extended experimental evaluation. Section VII concludes the paper.
II. SYSTEM MODEL AND PROBLEM DESCRIPTION
This section lays out the system model in which our proposed methodology is applied. The considered system framework interconnects a set of registered LBS providers to a set of subscribed users, to enable the offering of RQSs in a privacyaware manner. Fig. 1 depicts our system framework that has two main components: the trusted server, which is assigned the task of trajectory privacy preservation, and the service providers,which offer the actual RQSs. As part of its functionality, the trusted server maintains a database that stores the privacy profiles of the users, their history of movement, as well as knowledge of motion patterns.
A system user can be any individual who is equipped with a mobile device and is registered to the trusted server. Upon his/her registration, the user is assigned a unique profile in the system and his/her movement is subsequently monitored by the trusted server. The tracking of user movement is enabled by the transmission of periodic location updates from the mobile device of the user to the trusted server. A privacy profile for a user consists of a set of tuples (sid, K, A_{min}) that define his/her privacy requirements that involve 1) the value of k for each provided service sid, where k indicates that the user wants to remain Kanonymous when using this service, and 2) the minimum area of generalization Amin, where A_{min} indicates the minimum spatial area to which the anonymized request should point.
Fig. 1 depicts the regular scenario for the privacyaware provision of RQSs. Whenever a user requests an RQS (step 1), the trusted server receives the original request (step 2), examines the point of request and determines the appropriate anonymity strategy that needs to be enforced. Next, the original request is transformed to a ’secure’ equivalent that preserves the kanonymity of the requester and is forwarded to the appropriate service provider to be serviced (step 3). After the request has been serviced, the result set, computed by the service provider, is returned to the trusted server (step 4) where it is filtered to contain only the actual answer that is subsequently forwarded to the requester (steps 5 and 6). This concludes the privacyaware provision of the service.
We consider a database that records the position of moving objects in two dimensions on a finite terrain. We assume that objects move with a constant velocity vector starting from a specific location at a specific time instant. Thus, we can calculate the future object position, if its motion characteristics remain the same. Velocities are bounded by [u_{min}, u_{max}]. Objects update their motion information, when their speed or direction changes. The system is dynamic, i.e., objects may be deleted or new objects may be inserted. Let P_{z}(t_{0}) = [x_{0}, y_{0}] be the initial position at time t_{0} of object z. If object z starts moving at time t> t_{0}, its position will be P_{z}(t) = [x(t), y(t)] = [x_{0} + u_{x}(t ? t_{0}), y_{0} +
u_{y}(t ? t_{0})], where U = (u_{x}, u_{y}) is its velocity vector. For example,in Fig. 2 the lines depict the objects trajectories on the (t, y)plane. We would like to answer queries of the form:“Report the objects located inside the rectangle [x_{1 q}, x_{2 q}] ×[y_{1 q}, y_{2 q}] at the time instants between t_{1} q and t_{2} q (where tnow ≤ t_{1} q ≤t_{2 q}), given the current motion information of all objects.”
III. PRELIMINARIES
In the following, we briefly describe the most basic methods for indexing mobile objects, as well as the best current clustering algorithms that will be used throughout the paper and specifically for clustering mobile objects with similar motionpattern.
> A. Literature Survey: The Most Basic Methods for Indexing Mobile Objects
In the sequel, let N denote the input size (number of stored objects), B the block size, K the output size and thus n = N/B and k = K/B are the size of the input and output in blocks,respectively.
In [19] a set of indexing techniques was presented, which used the geometric duality transformation at the cost of increasing by one the dimensionality. Hence, for the onedimensional case they reduced the indexing problem to the twodimensional simplex range searching problem and they employed external memory partition trees to solve their indexing problem in O(n)space, O(n^{1/2} + k) I/Os query time and O(log n) I/Os update time.
Partition trees, with a guaranteed worst case performance,are generally considered impractical, since they entail large hidden factors. Thus, in [19] two more structures were presented,one based on kdtrees and a more complex one based on B+trees; both structures used linear space and usually work well.Moreover, they extended their results in the twodimensional case for two distinct versions of the problem; first, the objects were allowed to move on a network of onedimensional routes,and, second, the objects were allowed to move arbitrary. The first version reduced to a number of onedimensional subproblems that used the previously described structures, whereas the second is equivalent (through geometric duality) to simplex range queries in ?^{3}, which can be solved in O(n^{2/3} + k) I/Os with the use of external memory partition trees.
In [24] the above results were further refined. A new version of partition tree was introduced to handle the indexing problem in the plane in O(n) space, O(n^{1/2} + k) query time, and O(log_{B} n)expected amortized update time; the results could apply in higherdimensional spaces as well, degrading only the update time (it became O( log_{B}^{2} n) I/Os). If it is assumed that the queries arrive in chronological order, then the query time can be further reduced to O( log_{B}^{2} n/log_{B} log_{B} n) I/Os; this is achieved by employing the kinetic data structures framework at external range trees.Moreover, they developed an indexing scheme with small query time for nearfuture queries and increased time for distant in the future queries under the bound of O(n^{1/2+c}+ k) I/Os by combining multiversion kinetic data structures with partition trees.Finally, an indexing technique was described to handle O(δ)approximate queries; the query time was (n^{1/2+c} /δ), the expected update time O( ^{1/2+c} n/δ) and the space O(n/δ) disk blocks.
The TPR tree [25] in essence is an R*tree generalization to store and access linearly moving objects. The leaves of the structure store pairs with the position of the moving point and the moving point ID, whereas internal nodes store pointers to subtrees with associated rectangles that minimally bound all moving points or other rectangles in the subtree. The difference to the classical R*tree lies in the fact that the bounding rectangles are timeparameterized (their coordinates are functions of time). It is considered that a timeparameterized rectangle bounds all enclosed points or rectangles at all times later than the current time. The algorithms for search and update operations in the TPR tree are straightforward generalizations of the respective algorithms in the R*tree. Moreover, the various kinds of spatiotemporal queries can be handled uniformly in 1,2, and 3dimensional spaces.
The TPRtree constituted the base structure for further developments in the area [26]. An extension to the TPRtree was proposed in [22], the socalled TPR*tree. The main improvement lies in the update operations, where it is shown that local optimization criteria (at each tree node) may seriously degrade the performance of the structure and more particularly in the use of update rules that are based on global optimization criteria. They proposed a novel probabilistic cost model to validate the performance of a spatiotemporal index and analyzed the optimal performance for any datapartition index with this model.
The STRIPES index [21] is based on the application of the duality transformation and employs disjoint regular space partitions(disk based quadtrees [16]); the authors claim, through the use of a series of implementations, that STRIPES outperforms TPR*trees for both update and query operations.
Finally, in [27] a new access method (LBTs) was presented for indexing mobile objects that move on the plane to efficiently answer range queries about their future location. It has been proved that its update performance is the most efficient in all cases. The superiority of LBTs method regarding query performance has been shown as far as the query rectangle length remains at realistic levels (greatly outperforming other methods).If the query rectangle length becomes extremely large in relation to the whole terrain, then STRIPES is better than any other solution, however, only to a small margin in comparison to LBT’s method.
> B. Literature Survey: The Most Basic Clustering Algorithms
Clustering is the method of grouping the data into sets, so data points within the set should have high similarity, while those within different sets are dissimilar [28, 29].
1) KMeans Algorithm: Kmeans clustering [30] is a simple and widely used clustering algorithm. The KMeans algorithm is based on the squared error minimization method. We discuss the KMeans algorithm as follows:
The time complexity of KMeans algorithm is O(nkt), where n is the number of data objects, k the number of clusters and t the number of iterations.
The obvious shortcomings of the basic KMeans clustering are that the number of clusters needs to be determined in
advance and its computational cost with respect to the number of observations, clusters, and iterations. Although, KMeans is efficient in the sense that only the distance between a point and the k cluster centers ?not all other points? needs to be (re)computed in each iteration. Typically, the number of observations n is much larger than k.
Many approaches have tried to alleviate both these problems.Variations of KMeans clustering that are supposed to cope without prior knowledge of the number of cluster centers have been presented [31, 32]. While the proposed solutions to KMeans clustering certainly improve its practical behavior, they do not overcome the fundamental problems associated with the algorithm.
In [33] densitybased algorithms have been proposed as a better alternative to identifying the correct number of clusters in the data. These algorithms make use of Voronoi diagrams (as in[34] and [35]) and are statistically significantly more accurate than KMeans clustering.
2) Voronoi Diagram:The Voronoi diagram is one of the most important and useful techniques in the field of computational geometry. Given a set D of n data points o1, o2, o3,. . . on in a plane, the Voronoi diagram (V or (D)) is a subdivision of the plane into Voronoi cells (V (oi) for oi) (Fig. 3). The Voronoi cells are the set of points u that are closer to oi than any other point in D. i.e., V(oi) = {u'd(oi, u) ≤ d(oj, u) j ≠ i}, where d is the Euclidian
distance. The Voronoi diagram divides the plane into n convex polygon regions (for each oi), the vertices (P of Fig. 3) of the diagram are the Voronoi vertices, and the borders between two adjacent Voronoi cells are termed Voronoi edges (E of Fig. 3).Note that each Voronoi vertex is the center of a circle touching three or more data points lying in its adjacent Voronoi cells (Fig. 3).
3) Voronoi Clustering:The algorithm presented in [33],begins by constructing the Voronoi diagram for S, where S = s1,. . . , sn ⊆ ?d is a given data set of n points that is going to be clustered. The computational complexity of Voronoi diagram construction in the general case is O(n log n) [36], but requires only linear time in some restricted cases [37]. The algorithm is given as an input parameter a threshold value max indicating the maximum volume allowed to a cell that still can be combined into an evolving cluster.
Each point makes up its own cell in the Voronoi diagram for the data. The volumes (areas in case of the plane) of the cells are computed (or approximated) next. This approximation is not very accurate in small dimensions, but it improves, as the number of dimensions grows. Each cell is associated with a class label. The algorithm assigns increasing integer values as class labels. The point with the smallest Voronoi cell is handled first.
The cell boundaries in the Voronoi diagram of a point set are by definition equally distant from the relevant points. Thus, the Voronoi diagram naturally gives rise to a kind of maximum margin separation of the data. The algorithm combines existing cells, instead of creating artificial center points as, e.g., KMeans does. Hence, the maximal margins are maintained also in the clustered data. Maximal margin clustering seems a natural choice, since there are no reasons to place the cluster border closer to one or other cluster.
One can view Voronoi clustering as a change of paradigm from the KMeans, whose Euclidean distance minimization can be made autonomous with respect to the number of clusters only by optimizing a parameter that is too hard to learn. Kao et al. [34] proposed a similar pruning technique based on Voronoi diagrams to reduce the number of expected distance calculations in case the mobile objects draw uncertain locations.
IV. TRAJECTORY POLYLINES AND TWODIMENSIONAL SURFACES
For every mobile user, we calculate a circular range query whose center is its current 2D location and radius a given value R defined by a minimum circular spatial area
A_{min} . If this circular spatial area includes at leastk ? 1 other neighbors, then the mobile user is adequately hidden. Otherwise, if the number of participants inside this minimum spatial area is less thank  1,the trusted server postpones the servicing of the user request for a small period. Next, if the anonymization process fails again,the requester is protected by blocking the servicing of the request. Thus, consecutive circular spatial areas construct a 2D buffer defined by its upper and lower boundary polylinesy' andy'' respectively that anonymize the original trajectory y of mobile userA (Fig. 4). We can create each mobile user as 2D point using the Spatial extensions of MySql 5.x, as follows:CREATE TABLE Points (
name VARCHAR(20) PRIMARY KEY,
location Point NOT NULL,
description VARCHAR(200),
SPATIAL INDEX(location)
);
To obtain points in a circular area as a counted result ordered by distance from the center of the selection area, we write:
SELECT COUNT(name), AsText(location), SQRT(POW
(ABS(X(location)  X(@center)), 2) + POW(ABS(Y(location)
 Y(@center)), 2)) AS distance
FROM Points
WHERE Intersects(location, GeomFromText(@bbox))
AND SQRT(POW(ABS(X(location)  X(@center)), 2) + POW
(ABS(Y(location)  Y(@center)), 2)) ≤ @R
ORDER BY distance;
If the result returned is less than
k 1, the trusted server postpones the servicing of the user request for a small period.Thus, the RQ service depicted in Fig. 2, was transformed to the PrivacyAware RQ service depicted in Fig. 5:
If the entire buffer lies inside the query area (the buffers of mobile users O_{3} or O_{4} in Fig. 5), meaning that both upper and lower boundary polylines lie in the query rectangle, then the same holds for the original trajectory. If the entire buffer lies outside the query area (the buffer of mobile user O_{2} in Fig. 5),meaning that both upper and lower boundary polylines lie outside the query rectangle, then the same holds for the original trajectory. In the worstcase, we face the distortion effect, where one of the two boundary polylines lies in the query rectangle(the buffer of mobile user O_{1} in Fig. 5). In the later case, TS(Trusted Server) has to check what happens to the original trajectory.
V. DUALITY TRANSFORMATION OF BOUNDARYTRAJECTORIES AND INFORMATION DISTORTION
The duality transform, in general, maps a hyperplane
h from R^{n} to a point in R^{n} and viceversa. In this subsection, we briefly describe how we can address the problem at hand in more intuitively,using the duality transform in the 1d case.> A. HoughX Transform
One duality transform for mapping the line with equation
y(t)= ut + a to a point in R^{2} is using the dual plane, where one axis represents the slope u of an objects trajectory (i.e., velocity),whereas the other axis represents its intercept a. Thus, we get the dual point (u, a ) (this is the so called HoughX transform[19, 20]). Accordingly, the 1d query [(t_{1q}, t_{2q} ), (y_{1q}, y_{2} )]becomes a polygon in the dual space.Using a linear constraint query [38], the query in the dual HoughX plane (Fig. 6) is expressed as:
If u > 0, then Q_{HoughX} = A_{1}？A_{2}？A_{3}？A_{4}, where A_{i} is defined
as follows:
A_{1} = u ≥ u_{min},
A_{2} = u ≤ u_{max},
A_{3} = a ≥ y_{1q} ? t_{2q}u,
A_{4} = a ≤ y_{2q} ? t_{1q}u.
Inequalities of the A_{1} and A_{2} areas are obvious. The inequalities for A_{3} and A_{4} can be derived as follows: ∀t∈ [t_{1q}, t_{2q}] ⇒ y_{1q} ≤y ≤ y_{2q} ⇒ y1q ? t_{2q}u ≤ y_{1q} ? tu ≤ a ≤ y_{2q} ? tu ≤ y_{2q} ? t_{1q}u, since t_{1q} ≤t ≤ t_{2q}.
If
u < 0, thenQ_{HoughX} = B_{1}？B_{2}？B_{3}？B_{4}, where Bi is defined as follows:B_{1} = u ≤ ? u_{min},
B_{2} = u ≥ ? u_{max},
B_{3} = a ≥ y_{1q} ? t_{1q}u,
B_{4} = a ≤ y_{2q} ? t_{2q}u.
Inequalities of the B_{1} and B_{2} areas are obvious. For B_{3} and B_{4}, we are working in the same way as in the case of A_{3} and A_{4}.
Inequalities of the B_{1} and B_{2} areas are obvious. For B_{3} and B_{4}, we are working in the same way as in the case of A_{3} and A_{4}.
∀t∈[t_{1q}, t_{2q}] ⇒ y_{1q} ≤ y ≤ y_{2q} ⇒ y_{1q} ？ t_{1q}u ≤ y_{1q} ？ tu ≤ a ≤ y_{2q} ？ tu ≤ y_{2q} ？ t_{2q}u, since 0 ≤ t_{1q} ≤ t ≤ t_{2q} and u < 0.
In Fig. 6 the line a = y_{1q} ？ t_{1q}u for u = u_{max} becomes a = y_{1q} ？ t_{1q}u_{max} and the line a = y_{2q} ？ t_{2q}u for u = u_{min} becomes a = y_{2q} ？ t_{2q}u_{min}. Thus, for the Service Provider (SP), the initial query [(t_{1q}, t_{2q}), (y_{1q}, y_{2q})] in the (t, y) plane is transformed to the rectangular query [(u_{min}, u_{max}), (y_{1q} ？ t_{1q}u_{max}, y_{2q} ？ t_{2q}u_{min})] in the (u, a) plane.
Similarly, for the upper (y'(t) = ut + a + R) and lower (y''(t) = ut + a ？ R) boundary trajectories, SP gets the dual points (u, a + R) and (u, a ？ R) as well, as the final (transformed) rectangular queries become [(u_{min}, u_{max}), (y_{1q} ？ R ？ t_{1q}u_{max}, y_{2q} ？ R ？ t_{2q}u_{min})] and [(u_{min}, u_{max}), (y_{1q} + R ？ t_{1q}u_{max}, y_{2q} + R ？ t_{2q}u_{min})] respectively in the (u, a) plane.
> B. HoughY Transform
By rewriting the equation
y = ut + a ast = 1/u y ? a/u, we can arrive at a different dual representation (the so called HoughY transform in [19, 20]. The point in the dual plane has coordinates(b, n ), whereb = ? a/u andn = 1/u. Coordinate b is the point where the line intersects the liney = 0 in the original space.Horizontal lines cannot be represented using this transform.Similarly, the HoughX transform cannot represent vertical lines. Nevertheless, both transforms are valid, since in our setting lines have a minimum and maximum slope (velocity is bounded by [u_{min}, u_{max} ]).The query in the dual HoughY plane (Fig. 7) is expressed as follows. If u > 0, then
Q_{HoughY} = C_{1}？C_{2}？C_{3}？C_{4}, whereInequalities of the C_{1} and C_{2} areas are obvious. The inequalities for C_{3} and C_{4} can be derived as follows: ∀t ∈ [t_{1q}, t_{2q}] ⇒ n =
Hence, the two lines in Fig. 7 have negative slope and for b = 0 intersect the axis b in t_{1q} and t_{2q}, respectively. The intersection of the four regions C_{1}, C_{2}, C_{3}, and C_{4} form the shaded polygon query of Fig. 7.
If
u < 0, thenQ_{HoughY} = D_{1} ？ D_{2} ？ D_{3} ？ D_{4}, whereThe line n =  1/y b + t_{1q}/y for = n1/u_{min} iamplies that
In the same way, the line n =  1/y b + t_{2q}/y for = n1/u_{min} iamplies that
However, according to the initial query and equation (1), we have
Analogously, according to the initial query and equation (2),we have
According to (3) and (4) the initial query [(t_{1q}, t_{2q}), (y_{1q}, y_{2q})] in the (
t, y ) plane can be transformed to the following rectangular query in the (b, n ) plane:Similarly,for the upper (
y'(t) = ut + a + R ) and lower (y''(t) = ut + a? R ) boundary trajectories, SP gets the transformed rectangular queriesand
respectively in the (
b, n ) plane.> C. The Proposed Algorithm for PrivacyAware Indexing
Let S = {y_{1}, y_{2}, . . . , y_{n}} be the initial set of original trajectory equations, and S' = { y'_{1}, y''_{1}, y'_{2}, y''_{2} . . . , y'_{n}, y''_{n} } the set of boundary trajectory equations defined by the buffer.
Then, let H_{x}S = {(u_{1}, a_{1} + R), (u_{1}, a_{1} ? R), . . . , (u_{n}, a_{i} + R),(u_{n}, a_{i} ? R)} and
H_{y}S = { b^{'}_{1}, b^{''}_{1}, . . . , b^{'}_{n}, b^{''}_{n} } be the set of dual HoughX and HoughY transforms respectively.Algorithm 3 runs in SP and depicts the procedure to build the index. Algorithm 4 also runs in SP and presents the procedure to Partition the Mobile Users according to their velocity. Algorithm 5 is running in SP and finally calls Algorithm 6, which is running in Trusted Server (TS) to perform the final filtering step. In particular, Algorithms 5 and 6 outline the privacyaware algorithm to answer the exact 2d.
RQ query:
In [19, 20],
Q_{Hough?X} is computed by querying a 2d partition tree, whereasQ_{Hough?Y} is computed by querying a B^{+}  tree that indexes theb parameters. Here, we consider the case where the users are moving with nonsmall velocitiesu , meaning that the velocity value distribution is skewed (Zipf) towards u_{min} in some range [u_{min}, u_{max} ] and consequently theQ_{Hough?Y} transformation is used (denote thatu_{min} is a positive lower threshold).Moreover, our method will incorporate the Lazy Btree [39]indexing scheme (Source code of Lazy Btree index is free available at the following URL: http://www.ionio.gr/sioutas/NewSoftware.htm), since the latter can handle update queries in optimal (constant) number of blocktransfers (I/Os). Thus,we get Algorithm 7 and 8. Algorithm 7 is running in SP machine and it calls Algorithm 8, which runs in TS machine, because the latter has the appropriate knowledge of the original trajectory parameterb_{i} to execute the final filtering step.> D. Reinforcing the Privacy Model of Our Method Using VoronoiClustering of Mobile Users with a Similar MotionPattern
In statements 9 and 4 of algorithm 7 and 8 respectively, we call the routine
neighbors (idofuser_{i} ,k ? 1) routine, to find thek ? 1 nearest neighbors that will anonymize the original mobile user i. Let a malicious attacker who observes that during the time, many of the neighbors’ ids change, except for a small number of users. In this case, the desired anonymity is not achieved and the entire system becomes vulnerable to attackers.Thus and in order to reinforce the entire privacy model, we initially call theVoronoi ? Clustering routine to cluster the mobile users according to their motion pattern (Fig. 8) and then we call the neighbors(idofuser_{i} ,k ? 1) routine that chooses neighbors who belong to the same cluster as user i. In this case, the neighbors’ids do not change with high frequency and the entire privacy model becomes robust.We define as the motionpattern of mobile user i at time instant ti the point (
u_{i}, θ_{i} ) that represents both velocity value u_{i} and motion direction θ_{i}. Note here that we cannot avoid the 2D clustering method mentioned before, by clustering of 1D elements,like the parametersb_{i} or the projections of velocity vector(u_{ix} = u_{i}cosθ ) at X or (u_{ix} = u_{i}sinθ ) Yaxis. Although these parameters include information about motion direction, these are not at all sufficient. Just think of the case where mobile objects are moving in opposite directions, but also belong to the same cluster, since f.e. the cosine of opposite angles is the same.Thus, we get Algorithm 9 and 10. Similarly, Algorithm 9 is running in SP machine as well as Algorithm 10 executes the final filtering step in TS machine.1) Distortion and Competitive Ratio: Let K be the number of bi parameters associated with boundary trajectories of buffers that intersect with the query rectangle. Then, algorithms 9 and 10 require totally T(n) = O(Cost(LazyB_tree) + K) I/Os or block transfers. Moreover, and according to notations presented in[40], let us say that D is the initial Database that stores the N original trajectories and D' the PrivacyAware Database that stores the 2N boundarytrajectories. Let us also say, Q(D) and Q(D') are the Query Results obtained consuming T(D) and T(D') blocktransfers (I/Os) in D and D' database schemes,respectively. We define the Distortion ratio to be the Distortion _Ratio =
and the
Competitive ratio to be theCompetitive_Ratio =In all cases,
T(D') > . . .>T(D) , thus it is very important to determine, how competitive to the optimal one(T(D)) is the privacyaware method that answers the query inD' . An experimental evaluation ofCompetitive_Ratio vs.K is also presented in the following section,since the distortion effect inD' absolutely depends on parameterK .VI. EXPERIMENTAL EVALUATION
This section compares the query performance of our privacyaware method, when incorporating STRIPES [21] (the best known solution), Lazy Btrees (LBTs) and TPR*tree, respectively.We deploy spatiotemporal data that contain insertions at a single timestamp 0. In particular, objects’ MBRs are taken from the LA spatial dataset (128971 MBRs) (Downloaded from the Tiger website http://www.census.gov/geo/www/tiger/). We want to simulate a situation in which all objects move in a space with dimensions 100 × 100 km. Thus, each axis of the space is normalized to [0,100000]. For the TPR*tree, each object is associated with a Velocity Bounded Rectangle (VBR), such that 1) the object does not change spatial extents during its movement,2) the velocity value distribution is skewed (Zipf)towards 30 in range [30,50], and 3) the velocity can be either positive or negative with equal probability. As in [23], we will use a small page size, so the number of index nodes simulates realistic situations.
Thus, for all experiments, the page size is 1 kbyte, the key length is 8 bytes, whereas the pointer length is 4 bytes. Thus,the maximum number of entries (< x > or < y >, respectively) in Lazy Btree is 1,024 / (8 + 4) = 85. Similarly, the maximum number of entries (2D rectangles or < x1, y1, x2, y2 > tuples) in TPR* tree is 1,024 / (4*8 + 4) = 27. Conversely, the STRIPES index maps predicted positions to points in a dual transformed space and indexes this space using a disjoint regular partitioning of space. Each of the two dual planes, are equally partitioned into four quads. This partitioning results in 4^{2} = 16 partitions that we term grids. Thus, the fanout of each nonleaf node is 16. All indexes except STRIPES have similar sizes for each dataset.Specifically, for LA, each tree has 4 levels and around 6700 leaves apart from the STRIPES index that has a maximum height of seven and consumes about 2.4 times larger disk space.Each query
q has three parameters: q_{R}len, q_{v}len, and q_{T}len, such that 1) its MBR q_{R} is a square, with length q_{R}len, uniformly generated in the data space, 2) its VBR isqV =?q_{v}len/2, q_{v}len/2,?q_{v}len/2, q_{v}len/2, and 3) its query interval isq_{T} = [0,q_{T}len ]. The query cost is measured as the average number of node accesses in executing a workload of 200 queries with the same parameters.Implementations were performed in C++ including particular libraries from SECONDARY LEDA.> A. Query Cost Comparison
We measured the Competitive Ratio of LBTs method (this method incorporates two Lazy Btrees that index the appropriate
b parameters in each projection respectively, and finally combines the two answers by detecting and filtering all the pair permutations), the TPR*tree and STRIPES presented in [21]and [22] respectively, using the same query workload, every 10,000 updates.
Figs. 9 to 13 show the Competitive Ratio as a function of
K (for datasets generated from LA as described above), using workloads with different parameters. Parameter K represents boundary trajectories of buffers that intersect with the query rectangle, and obviously requires an additional filtering on the fly process. Obviously, the required number of block transfersdepends on the answer’s size, as well as the size of
K .Fig. 12 depicts the performance of all methods where the time interval length degrades to 1. Even in this case, the LBTs method is more competitive than STRIPES and TPR*tree. As the query rectangle length grows from 400 to 1,000, the LBTs method advantage decreases; we remark that STRIPES becomes faster, whereas LBTs method has exactly the same performance as the TPR*trees.
Fig. 13 depicts the efficiency of LBTs solution in comparison to that of TPR*trees and STRIPES, respectively, in the case where the time interval length increases to 100.
VII. CONCLUSIONS
We presented the problem of anonymity preservation data publishing in moving objects databases. In particular, we studied the case where the trajectory of a mobile user on the plane is no longer a polyline in a twodimensional space, instead it is a twodimensional surface, which covers the real location of the mobile user on the
XY plane, in the anonymized (kNN) request.We reinforced the privacy model by clustering the mobile users according to their motion patterns in (u, θ ) plane, whereu andθ define the velocity measure and the motion direction (angle),respectively.In this case, the anonymized (kNN) request looks up neighbors,who belong to the same cluster with the mobile requester in (
u, θ ) space. By transforming the surface’s boundary polylines to dualpoints, we experimentally focused on the impact of information distortion introduced by this space translation.

[Fig. 1.] The system architecture.

[Fig. 2.] Trajectories and query in (t y) plane.

[301] Algorithm 1. KMeans

[Fig. 3.] Voronoi diagram.

[302] Algorithm 2. Voronoi clustering (set S real max)

[Fig. 4.] 2D buffer and boundary trajectories y' and y'' of mobile user A.

[Fig. 5.] Boundary trajectories and query in (t y) plane.

[Fig. 6.] Query in the HoughX dual plane.

[Fig. 7.] Query on the HoughY dual plane.

[Fig. 8.] Voronoi clustering and neighbours of the same cluster.

[Fig. 9.] qVlen = 5 qTlen = 50 qRlen = 100 (top) qRlen = 1000 (bottom)Rmax = 50 (top) Rmax = 200 (bottom).

[Fig. 10.] qRlen = 2000 qVlen = 5 qTlen = 50 Rmax = 500.

[Fig. 11.] qvlen = 10 qTlen = 50 qRlen = 400 (top) qRlen = 1000 (bottom)Rmax = 100 (top) Rmax = 200 (bottom).

[Fig. 12.] qvlen = 5 qTlen = 1 qRlen = 400 (top) qRlen = 1000 (bottom) Rmax= 100 (top) Rmax = 200 (bottom).

[Fig. 13.] qRlen = 400 qvlen = 5 qTlen = 100 Rmax = 100.