Differential Privacy in Practice

  • cc icon
  • ABSTRACT

    We briefly review the problem of statistical disclosure control under differential privacy model, which entails a formal and ad omnia privacy guarantee separating the utility of the database and the risk due to individual participation. It has born fruitful results over the past ten years, both in theoretical connections to other fields and in practical applications to real-life datasets. Promises of differential privacy help to relieve concerns of privacy loss, which hinder the release of community-valuable data. This paper covers main ideas behind differential privacy, its interactive versus non-interactive settings, perturbation mechanisms, and typical applications found in recent research.


  • KEYWORD

    Differential privacy , Protection mechanisms , Attacks

  • I. INTRODUCTION

    Differential privacy has become a de facto principle for privacy-preserving data analysis tasks, and has had many successful applications in spite of the fact that it has just been devised less than ten years ago. However, its rapid expansion requires a systematic approach to grasp fundamental concepts from different viewpoints, as well as to clarify its conceivable limits. The ultimate goal of this paper is to highlight differential privacy ideas by presenting the motivation behind them, along with popular mechanisms and applications. Helpful advice is also given for those who want to go deeper and obtain further benefits from this emerging paradigm.

      >  A. Motivation of Differential Privacy

    Digital information is collected daily in growing volume by governments, corporations, and individuals. Mutual benefits drive the demand for the exchange and publication of data among parties. However, how to handle this data properly is unclear, because the data in its original form typically contains sensitive information about participants. Syntactic processing paradigms like the suppression of identifying fields were eventually proven unsuccessful, as confirmed in past privacy breaches, simply by joining a de-identified table with publicly available databases.

    Ad hoc anonymization models like k-anonymity [1], l-diversity [2], and t-closeness [3] formalize the intuition of “privacy by blending yourself into a crowd” to achieve stronger guarantees. In k-anonymity, quasi-identifier fields are suppressed or generalized, so that each record is indistinguishable from at least k-1 other records. Record linkage attacks are avoided, but not attribute linkage attacks, which are prevented in l-diversity. The concept of earth mover’s distance is employed by t-closeness to deal with probabilistic attacks, focusing on how the attacker would change his probabilistic trust in the sensitive information of a victim after accessing the published data. Common limitations of these approaches include ad hoc assumptions on auxiliary information, heavy information loss, and suboptimality. An excellent survey on different privacy-preserving data publishing (PPDP) was presented by Fung et al. [4].

    To respond to the need of a firm foundation for PPDP, the concept of differential privacy was proposed by Dwork [5]. The concept stemmed from the impossibility of Dalenius’s desideratum on statistical database privacy: nothing about an individual should be learnable from the database that cannot be learned without access to the database. A new measure is therefore suggested by considering the separation of the database utility from the risk due to the victim’s participation.

    Two settings in PPDP are interactive and non-interactive. In the interactive setting, a curator sitting between the users and the database may modify the responses to queries posed by users to satisfy respondents’ privacy requirements. Some difficulties occur in this model, such as how to answer large query sets with regard to even naive difference attacks. The non-interactive setting addresses these problems by releasing some statistics (sanitization) once, and the data are not used further. The quality of answers is affected, and the range of data processing operations is reduced, among other limitations of this “one-shot” scheme.

      >  B. Differential Privacy Concepts

    Roughly speaking, differential privacy ensures that the outcome of any analysis on a database is not influenced substantially by the existence of any individual. It is therefore difficult for an adversary to make inference attacks on any data rows. The idea revolves around the popular indistinguishability concept in semantic security. Privacy concerns about the removal or addition of a single row suggest the guarantee formulation on a pair of adjacent databases (D, D') differing by only one row.

    Definition 1. (ε-differential privacy [5]) A randomized function K gives ε-differential privacy if for all data sets D and D' differing by at most one element, and all S ⊆ Range(K),

    image

    where the probability space in each case is over the coin flips of K.

    The multiplicative eε simplifies the composition rules discussed in later sections. It implies that zero-probable output on a given database is also zero-probable on any neighboring database, ruling out the subsample-and-release paradigm [5]. Group privacy is automatically satisfied when a collection of k users opt in or out if we extend the ratio in Definition 1 by a factor of e. The definition below relaxes the strict relative shift for events that are not especially likely.

    Definition 2. ((ε, δ)-differential privacy [6]) A randomized function K gives (ε, δ)-differential privacy if for all data sets D and D' differing by at most one element, and all S ⊆ Range(K),

    image

    where the probability space in each case is over the coin flips of K.

    The additive factor δ denotes the failure tolerance of the constraint

    image

    We can achieve ε-differential privacy by determining the noise magnitude to add to the output. This noise depends on the sensitivity of the function K, a fundamental quantity in this privacy paradigm.

    Definition 3. (Global sensitivity [7]) For f : D →d, the Lp-sensitivity of f is

    image

    for all D, D' differing in at most one element.

    As an example, the count function over a set S, f(S) = |S| has L1-sensitivity 1.

    For many functions, such as the median, global sensitivity yields large noise, and hence destroys the utility. Nissim et al. [8] proposed the concept of local sensitivity, taking into account not only the function but also the database. The goal is to add instance-specific noise with smaller magnitude than the worst-case noise by global sensitivity.

    Definition 4. (Local sensitivity [8]) For f : D → ?d and x ∈ D, the Lp-local sensitivity of f at x is

    image

    However, directly adding noise proportional to LSf(x) might reveal information about x itself. So, the noise has to be an insensitive function. One study defined a β-smooth upper bound on LSf(x) [8], and we calibrate the noise according to these smooth upper bounds.

    A similar idea can be found in several works where the data-specific ε-DP mechanisms were devised to avoid the naive but inefficient noise injection approaches [9,10].

    II. MECHANISMS

    This section surveys two widely used mechanisms, the Laplace mechanism [7] and the Exponential mechanism [11]. In addition, other variants are also introduced.

      >  A. Laplace and Gaussian Mechanisms

    For the case of real output, adding properly calibrated Laplace noise to the output is a standard technique to realize differential privacy. The noise is sampled from a Laplace distribution with probability density function (pdf)

    image

    where λ is determined by both Δf and the desired privacy budget ε.

    Theorem 1. (Laplace mechanism [12]) For any function f : D →d, the mechanism

    image

    gives ε-differential privacy if λ = Δf /ε and Li(λ) are i.i.d. Laplace random variables.

    Notice that Laplace noise has zero mean and variance 2λ2. So, larger values of ε mean lower noise variance, and the noisy outputs are then closer to the true ones, resulting in better utility and lower privacy level.

    To achieve (ε, δ)-differential privacy, we can use Gaussian noise L1-sensitivity being replaced by L2-sensitivity Δf = maxD,D'||f(D) ? f(D′)||2. Zero-mean Gaussian noise has the pdf

    image

    Theorem 2. (Gaussian mechanism [6]) For any function f : D →d the mechanism

    image

    gives (ε, δ)-differential privacy if

    image

    and Ni(0, σ2) are i.i.d. Gaussian random variables.

      >  B. Exponential Mechanism

    The Laplace mechanism makes no sense in some tasks, such as when the function f maps databases to discrete structures like strings, trees, or strategies. McSherry and Talwar [11] proposed a mechanism that can choose an output tT close to the optimum while satisfying ε-differential privacy. In a nutshell, the exponential mechanism takes an input D, output range T, privacy parameter ε, and a score function u : (D × T) → ? that assigns a real value to every t, where a higher score means better utility. The sensitivity of the score function is defined as Δu = maxt,D,D'|u(D, t) ? u(D', t)|.

    Theorem 3. (Exponential mechanism [11]) For any function u : (D × T) → ?, the mechanism

    image

    satisfies ε-differential privacy.

    The 2ε-DP is the worst-case privacy of exponential mechanisms as we incur at most one ε-DP in the numerator and at least one ε-DP in the denominator of

    image

    However, 2ε-DP is reduced to ε-DP if the denominator is a constant, as in the cases of Laplace and Gaussian mechanisms. These mechanisms are considered special cases of an exponential mechanism by taking u(D, t) = ?|f(D) ? t| (for Laplace) and u(D, t) = ?|f(D) ? t| 2 (for Gaussian), where t is the output. The exponential mechanism can capture any differential privacy mechanism K by taking u(D, r) to be the logarithm of the probability density of K(D) at r. While an exponential mechanism takes time linear to the number of possible results to add noise, Laplace and Gaussian mechanisms are much more efficient because of the O(1) complexity of Laplace/Gaussian noises. The exponential mechanism is used extensively in ε-DP research [10,13-19].

      >  C. Other Mechanisms

    The geometric mechanism [20] is a discrete variant of the Laplace mechanism with integral output range ? and random noise Δ generated from a two-sided geometric distribution

    image

    Among all ε-differentially private mechanisms, this discretized analogue strongly maximizes user utility [20].

      >  D. Composability

    Issues of composition play a key role in any approach to privacy: the sanitization mechanism should preserve privacy guarantees even when several outputs are taken together. Ganta et al. [21] exhibit the composition attacks on independent k-anonymizations of intersecting data sets. Differential privacy satisfies both sequential composition and parallel composition [22].

    Theorem 4. (Sequential composition [22]) Let Ai provide εi-differential privacy. A sequence of Ai(D) over the dataset D provides (Σiεi)-differential privacy.

    Theorem 5. (Parallel composition [22]) Let Ai provide εi-differential privacy. Let Di be arbitrary disjoint subsets of the input domain D. The sequence of Ai(Di) provides (max(εi))-differential privacy.

    As examples, both compositions are usually used in building partitioning trees for data summary under differential privacy [9,17,23,24].

      >  E. Utility Metrics

    Popular utility metrics include (α, β)-usefulness [14], relative error with a sanity bound [25], absolute error [26], KL-divergence [19], and relative entropy [27]. The choice of proper metrics highly depends on the query types.

    Definition 5. ((α, β)-usefulness [14]) A database mechanism A is (α, β)-useful for queries in class C if with probability 1 ? β, for every Q ∈ C and every database D, for

    image

    III. INTERACTIVE SETTING

    For the interactive setting, queries are submitted to the curator in an interactive (and adaptive) manner. It remains unclear whether large numbers of queries could be answered accurately while preserving privacy. The first ideas were studied in the early days of differential privacy with pioneering work by Dinur and Nissim [28]. They showed that in order to achieve privacy against a polynomially bounded adversary, one has to add perturbation of magnitude

    image

    Large numbers of queries in interactive setting were revisited in recent studies [29,30].

      >  A. Methods

    1) Median Mechanism

    The first privacy mechanism capable of answering exponentially more queries than previous interactive mechanisms based on independent Laplace noise is the median mechanism by Roth and Roughgarden [29]. They give a basic (but inefficient) implementation and an efficient alternative with a time complexity polynomial in the number of queries k, a database size n, and the domain size |X|. The key challenge is to determine the appropriate correlations between different query outputs on the fly without knowledge of future queries. The main idea behind the median mechanism is how to classify queries as “hard” and “easy” with low privacy cost. The number of “hard” queries is bounded to O(log k.log |X|) due to a Vapnik-Chervonenkis (VC) dimension argument [31] and the constant-factored shrinkage of the number of consistent databases every time we answer a “hard” query. A collection of databases consistent with the mechanism’s answers is maintained, and a query deemed “easy” would be answered by the median of the query over the database collection. The query error scales as (1/n1/3).poly-log(k).

    2) Multiplicative Weights Mechanism

    Some open questions in the median mechanism are solved in the so-called private multiplicative weights (PMW) mechanism by Hardt and Rothblum [30]. The main result is achieving a running time only linear in N (for each of the k queries), nearly tight with previous cryptographic hardness results [32], while the error scales roughly as

    image

    log k. Moreover, the proposed mechanism makes partial progress for side-stepping previous negative results [32] by relaxing the utility requirement. Specifically, Hardt and Rothblum [30] considered accuracy guarantees for the class of pseudo-smooth databases with sublinear (or even polylogarithmic) running time. The main idea of PMW is to use a privacy-preserving multiplicative weights mechanism. Let the real “fractional” database be x. In each round t with query ft, an updated database xt is maintained, and we compare the noisy answer with the answer given by the previous round’s database ft(xt-1) to assess this round as “lazy” or “update” depending on whether the difference is “close” or “far”. The “lazy” round simply outputs ft(xt-1) and sets xtxt-1, while the “update” round needs to improve xt using multiplicative weights re-weighting. If the total number of update rounds exceeds n, then the mechanism fails and terminates (this is a low-probability event).

    IV. NON-INTERACTIVE SETTING

    Differential privacy in a non-interactive setting is closely related to learning problems in a privacy-preserving manner. Simple statistical quantities like counting and summing can be used as building blocks in a wide range of learning tasks [33].

      >  A. Applications

    1) Histogram and Contingency Table

    A histogram is typically defined over a specific domain as a partition of data points into disjoint bins. Two widely used types of histograms are unattributed (where the semantic meaning of each bin is irrelevant to the analysis, the histogram can be viewed as a multiset of counts) and attributed (the semantic meaning of each bin is maintained). Histogram sanitization under a formal privacy was introduced early by Chawla et al. [34]. Although not exactly in the sense of differential privacy, (c,t)-isolation [34] has a similar idea of protecting items from isolation attacks. Its limitation lies in the assumption of data point distribution uniformity.

    A Contingency table is informally a table of counts over a set of attributes. These counts are called marginals, and their release has a goal of revealing correlations between many different sets of attributes. Barak et al. [35] applied a Laplace mechanism to the Fourier coefficients of a dataset vector (after being cast from highdimensional space, indexed by attribute tuples), and then employed linear programming to create non-negative synthetic contingency tables. An improvement for sparse data is sketched by Cormode et al. [26], with a short-cut approach using sampling and filtering techniques. Geometric noise [20] was added to fulfill ε-differential privacy.

    Hay et al. [36] pointed out that consistency-constrained inference, if used as a post-processing step, is able to boost the accuracy of histogram queries, for both unattributed and universal histograms. An application of the concepts presented by Hay et al. [36] to accurately estimate the degree distribution of private networks was demonstrated in another study [37].

    Range count queries are optimized by wavelet transform in Privelet technique [25]. Privelet adds polylogarithmic noise to the transformed data. One-dimensional ordinal and nominal data require the Haar wavelet and nominal wavelet, respectively, while multi-dimensional data need standard decompositions to apply a one-dimensional transform along each dimension in turn.

    Further improvement on DP-compliant histogram publication for compressible histograms was demonstrated by Acs et al. [19]. Enhanced Fourier perturbation uses a better score function for exponential mechanism-based sampling. It exploits the redundancy of Fourier coefficients. Alternatively, private hierarchical partition (PHPartition) is a clustering-based sanitization that exploits the redundancy between bins.

    2) Linear Counting Queries

    Li et al. [38] generalize two approaches [25,36] by a matrix mechanism, an algorithm answering a workload W of linear counting queries. Their idea is to use another set of queries A (called a query strategy) as a query proxy to the database. Noisy answers on the query strategy are then used to derive answers for the workload. By doing so, they hoped to exploit the noise distribution correlation to increase accuracy. A semidefinite program with a rank constraint was formulated with complexity O(n8), and some approximations were also developed. The distinct novelty of this approach [38] was to point out that some of the mechanisms [25,36] are special cases of the matrix mechanism. Eigen-Design [39] is an efficient implementation with a complexity of O(n4). It uses a singular value decomposition lower bound to reduce the search space for matrix A from full space ?n×n to linear space spanned by the eigenvectors of WTW.

    The full-rank limitation [38] was emphasized by Yuan et al. [40]. They proposed the low-rank mechanism (LRM) to reach the theoretical lower bound proven by Hardt and Talwar [41]. The workload matrix was decomposed, and the constraint was relaxed. The results show LRM’s outperformance over the previous studies [25,36] in most query scenarios.

    Other approaches [41,42] used convex geometry to devise differentially private mechanisms and upper/lower bounds for query workloads. These schemes were based on the exponential mechanism.

    3) Partitioning

    Tree-based partitioning techniques find useful applications under differential privacy. Cormode et al. [23] focused on spatial data indexing for range queries while not revealing data points. Their method, private spatial decomposition, addresses both data-dependent (e.g., quadtree) and data-independent (e.g., kd-tree) tree structures with geometric privacy budget allocation. For data-dependent structures, we need a private mechanism (e.g., private median) for choosing splitting points. Post-processing enhances the query accuracy. DP-tree, proposed by Peng et al. [43], builds a nested tree structure with consistency enforcement and adaptive privacy budget assignment. The work improves the asymptotic error bound and query accuracy compared to the approach by Cormode et al. [23].

    To publish sequential data like Web browsing histories and mobility traces in private settings, Chen et al. [24] employed a variable-length n-gram model, which is widely used in natural language processing, to extract essential information from a sequential dataset and build an exploration tree satisfying ε-DP. Synthetic data constructed from the tree can be safely used by analysts. The salient contribution of this work is to promote novel techniques based on Markov assumptions. Similarly, private set-valued data publishing, which is high-dimensional by nature, needs data-dependent partitioning to ensure utility. Chen et al. [9] demonstrated such an approach with the help of context-free taxonomy trees. Their scheme is (α, β)-useful and can be applied to the context of relational databases.

    4) Learning Tasks

    Great effort dedicated to blending ε-DP with traditional learning tasks has been demonstrated throughout a series of papers over the last ten years. Examples include clustering tasks like k-means [33], mixture of Gaussian [8]; classification tasks like ID3 [33,16], C4.5 [17], Gaussian classifier [44]; and linear and logistic regression tasks [13,45]. Other tasks like dimensionality reduction [33,46,47] and statistical estimators [48,49] also have private versions. Support vector machine [50] and boosting [51] have also been studied in the context of differential privacy. However, the applicability of ε-DP to a vast number of other learning techniques remains unclear. One of the reasons for this is the cumbersome sensitivity analysis accompanied with ε-DP.

      >  B. Frameworks

    Several analysis frameworks and runtime toolkits have been developed for simplifying the usage of differential privacy in daily data processing tasks, hence shaping ε-DP thinking. Sub-linear query (SuLQ) [33] is a primitive that is powerful enough to be used in a collection of learning techniques. Privacy integrated query (PINQ) written by McSherry [22] is a building block in various applications like network trace analysis [52], and recommender systems [53]. For the MapReduce framework, Airavat [54] combined mandatory access control and differential privacy to provide strong security and privacy guarantees for distributed computations on sensitive data. It has some limitations in composition and requires trusted mappers. PASTE [18] aims at the aggregation of distributed time-series data via ε-DP Fourier transformation and homomorphic encryption. GUPT [55] uses a sample-and-aggregate framework [8] and a new model of data sensitivity. It resists side-channel attacks and gains better accuracy with re-sampling techniques.

    For non-interactive contexts, a formal learning theory was proposed by Blum et al. [14]. They demonstrated that it is possible to release synthetic private databases that are useful for all queries over a discretized domain from a concept class with polynomial VC dimension. However, their mechanism is inefficient, because it requires sampling from a non-trivial probability distribution over an unstructured space of exponential size. Nissim et al. [8] provided an alternative approach to sensitivity analysis that is data-dependent and can solve hard problems like median publishing or the cost of the minimum spanning tree. Beyond that, it puts forward a general sampling technique called sample-and-aggregate [8].

    V. CONTINUAL OBSERVATION SETTING

      >  A. Methods

    The continual observation setting in which data aggregators continually release updated statistics also needs to be placed in a differential privacy context to protect the contributor’s privacy. Examples include Amazon, IMDb’s popular item recommendations, and Google and Yahoo’s hot-search keyword suggestions. Chan et al. [56] addressed the continual counting problem over a bit stream (bounded or unbounded time). Using ideas of p-sum as intermediate results, they came up with the binary counting mechanism for a time-boundedness case with usefulness

    image

    The extension to time-unboundedness with a hybrid mechanism achieves the same utility. To satisfy the consistency condition (where the difference between the current count and the previous value is 0 or 1), the error increases by a factor of (log t)2. A bit of modification converts the Hybrid mechanism to a pan-private one [57]. However, the work was limited to event-level privacy only [56].

    Independently, Dwork et al. [57] studied the pan-private algorithms. Roughly speaking, these algorithms retain their privacy properties even if their internal state become visible to intruders. Another contribution was the user-level privacy regarding the existence of all events that belong to a user in the stream, which is stronger than event-level privacy. ε-differentially pan-private versions of several counting algorithms were provided, such as the density estimator, t-cropped mean estimator, k-heavy hitters estimator, t-incidence estimator, and mod-k incidence counter [57]. Some impossibility results and separation results between randomized response and private sketching were also provided.

    VI. ATTACKS

      >  A. Blatant Non-privacy

    In an interactive setting, a database d in the form of an n-bit vector is blatantly non-private [28] if after interacting with the database curator, an adversary can reconstruct a candidate database c that agrees with d on all but o(n) entries. Dinur and Nissim [28] show that adding

    image

    noise to every response is blatantly non-private against a polynomial-time bounded attacker asking O(n log2 n) queries. The attack consists of two steps: posing O(n log2 n) random subset-sum queries, and solving a linear program with n variables and O(n log2 n) constraints then rounding the results. Dwork and Yekhanin [58] claimed the inefficiency of this attack by a worst-case running time O(n5 log4 n), and proposed a sharper attack relying on the basic properties of the Fourier transform over the group ?2k. Their method requires only n queries and runs in O(n log n). The second contribution was the interpolation attack against a class of curators adding at least (1/2 + ε) fraction of queries with low noise. The main idea was to achieve error-correction via polynomial interpolation with a running time of poly(e/ε). A more comprehensive summary of blatant non-privacy is available elsewhere [59].

      >  B. Side-Channel Attacks

    Apart from blatant non-privacy, there exist other vulnerabilities under side-channel attacks. Processing sideeffects like long or rejected responses also reveal some information about victims. Haeberlen et al. [60] pointed out such threats against the existing systems PINQ [22] and Airavat [54]. They tested three attacks: timing attack, state attack, and privacy budget attack. By intentionally pausing for a long time in the query code when a certain condition is detected, the privacy mechanism reveals one bit (yes/no). The state attack exploits a global variable to open a channel between microqueries. The privacy budget attack checks how much the given privacy budget has decreased when the outer query returns. To cope with these attacks, a Fuzz system with built-in attack-resistance capabilities was proposed. The GUPT system [55] was claimed to be safe under these three types of attacks.

      >  C. No Free Lunch in Data Privacy

    Kifer and Machanavajjhala [61] critically analysed the privacy protections under differential privacy via nonprivacy games. They addressed several popular misconceptions about differential privacy, including: that it makes no assumptions about how data are generated; that it protects an individual’s information even if an attacker knows about all other individuals in the data; and that it is robust to arbitrary background knowledge. By employing a no-free-lunch theorem, it was argued that it is not possible to offer privacy and utility without making assumptions about how the data are generated. They emphasized that a user’s privacy is preserved if the attacker’s inference about the user’s participation in the data generating process is limited. It is difficult to come up with a general definition of participation that applies to all data generating mechanisms. These ideas were clarified through examples from social network research and tabular data. In a social network case study, several network evolution models were used to evaluate how the existence of an edge is revealed in special cases. Similarly, in contingency tables, differential privacy may become useless if used after deterministic data is released.

    VII. CONCLUSIONS

    Since its emergence, differential privacy has expanded its frontier rapidly with many interesting ideas under considerations such as the geometry, the algorithmic complexity of differential privacy, or alternative privacy guarantees that are composed automatically and give better accuracy. The determination of a conceptually simple definition of differential privacy has attracted great interest over the last decade. Interactivity and non-interactivity have both witnessed theoretical and practical advancements. Connections between differential privacy and other fields such as cryptography, statistics, complexity, combinatorics, mechanism design, and optimization provide fertile ground for upcoming growth. This paper is a short recapitulation of the main results from these works in the hope of promoting this emerging privacy model. We have emphasized the motivations, popular mechanisms, and typical work in various settings. Existing work on the misconceptions about differential privacy and possible attacks in special cases suggest that it should be used with caution.

  • 1. Sweeney L. 2002 “k-Anonymity: a model for protecting privacy” [International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems] Vol.10 P.557-570 google doi
  • 2. Machanavajjhala A., Kifer D., Gehrke J., Venkitasubramaniam M. 2007 “l-Diversity: privacy beyond k-anonymity” [ACM Transactions on Knowledge Discovery from Data] Vol.1 google doi
  • 3. Li N., Li T., Venkatasubramanian S. 2007 “t-Closeness: privacy beyond k-anonymity and l-diversity” [in Proceedings of the IEEE 23rd International Conference on Data Engineering] P.106-115 google
  • 4. Fung B. C. M., Wang K., Chen R., Yu P. S. 2010 “Privacypreserving data publishing: a survey of recent developments” [ACM Computing Surveys] Vol.42 google doi
  • 5. Dwork C. 2011 “A firm foundation for private data analysis” [Communications of the ACM] Vol.54 P.86-95 google doi
  • 6. Dwork C, Kenthapadi K., McSherry F., Mironov I., Naor M. 2006 “Our data, ourselves: privacy via distributed noise generation” [in Proceedings of the 24th Annual International Conference on The Theory and Applications of Cryptographic Techniques] P.486-503 google
  • 7. Dwork C., Bugliesi M. 2006 “Differential privacy,” Automata, Languages and Programming, Lecture Notes in Computer Science vol. 4052 P.1-12 google
  • 8. Nissim K., Raskhodnikova S., Smith A. 2007 “Smooth sensitivity and sampling in private data analysis” [in Proceedings of the 39th Annual ACM Symposium on Theory of Computing] P.75-84 google
  • 9. Chen R., Mohammed N., Fung B. C., Desai B. C., Xiong L. 2011 “Publishing set-valued data via differential privacy” [Proceedings of the VLDB Endowment] Vol.4 P.1087-1098 google
  • 10. Li N., Qardaji W., Su D., Cao J. 2012 “PrivBasis: frequent itemset mining with differential privacy” [Proceedings of the VLDB Endowment] Vol.5 P.1340-1351 google
  • 11. McSherry F., Talwar K. 2007 “Mechanism design via differential privacy” [in Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science] P.94-103 google
  • 12. Dwork C., McSherry F., Nissim K., Smith A. 2006 “Calibrating noise to sensitivity in private data analysis” [in Proceedings of the 3rd Conference on Theory of Cryptography] P.265-284 google
  • 13. Williams O., McSherry F. 2010 “Probabilistic inference and differential privacy” [in Proceedings of the 24th Annual Conference on Neural Information Processing Systems] google
  • 14. Blum A., Ligett K., Roth A. 2008 “A learning theory approach to non-interactive database privacy” [in Proceedings of the 40th Annual ACM Symposium on Theory of Computing] P.609-618 google
  • 15. Bhaskar R., Laxman S., Smith A., Thakurta A. 2010 “Discovering frequent patterns in sensitive data” [in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining] P.503-512 google
  • 16. Friedman A., Schuster A. 2010 “Data mining with differential privacy” [in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining] P.493-502 google
  • 17. Mohammed N., Chen R., Fung B. C. M., Yu P. S. 2011 “Differentially private data release for data mining” [in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining] P.493-501 google
  • 18. Rastogi V., Nath S. 2010 “Differentially private aggregation of distributed time-series with transformation and encryption” [in Proceedings of the ACM SIGMOD International Conference on Management of Data] P.735-746 google
  • 19. Acs G., Castelluccia C., Chen R. 2012 “Differentially private histogram publishing through lossy compression” [in Proceedings of the IEEE 12th International Conference on Data Mining] google
  • 20. Ghosh A., Roughgarden T., Sundararajan M. 2009 “Universally utility-maximizing privacy mechanisms” [in Proceedings of the 41st Annual ACM Symposium on Theory of Computing] P.351-360 google
  • 21. Ganta S. R., Kasiviswanathan S. P., Smith A. 2008 “Composition attacks and auxiliary information in data privacy” [in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining] P.265-273 google
  • 22. McSherry F. D. 2009 “Privacy integrated queries: an extensible platform for privacy-preserving data analysis” [in Proceedings of the 35th SIGMOD International Conference on Management of Data] P.19-30 google
  • 23. Cormode G., Procopiuc C., Srivastava D., Shen E., Yu T. 2012 “Differentially private spatial decompositions” [in Proceedings of the 28th IEEE International Conference on Data Engineering] P.20-31 google
  • 24. Chen R., Acs G., Castelluccia C. 2012 “Differentially private sequential data publication via variable-length n-grams” [in Proceedings of the 2012 ACM Conference on Computer and Communications Security] P.638-649 google
  • 25. Xiao X., Wang G., Gehrke J. 2011 “Differential privacy via wavelet transforms” [IEEE Transactions on Knowledge and Data Engineering] Vol.23 P.1200-1214 google doi
  • 26. Cormode G., Procopiuc C., Srivastava D., Tran T. T. 2012 “Differentially private summaries for sparse data” [in Proceedings of the 15th International Conference on Database Theory] P.299-311 google
  • 27. Hardt M., Ligett K., McSherry F. 2010 “A simple and practical algorithm for differentially private data release” google
  • 28. Dinur I., Nissim K. 2003 “Revealing information while preserving privacy” [in Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems] P.202-210 google
  • 29. Roth A., Roughgarden T. 2010 “Interactive privacy via the median mechanism” [in Proceedings of the 42nd ACM Symposium on Theory of Computing] P.765-774 google
  • 30. Hardt M., Rothblum G. N. 2010 “A multiplicative weights mechanism for privacy-preserving data analysis” [in Proceedings of the IEEE 51st Annual Symposium on Foundations of Computer Science] P.61-70 google
  • 31. Vapnik V. N., Chervonenkis A. Y. 1971 “On the uniform convergence of relative frequencies of events to their probabilities” [Theory of Probability & Its Applications] Vol.16 P.264-280 google doi
  • 32. Dwork C., Naor M., Reingold O., Rothblum G. N., Vadhan S. 2009 “On the complexity of differentially private data release: efficient algorithms and hardness results” [in Proceedings of the 41st Annual ACM Symposium on Theory of Computing] P.381-390 google
  • 33. Blum A., Dwork C., McSherry F., Nissim K. 2005 “Practical privacy: the SuLQ framework” [in Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems] P.128-138 google
  • 34. Chawla S., Dwork C., McSherry F., Smith A., Wee H. 2005 “Toward privacy in public databases” [in Proceedings of the 2nd International Conference on Theory of Cryptography] P.363-385 google
  • 35. Barak B., Chaudhuri K., Dwork C., Kale S., McSherry F., Talwar K. 2007 “Privacy, accuracy, and consistency too: a holistic solution to contingency table release” [in Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems] P.273-282 google
  • 36. Hay M., Rastogi V., Miklau G., Suciu D. 2010 “Boosting the accuracy of differentially private histograms through consistency” [Proceedings of the VLDB Endowment] Vol.3 P.1021-1032 google
  • 37. Hay M., Li C., Miklau G., Jensen D. 2009 “Accurate estimation of the degree distribution of private networks” [in Proceedings of the 9th IEEE International Conference on Data Mining] P.169-178 google
  • 38. Li C., Hay M., Rastogi V., Miklau G., McGregor A. 2010 “Optimizing linear counting queries under differential privacy” [in Proceedings of the 29th ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems] P.123-134 google
  • 39. Li C., Miklau G. 2012 “An adaptive mechanism for accurate query answering under differential privacy” [Proceedings of the VLDB Endowment] Vol.5 P.514-525 google
  • 40. Yuan G., Zhang Z., Winslett M., Xiao X., Yang Y., Hao Z. 2012 “Low-rank mechanism: optimizing batch queries under differential privacy” [Proceedings of the VLDB Endowment] Vol.5 P.1352-1363 google
  • 41. Hardt M., Talwar K. 2010 “On the geometry of differential privacy” [in Proceedings of the 42nd ACM Symposium on Theory of Computing] P.705-714 google
  • 42. Bhaskara A., Dadush D., Krishnaswamy R., Talwar K. 2012 “Unconditional differentially private mechanisms for linear queries” [in Proceedings of the 44th Symposium on Theory of Computing] P.1269-1284 google
  • 43. Peng S., Yang Y., Zhang Z., Winslett M., Yu Y. 2012 “DPtree: indexing multi-dimensional data under differential privacy” [in Proceedings of the ACM SIGMOD International Conference on Management of Data] P.864-864 google
  • 44. Pathak M. A., Raj B. 2012 “Large margin Gaussian mixture models with differential privacy” [IEEE Transactions on Dependable and Secure Computing] Vol.9 P.463-469 google doi
  • 45. Zhang J., Zhang Z., Xiao X., Yang Y., Winslett M. 2012 “Functional mechanism: regression analysis under differential privacy” [Proceedings of the VLDB Endowment] Vol.5 P.1364-1375 google
  • 46. Zhou S., Ligett K., Wasserman L. 2009 “Differential privacy with compression” [in Proceedings of the IEEE International Conference on Symposium on Information Theory] P.2718-2722 google
  • 47. Chaudhuri K., Sarwate A. D., Sinha K. 2011 “Near-optimal differentially private principal components” [in Proceedings of the 25th Annual Conference on Neural Information Processing Systems] P.998-1006 google
  • 48. Smith A. 2008 “Efficient, differentially private point estimators” google
  • 49. Lei J. 2011 “Differentially private M-estimators” [in Proceedings of the 25th Annual Conference on Neural Information Processing Systems] P.361-369 google
  • 50. Rubinstein B. I. P., Bartlett P. L., Huang L., Taft N. 2009 “Learning in a large function space: privacy-preserving mechanisms for SVM learning” google
  • 51. Dwork C., Rothblum G. N., Vadhan S. 2010 “Boosting and differential privacy” [in Proceedings of the IEEE 51st Annual Symposium on Foundations of Computer Science] P.51-60 google
  • 52. McSherry F., Mahajan R. 2010 “Differentially-private network trace analysis” [in Proceedings of the ACM SIGCOMM 2010 Conference] P.123-134 google
  • 53. McSherry F., Mironov I. 2009 “Differentially private recommender systems: building privacy into the net” [in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining] P.627-636 google
  • 54. Roy I., Setty S. T. V., Kilzer A., Shmatikov V., Witchel E. 2010 “Airavat: security and privacy for MapReduce” [in Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation] P.20-20 google
  • 55. Mohan P., Thakurta A., Shi E., Song D., Culler D. 2012 “GUPT: privacy preserving data analysis made easy” [in Proceedings of the ACM SIGMOD International Conference on Management of Data] P.349-360 google
  • 56. Chan T. H., Shi E., Song D. 2011 “Private and continual release of statistics” [ACM Transactions on Information and System Security] Vol.14 google doi
  • 57. Dwork C., Naor M., Pitassi T., Rothblum G. N., Yekhanin S. 2010 “Pan-private streaming algorithms” [in Proceedings of the 1st Symposium on Innovations in Computer Science] P.66-80 google
  • 58. Dwork C., Yekhanin S. 2008 “New efficient attacks on statistical disclosure control mechanisms” [in Proceedings of the 28th Annual Conference on Cryptology: Advances in Cryptology] P.469-480 google
  • 59. Dwork C. 2007 “Ask a better question, get a better answer a new approach to private data analysis” [in Proceedings of the 11th International Conference on Database Theory] P.18-27 google
  • 60. Haeberlen A., Pierce B. C., Narayan A. 2011 “Differential privacy under fire” [in Proceedings of the 20th USENIX Conference on Security] google
  • 61. Kifer D., Machanavajjhala A. 2011 “No free lunch in data privacy” [in Proceedings of the ACM SIGMOD International Conference on Management of Data] P.193-204 google