Privacy-Preserving Data Mining

by Aggarwal, Charu C.; Yu, Philip S.

ISBN13: 9780387709918

ISBN10: 0387709916

Format: Hardcover

Pub. Date: 2008-07-01

Publisher(s): Springer-Verlag New York Inc

List Price: ~~$230.99~~

Summary

Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals. This has caused concerns that personal data may be used for a variety of intrusive or malicious purposes. Privacy Preserving Data Mining: Models and Algorithms proposes a number of techniques to perform data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, and randomization and perturbation-based techniques. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey covers the key research content and the future research directions of a particular topic in privacy. Privacy Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science. The book is also suitable for practitioners in industry.

Table of Contents

Preface	p. v
List of Figures	p. xvii
List of Tables	p. xxi
An Introduction to Privacy-Preserving Data Mining	p. 1
Introduction	p. 1
Privacy-Preserving Data Mining Algorithms	p. 3
Conclusions and Summary	p. 7
References	p. 8
A General Survey of Privacy-Preserving Data Mining Models and Algorithms	p. 11
Introduction	p. 11
The Randomization Method	p. 13
Privacy Quantification	p. 15
Adversarial Attacks on Randomization	p. 18
Randomization Methods for Data Streams	p. 18
Multiplicative Perturbations	p. 19
Data Swapping	p. 19
Group Based Anonymization	p. 20
The k-Anonymity Framework	p. 20
Personalized Privacy-Preservation	p. 24
Utility Based Privacy Preservation	p. 24
Sequential Releases	p. 25
The l-diversity Method	p. 26
The t-closeness Model	p. 27
Models for Text, Binary and String Data	p. 27
Distributed Privacy-Preserving Data Mining	p. 28
Distributed Algorithms over Horizontally Partitioned Data Sets	p. 30
Distributed Algorithms over Vertically Partitioned Data	p. 31
Distributed Algorithms for k-Anonymity	p. 32
Privacy-Preservation of Application Results	p. 32
Association Rule Hiding	p. 33
Downgrading Classifier Effectiveness	p. 34
Query Auditing and Inference Control	p. 34
Limitations of Privacy: The Curse of Dimensionality	p. 37
Applications of Privacy-Preserving Data Mining	p. 38
Medical Databases: The Scrub and Datafly Systems	p. 39
Bioterrorism Applications	p. 40
Homeland Security Applications	p. 40
Genomic Privacy	p. 42
Summary	p. 43
References	p. 43
A Survey of Inference Control Methods for Privacy-Preserving Data Mining	p. 53
Introduction	p. 54
A Classification of Microdata Protection Methods	p. 55
Perturbative Masking Methods	p. 58
Additive Noise	p. 58
Microaggregation	p. 59
Data Swapping and Rank Swapping	p. 61
Rounding	p. 62
Resampling	p. 62
PRAM	p. 62
MASSC	p. 63
Non-perturbative Masking Methods	p. 63
Sampling	p. 64
Global Recoding	p. 64
Top and Bottom Coding	p. 65
Local Suppression	p. 65
Synthetic Microdata Generation	p. 65
Synthetic Data by Multiple Imputation	p. 65
Synthetic Data by Bootstrap	p. 66
Synthetic Data by Latin Hypercube Sampling	p. 66
Partially Synthetic Data by Cholesky Decomposition	p. 67
Other Partially Synthetic and Hybrid Microdata Approaches	p. 67
Pros and Cons of Synthetic Microdata	p. 68
Trading off Information Loss and Disclosure Risk	p. 69
Score Construction	p. 69
R-U Maps	p. 71
k-anonymity	p. 71
Conclusions and Research Directions	p. 72
References	p. 73
Measures of Anonymity	p. 81
Introduction	p. 81
What is Privacy?	p. 82
Data Anonymization Methods	p. 83
A Classification of Methods	p. 84
Statistical Measures of Anonymity	p. 85
Query Restriction	p. 85
Anonymity via Variance	p. 85
Anonymity via Multiplicity	p. 86
Probabilistic Measures of Anonymity	p. 87
Measures Based on Random Perturbation	p. 87
Measures Based on Generalization	p. 90
Utility vs Privacy	p. 94
Computational Measures of Anonymity	p. 94
Anonymity via Isolation	p. 97
Conclusions and New Directions	p. 97
New Directions	p. 98
References	p. 99
k-Anonymous Data Mining: A Survey	p. 105
Introduction	p. 105
k-Anonymity	p. 107
Algorithms for Enforcing k-Anonymity	p. 110
k-Anonymity Threats from Data Mining	p. 117
Association Rules	p. 118
Classification Mining	p. 118
k-Anonymity in Data Mining	p. 120
Anonymize-and-Mine	p. 123
Mine-and-Anonymize	p. 126
Enforcing k-Anonymity on Association Rules	p. 126
Enforcing k-Anonymity on Decision Trees	p. 130
Conclusions	p. 133
Acknowledgments	p. 133
References	p. 134
A Survey of Randomization Methods for Privacy-Preserving Data Mining	p. 137
Introduction	p. 137
Reconstruction Methods for Randomization	p. 139
The Bayes Reconstruction Method	p. 139
The EM Reconstruction Method	p. 141
Utility and Optimality of Randomization Models	p. 143
Applications of Randomization	p. 144
Privacy-Preserving Classification with Randomization	p. 144
Privacy-Preserving OLAP	p. 145
Collaborative Filtering	p. 145
The Privacy-Information Loss Tradeoff	p. 146
Vulnerabilities of the Randomization Method	p. 149
Randomization of Time Series Data Streams	p. 151
Multiplicative Noise for Randomization	p. 152
Vulnerabilities of Multiplicative Randomization	p. 153
Sketch Based Randomization	p. 153
Conclusions and Summary	p. 154
References	p. 154
A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining	p. 157
Introduction	p. 158
Data Privacy vs. Data Utility	p. 159
Outline	p. 160
Definition of Multiplicative Perturbation	p. 161
Notations	p. 161
Rotation Perturbation	p. 161
Projection Perturbation	p. 162
Sketch-based Approach	p. 164
Geometric Perturbation	p. 164
Transformation Invariant Data Mining Models	p. 165
Definition of Transformation Invariant Models	p. 166
Transformation-Invariant Classification Models	p. 166
Transformation-Invariant Clustering Models	p. 167
Privacy Evaluation for Multiplicative Perturbation	p. 168
A Conceptual Multidimensional Privacy Evaluation Model	p. 168
Variance of Difference as Column Privacy Metric	p. 169
Incorporating Attack Evaluation	p. 170
Other Metrics	p. 171
Attack Resilient Multiplicative Perturbations	p. 171
Naive Estimation to Rotation Perturbation	p. 171
ICA-Based Attacks	p. 173
Distance-Inference Attacks	p. 174
Attacks with More Prior Knowledge	p. 176
Finding Attack-Resilient Perturbations	p. 177
Conclusion	p. 177
Acknowledgment	p. 178
References	p. 179
A Survey of Quantification of Privacy Preserving Data Mining Algorithms	p. 183
Introduction	p. 184
Metrics for Quantifying Privacy Level	p. 186
Data Privacy	p. 186
Result Privacy	p. 191
Metrics for Quantifying Hiding Failure	p. 192
Metrics for Quantifying Data Quality	p. 193
Quality of the Data Resulting from the PPDM Process	p. 193
Quality of the Data Mining Results	p. 198
Complexity Metrics	p. 200
How to Select a Proper Metric	p. 201
Conclusion and Research Directions	p. 202
References	p. 202
A Survey of Utility-based Privacy-Preserving Data Transformation Methods	p. 207
Introduction	p. 208
What is Utility-based Privacy Preservation?	p. 209
Types of Utility-based Privacy Preservation Methods	p. 210
Privacy Models	p. 210
Utility Measures	p. 212
Summary of the Utility-Based Privacy Preserving Methods	p. 214
Utility-Based Anonymization Using Local Recoding	p. 214
Global Recoding and Local Recoding	p. 215
Utility Measure	p. 216
Anonymization Methods	p. 217
Summary and Discussion	p. 219
The Utility-based Privacy Preserving Methods in Classification Problems	p. 219
The Top-Down Specialization Method	p. 220
The Progressive Disclosure Algorithm	p. 224
Summary and Discussion	p. 228
Anonymized Marginal: Injecting Utility into Anonymized Data Sets	p. 228
Anonymized Marginal	p. 229
Utility Measure	p. 230
Injecting Utility Using Anonymized Marginals	p. 231
Summary and Discussion	p. 233
Summary	p. 234
Acknowledgments	p. 234
References	p. 234
Mining Association Rules under Privacy Constraints	p. 239
Introduction	p. 239
Problem Framework	p. 240
Database Model	p. 240
Mining Objective	p. 241
Privacy Mechanisms	p. 241
Privacy Metric	p. 243
Accuracy Metric	p. 245
Evolution of the Literature	p. 246
The FRAPP Framework	p. 251
Reconstruction Model	p. 252
Estimation Error	p. 253
Randomizing the Perturbation Matrix	p. 256
Efficient Perturbation	p. 256
Integration with Association Rule Mining	p. 258
Sample Results	p. 259
Closing Remarks	p. 263
Acknowledgments	p. 263
References	p. 263
A Survey of Association Rule Hiding Methods for Privacy	p. 267
Introduction	p. 267
Terminology and Preliminaries	p. 269
Taxonomy of Association Rule Hiding Algorithms	p. 270
Classes of Association Rule Algorithms	p. 271
Heuristic Approaches	p. 272
Border-based Approaches	p. 277
Exact Approaches	p. 278
Other Hiding Approaches	p. 279
Metrics and Performance Analysis	p. 281
Discussion and Future Trends	p. 284
Conclusions	p. 285
References	p. 286
A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries	p. 291
Introduction	p. 291
The Statistical Approach to Privacy Protection	p. 292
Datamining Algorithms, Association Rules, and Disclosure Limitation	p. 294
Estimation and Disclosure Limitation for Multi-way Contingency Tables	p. 295
Two Illustrative Examples	p. 301
Example 1: Data from a Randomized Clinical Trial	p. 301
Example 2: Data from the 1993 U.S. Current Population Survey	p. 305
Conclusions	p. 308
Acknowledgments	p. 309
References	p. 309
A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data	p. 313
Introduction	p. 313
Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining	p. 315
Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining	p. 318
Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data	p. 323
Comparison to Vertically Partitioned Data Model	p. 326
Extension to Malicious Parties	p. 327
Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining	p. 329
Privacy Issues Related to Data Mining Results	p. 330
Conclusion	p. 332
References	p. 332
A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data	p. 337
Introduction	p. 337
Classification	p. 341
Naïve Bayes Classification	p. 342
Bayesian Network Structure Learning	p. 343
Decision Tree Classification	p. 344
Clustering	p. 346
Association Rule Mining	p. 347
Outlier detection	p. 349
Algorithm	p. 351
Security Analysis	p. 352
Computation and Communication Analysis	p. 354
Challenges and Research Directions	p. 355
References	p. 356
A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods	p. 359
Introduction	p. 360
Definitions and Notation	p. 360
Attacking Additive Data Perturbation	p. 361
Eigen-Analysis and PCA Preliminaries	p. 362
Spectral Filtering	p. 363
SVD Filtering	p. 364
PCA Filtering	p. 365
MAP Estimation Attack	p. 366
Distribution Analysis Attack	p. 367
Summary	p. 367
Attacking Matrix Multiplicative Data Perturbation	p. 369
Known I/O Attacks	p. 370
Known Sample Attack	p. 373
Other Attacks Based on ICA	p. 374
Summary	p. 375
Attacking k-Anonymization	p. 376
Conclusion	p. 376
Acknowledgments	p. 377
References	p. 377
Private Data Analysis via Output Perturbation	p. 383
Introduction	p. 383
The Abstract Model - Statistical Databases, Queries, and Sanitizers	p. 385
Privacy	p. 388
Interpreting the Privacy Definition	p. 390
The Basic Technique: Calibrating Noise to Sensitivity	p. 394
Applications: Functions with Low Global Sensitivity	p. 396
Constructing Sanitizers for Complex Functionalities	p. 400
k-Means Clustering	p. 401
SVD and PCA	p. 403
Learning in the Statistical Queries Model	p. 404
Beyond the Basics	p. 405
Instance Based Noise and Smooth Sensitivity	p. 406
The Sample-Aggregate Framework	p. 408
A General Sanitization Mechanism	p. 409
Related Work and Bibliographic Notes	p. 409
Acknowledgments	p. 411
References	p. 411
A Survey of Query Auditing Techniques for Data Privacy	p. 415
Introduction	p. 415
Auditing Aggregate Queries	p. 416
Offline Auditing	p. 417
Online Auditing	p. 418
Auditing Select-Project-Join Queries	p. 426
Challenges in Auditing	p. 427
Reading	p. 429
References	p. 430
Privacy and the Dimensionality Curse	p. 433
Introduction	p. 433
The Dimensionality Curse and the k-anonymity Method	p. 435
The Dimensionality Curse and Condensation	p. 441
The Dimensionality Curse and the Randomization Method	p. 446
Effects of Public Information	p. 446
Effects of High Dimensionality	p. 450
Gaussian Perturbing Distribution	p. 450
Uniform Perturbing Distribution	p. 455
The Dimensionality Curse and l-diversity	p. 458
Conclusions and Research Directions	p. 459
References	p. 460
Personalized Privacy Preservation	p. 461
Introduction	p. 461
Formalization of Personalized Anonymity	p. 463
Personal Privacy Requirements	p. 464
Generalization	p. 465
Combinatorial Process of Privacy Attack	p. 467
Primary Case	p. 468
Non-primary Case	p. 469
Theoretical Foundation	p. 470
Notations and Basic Properties	p. 471
Derivation of the Breach Probability	p. 472
Generalization Algorithm	p. 473
The Greedy Framework	p. 474
Optimal SA-generalization	p. 476
Alternative Forms of Personalized Privacy Preservation	p. 478
Extension of k-anonymity	p. 479
Personalization in Location Privacy Protection	p. 480
Summary and Future Work	p. 482
References	p. 485
Privacy-Preserving Data Stream Classification	p. 487
Introduction	p. 487
Motivating Example	p. 488
Contributions and Paper Outline	p. 490
Related Works	p. 491
Problem Statement	p. 493
Secure Join Stream Classification	p. 493
Naive Bayesian Classifiers	p. 494
Our Approach	p. 495
Initialization	p. 495
Bottom-Up Propagation	p. 496
Top-Down Propagation	p. 497
Using NBC	p. 499
Algorithm Analysis	p. 500
Empirical Studies	p. 501
Real-life Datasets	p. 502
Synthetic Datasets	p. 504
Discussion	p. 506
Conclusions	p. 507
References	p. 508
Index	p. 511
Table of Contents provided by Publisher. All Rights Reserved.
