
Privacy-Preserving Data Mining
by Aggarwal, Charu C.; Yu, Philip S.
Table of Contents
Preface | p. v |
List of Figures | p. xvii |
List of Tables | p. xxi |
An Introduction to Privacy-Preserving Data Mining | p. 1 |
Introduction | p. 1 |
Privacy-Preserving Data Mining Algorithms | p. 3 |
Conclusions and Summary | p. 7 |
References | p. 8 |
A General Survey of Privacy-Preserving Data Mining Models and Algorithms | p. 11 |
Introduction | p. 11 |
The Randomization Method | p. 13 |
Privacy Quantification | p. 15 |
Adversarial Attacks on Randomization | p. 18 |
Randomization Methods for Data Streams | p. 18 |
Multiplicative Perturbations | p. 19 |
Data Swapping | p. 19 |
Group Based Anonymization | p. 20 |
The k-Anonymity Framework | p. 20 |
Personalized Privacy-Preservation | p. 24 |
Utility Based Privacy Preservation | p. 24 |
Sequential Releases | p. 25 |
The l-diversity Method | p. 26 |
The t-closeness Model | p. 27 |
Models for Text, Binary and String Data | p. 27 |
Distributed Privacy-Preserving Data Mining | p. 28 |
Distributed Algorithms over Horizontally Partitioned Data Sets | p. 30 |
Distributed Algorithms over Vertically Partitioned Data | p. 31 |
Distributed Algorithms for k-Anonymity | p. 32 |
Privacy-Preservation of Application Results | p. 32 |
Association Rule Hiding | p. 33 |
Downgrading Classifier Effectiveness | p. 34 |
Query Auditing and Inference Control | p. 34 |
Limitations of Privacy: The Curse of Dimensionality | p. 37 |
Applications of Privacy-Preserving Data Mining | p. 38 |
Medical Databases: The Scrub and Datafly Systems | p. 39 |
Bioterrorism Applications | p. 40 |
Homeland Security Applications | p. 40 |
Genomic Privacy | p. 42 |
Summary | p. 43 |
References | p. 43 |
A Survey of Inference Control Methods for Privacy-Preserving Data Mining | p. 53 |
Introduction | p. 54 |
A Classification of Microdata Protection Methods | p. 55 |
Perturbative Masking Methods | p. 58 |
Additive Noise | p. 58 |
Microaggregation | p. 59 |
Data Swapping and Rank Swapping | p. 61 |
Rounding | p. 62 |
Resampling | p. 62 |
PRAM | p. 62 |
MASSC | p. 63 |
Non-perturbative Masking Methods | p. 63 |
Sampling | p. 64 |
Global Recoding | p. 64 |
Top and Bottom Coding | p. 65 |
Local Suppression | p. 65 |
Synthetic Microdata Generation | p. 65 |
Synthetic Data by Multiple Imputation | p. 65 |
Synthetic Data by Bootstrap | p. 66 |
Synthetic Data by Latin Hypercube Sampling | p. 66 |
Partially Synthetic Data by Cholesky Decomposition | p. 67 |
Other Partially Synthetic and Hybrid Microdata Approaches | p. 67 |
Pros and Cons of Synthetic Microdata | p. 68 |
Trading off Information Loss and Disclosure Risk | p. 69 |
Score Construction | p. 69 |
R-U Maps | p. 71 |
k-anonymity | p. 71 |
Conclusions and Research Directions | p. 72 |
References | p. 73 |
Measures of Anonymity | p. 81 |
Introduction | p. 81 |
What is Privacy? | p. 82 |
Data Anonymization Methods | p. 83 |
A Classification of Methods | p. 84 |
Statistical Measures of Anonymity | p. 85 |
Query Restriction | p. 85 |
Anonymity via Variance | p. 85 |
Anonymity via Multiplicity | p. 86 |
Probabilistic Measures of Anonymity | p. 87 |
Measures Based on Random Perturbation | p. 87 |
Measures Based on Generalization | p. 90 |
Utility vs Privacy | p. 94 |
Computational Measures of Anonymity | p. 94 |
Anonymity via Isolation | p. 97 |
Conclusions and New Directions | p. 97 |
New Directions | p. 98 |
References | p. 99 |
k-Anonymous Data Mining: A Survey | p. 105 |
Introduction | p. 105 |
k-Anonymity | p. 107 |
Algorithms for Enforcing k-Anonymity | p. 110 |
k-Anonymity Threats from Data Mining | p. 117 |
Association Rules | p. 118 |
Classification Mining | p. 118 |
k-Anonymity in Data Mining | p. 120 |
Anonymize-and-Mine | p. 123 |
Mine-and-Anonymize | p. 126 |
Enforcing k-Anonymity on Association Rules | p. 126 |
Enforcing k-Anonymity on Decision Trees | p. 130 |
Conclusions | p. 133 |
Acknowledgments | p. 133 |
References | p. 134 |
A Survey of Randomization Methods for Privacy-Preserving Data Mining | p. 137 |
Introduction | p. 137 |
Reconstruction Methods for Randomization | p. 139 |
The Bayes Reconstruction Method | p. 139 |
The EM Reconstruction Method | p. 141 |
Utility and Optimality of Randomization Models | p. 143 |
Applications of Randomization | p. 144 |
Privacy-Preserving Classification with Randomization | p. 144 |
Privacy-Preserving OLAP | p. 145 |
Collaborative Filtering | p. 145 |
The Privacy-Information Loss Tradeoff | p. 146 |
Vulnerabilities of the Randomization Method | p. 149 |
Randomization of Time Series Data Streams | p. 151 |
Multiplicative Noise for Randomization | p. 152 |
Vulnerabilities of Multiplicative Randomization | p. 153 |
Sketch Based Randomization | p. 153 |
Conclusions and Summary | p. 154 |
References | p. 154 |
A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining | p. 157 |
Introduction | p. 158 |
Data Privacy vs. Data Utility | p. 159 |
Outline | p. 160 |
Definition of Multiplicative Perturbation | p. 161 |
Notations | p. 161 |
Rotation Perturbation | p. 161 |
Projection Perturbation | p. 162 |
Sketch-based Approach | p. 164 |
Geometric Perturbation | p. 164 |
Transformation Invariant Data Mining Models | p. 165 |
Definition of Transformation Invariant Models | p. 166 |
Transformation-Invariant Classification Models | p. 166 |
Transformation-Invariant Clustering Models | p. 167 |
Privacy Evaluation for Multiplicative Perturbation | p. 168 |
A Conceptual Multidimensional Privacy Evaluation Model | p. 168 |
Variance of Difference as Column Privacy Metric | p. 169 |
Incorporating Attack Evaluation | p. 170 |
Other Metrics | p. 171 |
Attack Resilient Multiplicative Perturbations | p. 171 |
Naive Estimation to Rotation Perturbation | p. 171 |
ICA-Based Attacks | p. 173 |
Distance-Inference Attacks | p. 174 |
Attacks with More Prior Knowledge | p. 176 |
Finding Attack-Resilient Perturbations | p. 177 |
Conclusion | p. 177 |
Acknowledgment | p. 178 |
References | p. 179 |
A Survey of Quantification of Privacy Preserving Data Mining Algorithms | p. 183 |
Introduction | p. 184 |
Metrics for Quantifying Privacy Level | p. 186 |
Data Privacy | p. 186 |
Result Privacy | p. 191 |
Metrics for Quantifying Hiding Failure | p. 192 |
Metrics for Quantifying Data Quality | p. 193 |
Quality of the Data Resulting from the PPDM Process | p. 193 |
Quality of the Data Mining Results | p. 198 |
Complexity Metrics | p. 200 |
How to Select a Proper Metric | p. 201 |
Conclusion and Research Directions | p. 202 |
References | p. 202 |
A Survey of Utility-based Privacy-Preserving Data Transformation Methods | p. 207 |
Introduction | p. 208 |
What is Utility-based Privacy Preservation? | p. 209 |
Types of Utility-based Privacy Preservation Methods | p. 210 |
Privacy Models | p. 210 |
Utility Measures | p. 212 |
Summary of the Utility-Based Privacy Preserving Methods | p. 214 |
Utility-Based Anonymization Using Local Recoding | p. 214 |
Global Recoding and Local Recoding | p. 215 |
Utility Measure | p. 216 |
Anonymization Methods | p. 217 |
Summary and Discussion | p. 219 |
The Utility-based Privacy Preserving Methods in Classification Problems | p. 219 |
The Top-Down Specialization Method | p. 220 |
The Progressive Disclosure Algorithm | p. 224 |
Summary and Discussion | p. 228 |
Anonymized Marginal: Injecting Utility into Anonymized Data Sets | p. 228 |
Anonymized Marginal | p. 229 |
Utility Measure | p. 230 |
Injecting Utility Using Anonymized Marginals | p. 231 |
Summary and Discussion | p. 233 |
Summary | p. 234 |
Acknowledgments | p. 234 |
References | p. 234 |
Mining Association Rules under Privacy Constraints | p. 239 |
Introduction | p. 239 |
Problem Framework | p. 240 |
Database Model | p. 240 |
Mining Objective | p. 241 |
Privacy Mechanisms | p. 241 |
Privacy Metric | p. 243 |
Accuracy Metric | p. 245 |
Evolution of the Literature | p. 246 |
The FRAPP Framework | p. 251 |
Reconstruction Model | p. 252 |
Estimation Error | p. 253 |
Randomizing the Perturbation Matrix | p. 256 |
Efficient Perturbation | p. 256 |
Integration with Association Rule Mining | p. 258 |
Sample Results | p. 259 |
Closing Remarks | p. 263 |
Acknowledgments | p. 263 |
References | p. 263 |
A Survey of Association Rule Hiding Methods for Privacy | p. 267 |
Introduction | p. 267 |
Terminology and Preliminaries | p. 269 |
Taxonomy of Association Rule Hiding Algorithms | p. 270 |
Classes of Association Rule Algorithms | p. 271 |
Heuristic Approaches | p. 272 |
Border-based Approaches | p. 277 |
Exact Approaches | p. 278 |
Other Hiding Approaches | p. 279 |
Metrics and Performance Analysis | p. 281 |
Discussion and Future Trends | p. 284 |
Conclusions | p. 285 |
References | p. 286 |
A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries | p. 291 |
Introduction | p. 291 |
The Statistical Approach to Privacy Protection | p. 292 |
Datamining Algorithms, Association Rules, and Disclosure Limitation | p. 294 |
Estimation and Disclosure Limitation for Multi-way Contingency Tables | p. 295 |
Two Illustrative Examples | p. 301 |
Example 1: Data from a Randomized Clinical Trial | p. 301 |
Example 2: Data from the 1993 U.S. Current Population Survey | p. 305 |
Conclusions | p. 308 |
Acknowledgments | p. 309 |
References | p. 309 |
A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data | p. 313 |
Introduction | p. 313 |
Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining | p. 315 |
Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining | p. 318 |
Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data | p. 323 |
Comparison to Vertically Partitioned Data Model | p. 326 |
Extension to Malicious Parties | p. 327 |
Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining | p. 329 |
Privacy Issues Related to Data Mining Results | p. 330 |
Conclusion | p. 332 |
References | p. 332 |
A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data | p. 337 |
Introduction | p. 337 |
Classification | p. 341 |
Naïve Bayes Classification | p. 342 |
Bayesian Network Structure Learning | p. 343 |
Decision Tree Classification | p. 344 |
Clustering | p. 346 |
Association Rule Mining | p. 347 |
Outlier Detection | p. 349 |
Algorithm | p. 351 |
Security Analysis | p. 352 |
Computation and Communication Analysis | p. 354 |
Challenges and Research Directions | p. 355 |
References | p. 356 |
A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods | p. 359 |
Introduction | p. 360 |
Definitions and Notation | p. 360 |
Attacking Additive Data Perturbation | p. 361 |
Eigen-Analysis and PCA Preliminaries | p. 362 |
Spectral Filtering | p. 363 |
SVD Filtering | p. 364 |
PCA Filtering | p. 365 |
MAP Estimation Attack | p. 366 |
Distribution Analysis Attack | p. 367 |
Summary | p. 367 |
Attacking Matrix Multiplicative Data Perturbation | p. 369 |
Known I/O Attacks | p. 370 |
Known Sample Attack | p. 373 |
Other Attacks Based on ICA | p. 374 |
Summary | p. 375 |
Attacking k-Anonymization | p. 376 |
Conclusion | p. 376 |
Acknowledgments | p. 377 |
References | p. 377 |
Private Data Analysis via Output Perturbation | p. 383 |
Introduction | p. 383 |
The Abstract Model - Statistical Databases, Queries, and Sanitizers | p. 385 |
Privacy | p. 388 |
Interpreting the Privacy Definition | p. 390 |
The Basic Technique: Calibrating Noise to Sensitivity | p. 394 |
Applications: Functions with Low Global Sensitivity | p. 396 |
Constructing Sanitizers for Complex Functionalities | p. 400 |
k-Means Clustering | p. 401 |
SVD and PCA | p. 403 |
Learning in the Statistical Queries Model | p. 404 |
Beyond the Basics | p. 405 |
Instance Based Noise and Smooth Sensitivity | p. 406 |
The Sample-Aggregate Framework | p. 408 |
A General Sanitization Mechanism | p. 409 |
Related Work and Bibliographic Notes | p. 409 |
Acknowledgments | p. 411 |
References | p. 411 |
A Survey of Query Auditing Techniques for Data Privacy | p. 415 |
Introduction | p. 415 |
Auditing Aggregate Queries | p. 416 |
Offline Auditing | p. 417 |
Online Auditing | p. 418 |
Auditing Select-Project-Join Queries | p. 426 |
Challenges in Auditing | p. 427 |
Reading | p. 429 |
References | p. 430 |
Privacy and the Dimensionality Curse | p. 433 |
Introduction | p. 433 |
The Dimensionality Curse and the k-anonymity Method | p. 435 |
The Dimensionality Curse and Condensation | p. 441 |
The Dimensionality Curse and the Randomization Method | p. 446 |
Effects of Public Information | p. 446 |
Effects of High Dimensionality | p. 450 |
Gaussian Perturbing Distribution | p. 450 |
Uniform Perturbing Distribution | p. 455 |
The Dimensionality Curse and l-diversity | p. 458 |
Conclusions and Research Directions | p. 459 |
References | p. 460 |
Personalized Privacy Preservation | p. 461 |
Introduction | p. 461 |
Formalization of Personalized Anonymity | p. 463 |
Personal Privacy Requirements | p. 464 |
Generalization | p. 465 |
Combinatorial Process of Privacy Attack | p. 467 |
Primary Case | p. 468 |
Non-primary Case | p. 469 |
Theoretical Foundation | p. 470 |
Notations and Basic Properties | p. 471 |
Derivation of the Breach Probability | p. 472 |
Generalization Algorithm | p. 473 |
The Greedy Framework | p. 474 |
Optimal SA-generalization | p. 476 |
Alternative Forms of Personalized Privacy Preservation | p. 478 |
Extension of k-anonymity | p. 479 |
Personalization in Location Privacy Protection | p. 480 |
Summary and Future Work | p. 482 |
References | p. 485 |
Privacy-Preserving Data Stream Classification | p. 487 |
Introduction | p. 487 |
Motivating Example | p. 488 |
Contributions and Paper Outline | p. 490 |
Related Works | p. 491 |
Problem Statement | p. 493 |
Secure Join Stream Classification | p. 493 |
Naive Bayesian Classifiers | p. 494 |
Our Approach | p. 495 |
Initialization | p. 495 |
Bottom-Up Propagation | p. 496 |
Top-Down Propagation | p. 497 |
Using NBC | p. 499 |
Algorithm Analysis | p. 500 |
Empirical Studies | p. 501 |
Real-life Datasets | p. 502 |
Synthetic Datasets | p. 504 |
Discussion | p. 506 |
Conclusions | p. 507 |
References | p. 508 |
Index | p. 511 |
Table of Contents provided by Publisher. All Rights Reserved.