Publication Date

12-2017

Date of Final Oral Examination (Defense)

10-17-2017

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

Gaby Dagher, Ph.D.

Supervisory Committee Member

Dianxiang Xu, Ph.D.

Supervisory Committee Member

Yantian Hou, Ph.D.

Abstract

Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects have been generated. The movement data of an object (e.g. individual) might include specific information about the locations it visited, the time those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy.

The objective of this thesis is to answer the following question: How can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves an overall epsilon-differential privacy on the output trajectory data. We perform experimental evaluation on real-life data, and demonstrate that our proposed approach can effectively answer count and range queries, as well as mining frequent sequential patterns. We also show that our algorithm is efficient w.r.t. privacy budget and number of partitions, and also scalable with increasing data size.

DOI

https://doi.org/10.18122/B2FD7F

Share

COinS