Completeness Integrity Protection for Outsourced Databases Using Semantic Fake Data

Faculty Mentor Information

Dr. Jyh-haw Yeh

Abstract

As cloud storage and computing gain popularity, data entrusted to the cloud can be exposed to more people and is thus more vulnerable to attacks. It is important to develop mechanisms that protect data privacy and integrity so that clients can safely outsource their data to the cloud. We present a method for ensuring data completeness, which is one facet of the data integrity problem. Our approach is to convert a standard database to a Completeness Protected Database (CPDB) by injecting semantic fake data and then outsourcing it to the cloud. The fake data are generated by a pseudo-random but deterministic function, so the data owner can regenerate them and match them against the fake data returned by a range query to check for completeness. The CPDB is innovative in the following ways: (1) fake data are randomly generated but semantically indistinguishable from existing data; (2) because fake data are generated by deterministic functions, data owners do not need to remember what fake data have been injected; instead, they can regenerate the fake data using the functions; (3) our scheme uses no costly cryptographic encryption or signatures.
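The regenerate-and-match idea described above can be illustrated with a minimal sketch. This is not the authors' construction: the function names, the use of integer keys, and the choice of Python's seeded `random.Random` as the deterministic generator are all assumptions made for illustration, and the sketch does not attempt to make the fake values semantically indistinguishable from real ones.

```python
import random

def generate_fake_keys(secret_seed, count, low, high):
    """Deterministically derive `count` fake keys in [low, high] from a seed.

    Hypothetical helper: the same seed always yields the same keys, so the
    owner stores only the seed and parameters, not the fake data itself.
    """
    rng = random.Random(secret_seed)
    return sorted(rng.sample(range(low, high + 1), count))

def build_cpdb(real_keys, fake_keys):
    """Merge real and fake keys into the table that would be outsourced."""
    return sorted(set(real_keys) | set(fake_keys))

def range_query(table, lo, hi):
    """Stand-in for the cloud server answering a range query."""
    return [k for k in table if lo <= k <= hi]

def is_complete(result, secret_seed, count, low, high, lo, hi):
    """Regenerate the fake keys and check that every one in [lo, hi] came back."""
    expected = {
        k for k in generate_fake_keys(secret_seed, count, low, high)
        if lo <= k <= hi
    }
    return expected.issubset(result)

# Usage: the owner injects fake keys, outsources the table, then verifies a result.
fake = generate_fake_keys(secret_seed=42, count=5, low=0, high=1000)
table = build_cpdb([10, 20, 30], fake)
result = range_query(table, 0, 1000)
print(is_complete(result, 42, 5, 0, 1000, 0, 1000))
```

In a sketch like this, the check is probabilistic: an omission is detected only when at least one dropped record is a fake one, so detection improves as more fake data are injected. Python's `random` module is not designed for security use, so a practical scheme would need a generator whose output the server cannot predict.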
