Type of Culminating Activity

Graduate Student Project

Graduation Date

8-2012

Degree Title

Master of Science in Computer Science

Department

Computer Science

Major Advisor

Amit Jain

Abstract

While high-performance, cost-effective data management solutions such as Hadoop exist for Big Data analysis, small and medium businesses with moderate-sized data sets also want low-budget data management systems that perform well on existing data and scale as the amount of accumulated data increases. Parallel database management systems may provide a high-performance solution, but they are expensive and complex to implement. The purpose of this project was to compare the scalability of open-source relational database management systems and distributed data management systems for small and medium data sets. To make this comparison, a business intelligence case study was investigated using three data management solutions: MySQL, Hadoop MapReduce, and Hive. The case study involved a payment history analysis that considers customer, account, and transaction data for predictive analytics. Experiments were executed on data sets ranging from 200 MB to 10 GB. The results show that the single-server MySQL solution performs best for trial sizes ranging from 200 MB to 1 GB, but does not scale well beyond that. MapReduce outperforms MySQL on data sets larger than 1 GB, and Hive outperforms MySQL on data sets larger than 2 GB. These results suggest that MapReduce and Hive are viable options for small and medium businesses that want to implement scalable data management.

Recommended Citation

Hollingsworth, Marissa Rae, "Hadoop and Hive as Scalable Alternatives to RDBMS: A Case Study" (2012). Computer Science Graduate Projects and Theses. 2.
https://scholarworks.boisestate.edu/cs_gradproj/2

Included in

Computer Sciences Commons
