Type of Culminating Activity

Graduate Student Project

Graduation Date

5-2016

Degree Title

Master of Science in Computer Science

Department

Computer Science

Major Advisor

Amit Jain

Abstract

Many small and medium businesses with mid-sized data sets want to implement low-cost data management systems that perform well within their existing budget and scale as more data accumulates. One solution is to choose one of the many high-performing and cost-effective Big Data management systems, such as Hive and Phoenix. Another option is to use parallel database management systems, which are high-performance alternatives but are expensive and can be complicated to implement. The purpose of this project was to compare Hive and Phoenix with MySQL to see whether either is a viable alternative to relational database management systems for real-time data retrieval. The case study involved two complex stored procedures provided by a local company, iVinci Health, and three simulated data sets ranging in size from 864.08 MB to 3.83 GB. The stored procedures take user input, generate and execute a complex query, and then return the results. A web application was created to simulate how the data will be accessed in the real application. The results show that, for this case study, MySQL outperforms both Phoenix and Hive. However, Hive will outperform MySQL as the data sets increase significantly in size.
