Publication Date
12-2017
Date of Final Oral Examination (Defense)
10-13-2017
Type of Culminating Activity
Thesis
Degree Title
Master of Science in Computer Science
Department
Computer Science
Supervisory Committee Chair
Edoardo Serra, Ph.D.
Supervisory Committee Member
Timothy Andersen, Ph.D.
Supervisory Committee Member
Casey Kennington, Ph.D.
Abstract
These days, with TV shows and starred chefs, new kinds of cuisine keep appearing on the market. The main cuisines, such as French, Italian, Japanese, Chinese, and Indian, are still appreciated, but they are no longer the most popular. The new trend is fusion cuisine, which is obtained by combining different main cuisines. The opening of a restaurant offering a new kind of cuisine generates a lot of excitement: people feel the need to try it and be part of the new culture. Yelp is a platform that publishes crowd-sourced reviews of businesses, restaurants in particular. When the kind of cuisine is available for a restaurant on Yelp, the tag usually covers only the main cuisines, and there is no information about fusion cuisine. There is therefore a need for a system that can identify restaurants offering fusion cuisine (novel or unknown cuisines).
This thesis addresses the novelty detection task using Yelp reviews. The idea is that semi-supervised machine learning models trained only on reviews of restaurants offering the main cuisines will be able to discriminate between restaurants serving main cuisines and restaurants serving novel ones.
We propose effective novelty detection approaches for the unknown cuisine type identification problem using Long Short-Term Memory (LSTM), autoencoders, and Term Frequency-Inverse Document Frequency (TF-IDF). Our main idea is to obtain features from the LSTM, the autoencoder, and TF-IDF and to use these features with standard semi-supervised novelty detection algorithms, namely Gaussian Mixture Models, Isolation Forest, and One-Class Support Vector Machines (SVM), to identify the unknown cuisines.
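The overall shape of such a pipeline (features learned only from known-cuisine reviews, then a novelty score for unseen reviews) can be illustrated with a minimal standard-library sketch. It is not the thesis implementation: it covers only the TF-IDF feature path, and in place of the Gaussian Mixture Model, Isolation Forest, or One-Class SVM it substitutes a much simpler nearest-neighbour score (one minus the highest cosine similarity to any training review). The whitespace tokenizer, the smoothed IDF formula, the function names, and the example reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real reviews need more care).
    return text.lower().split()


def fit_idf(docs):
    """Learn smoothed IDF weights from the known-cuisine reviews only."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def novelty_score(text, train_vectors, idf):
    """Stand-in detector: 1 minus the best similarity to any training review."""
    vec = tfidf_vector(text, idf)
    best = max((cosine(vec, tv) for tv in train_vectors), default=0.0)
    return 1.0 - best


# Hypothetical training set: reviews of restaurants with known main cuisines.
known_reviews = [
    "fresh pasta and wood fired pizza with tiramisu",
    "creamy risotto pasta and espresso",
    "sushi rolls sashimi and miso soup",
    "ramen tempura and sushi platter",
]
idf = fit_idf(known_reviews)
train_vectors = [tfidf_vector(r, idf) for r in known_reviews]

familiar = novelty_score("great pasta and pizza", train_vectors, idf)
unfamiliar = novelty_score("kimchi tacos and bulgogi burrito", train_vectors, idf)
print(familiar < unfamiliar)
```

The property this sketch shares with the semi-supervised setting is that nothing about novel cuisines is seen during fitting: both the IDF weights and the reference vectors come from the known-cuisine reviews alone.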
We conducted extensive experiments that demonstrate the effectiveness of our approaches. The resulting scores have high discriminative power: the best AUROC for the novelty detection problem is 0.85, obtained with LSTM features. LSTM outperforms our TF-IDF baseline, which we attribute mainly to its ability to retain only the useful parts of a sentence.
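For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen novel restaurant receives a higher novelty score than a randomly chosen known-cuisine restaurant. The sketch below computes it by direct pairwise comparison; the scores and labels are made-up illustrative values, not results from the thesis, and the function name is an assumption.

```python
def auroc(scores, labels):
    """Fraction of (novel, known) pairs ranked correctly; ties count as 0.5."""
    novel = [s for s, y in zip(scores, labels) if y == 1]
    known = [s for s, y in zip(scores, labels) if y == 0]
    if not novel or not known:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in novel for n in known)
    return wins / (len(novel) * len(known))


# Hypothetical novelty scores; label 1 = novel cuisine, 0 = known main cuisine.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
result = auroc(scores, labels)
print(round(result, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.85 should be read.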
DOI
https://doi.org/10.18122/B25X3M
Recommended Citation
Akella, Haritha, "Identifying Restaurants Proposing Novel Kinds of Cuisines: Using Yelp Reviews" (2017). Boise State University Theses and Dissertations. 1331.
https://doi.org/10.18122/B25X3M