A Crowdsourcing Semi-Supervised LSTM Training Approach to Identify Novel Items in Emerging Artificial Intelligent Environments

Document Type

Conference Proceeding

Publication Date



New kinds of cuisine appear on the market all the time. Although main cuisines such as French, Italian, Japanese, Chinese, and Indian remain appreciated, they are no longer the most popular. The new trend is fusion cuisine, which combines different main cuisines into something new. The opening of a restaurant offering a new kind of cuisine generates considerable excitement: people want to try it and be part of the new culture. Yelp is a platform that publishes crowd-sourced reviews of businesses, restaurants in particular. Yelp allows the kind of cuisine to be declared for each restaurant. Unfortunately, because restaurant entries in the Yelp database are often created not by the owners but by the users writing the reviews, there is little information about the kind of cuisine, especially for restaurants offering fusion cuisine.

In this paper, we address the problem of identifying restaurants that offer new kinds of cuisine by using their Yelp reviews. These cuisines can be completely new or fusion cuisines. Discriminating between main and fusion cuisines is very difficult because fusion cuisines resemble the main ones even though they are conceptually different. We propose 4Phase, a semi-supervised procedure that trains a Long Short-Term Memory (LSTM) network using only the text reviews of restaurants offering main cuisines. The trained LSTM is ultimately used as a feature generator in combination with a standard novelty detection model (e.g., a Gaussian Mixture Model). We perform experiments on Yelp to separate restaurants offering main cuisines from those offering completely new or fusion cuisines. In these experiments, our 4Phase procedure outperforms all the baselines (term frequency, Doc2Vec, autoencoder LSTM, etc.) and reaches 0.91 for both AUROC and MAP.
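
The final stage described above (a density model fitted only on features of known items, then used to score unseen items, with AUROC as the metric) can be illustrated with a minimal sketch. This is not the 4Phase procedure: the feature vectors are synthetic stand-ins for LSTM outputs, a single diagonal Gaussian is swapped in for the Gaussian Mixture Model to keep the example dependency-free, and all function names are hypothetical.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Fit per-dimension mean and variance on feature vectors of known items."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    # Small floor keeps the variance strictly positive.
    var = [sum((r[j] - mean[j]) ** 2 for r in rows) / n + 1e-6 for j in range(d)]
    return mean, var


def novelty_score(x, mean, var):
    """Negative log-likelihood under the fitted Gaussian; higher means more novel."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def auroc(scores, labels):
    """Probability that a novel item (label 1) outscores a known item (label 0).

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Assumption: synthetic 8-dimensional vectors stand in for LSTM-generated features.
rng = random.Random(0)
known_train = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
known_test = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
novel_test = [[rng.gauss(3, 1) for _ in range(8)] for _ in range(50)]

# Fit on known items only, then score a mixed test set.
mean, var = fit_diag_gaussian(known_train)
test_rows = known_test + novel_test
labels = [0] * len(known_test) + [1] * len(novel_test)
scores = [novelty_score(x, mean, var) for x in test_rows]
result = auroc(scores, labels)
print(f"AUROC on synthetic data: {result:.3f}")
```

The density model never sees a novel item during fitting, which mirrors the semi-supervised setting in which only restaurants with main cuisines are available for training. The synthetic classes here are far easier to separate than real review features would be, so the printed score says nothing about the results reported in the paper.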