Restaurant Rating System
Evaluating Restaurants by Extracting and Categorizing Information from Reviews
-
Course: Natural Language Processing, Computer Science Department, New York University
Date: Apr. 2017 - May 2017
Online restaurant reviews can often be verbose or inaccurate, making it difficult for readers to find relevant information or determine whether a restaurant matches their preferences. To address this, we propose a restaurant rating system that leverages Natural Language Processing (NLP) techniques to classify lengthy reviews into key categories such as taste, service, price, sanitation, and location. This approach enables customers to gain a clearer and more comprehensive understanding of a restaurant from multiple perspectives. Additionally, our system provides restaurant owners with more objective and actionable insights, fostering fairer and more constructive feedback.
Our system achieved an average tagging accuracy of 97.08% in classifying reviews from three restaurants into seven self-defined categories. The pipeline comprises three core components:
- Data Collection and Labeling: Scrape review data from Yelp into a .csv file, parse it into a .txt file, and label the reviews.
- Feature Engineering and Model Training: Develop effective feature sets and train the model.
- Testing and Rating Calculation: Test the model on additional reviews and compute ratings for each review.
- Overall: LOVE/HATE
- Taste: YUMMY/GROSS
- Service: FRIENDLY/INHOSPITABLE
- Price: CHEAP/EXPENSIVE
- Sanitation: CLEAN/DIRTY
- Location: CONVENIENT/INCONVENIENT
- Other: OTHER
Feature Engineering
During feature extraction, we identified five key features to optimize classification performance. These include:- Sequence Part-of-Speech (POS) Features
- POS Combinations (e.g., NN+VB+JJ)
- Token Word Appearance
- Special Lists: Synonyms relevant to each category from Thesaurus.com
- Special Sentence Structures: Common patterns in reviews
Model Training
We trained the system using the Maximum-Entropy Markov Model (MEMM) implemented in the OpenNLP library.Results and Insights
Our system produced category ratings—Overall, Taste, Service, Price, Sanitation, and Location—that closely matched Yelp’s aggregate ratings. This demonstrates the system’s potential to help business owners gain more detailed insights from Yelp reviews.Future Enhancements
To further improve accuracy and reliability, we plan to pursue three major enhancements:- Incorporate Sentiment Analysis: Identify words most likely to convey positive or negative sentiment.
- Enhance the Dictionary: Expand category-specific synonym lists and account for different POS variations.
- Increase Training Data: Label more reviews to improve model performance through larger datasets.
Life is really simple, but we insist on making it complicated. - Confucius