悦 影

Peruvian Food Reviews

arts and entertainment, nlp, geospatial analysis, classification, restaurants

4

Sold: 0
83.6MB

Data ID: D17222478016518313

Published: 2024/07/29

The following is the data verification report that the seller chose to provide:

Data description

P.S.: The banner image was obtained here. Credits to them.

Context

Peru is one of the best culinary destinations in the world. The country has diverse climates and ecological zones where a wide variety of crops has been developed, so it has many natural and unique ingredients. As a result, Peruvian food is a cuisine of opposites: hot and cold on the same plate, acidic tastes melding with the starchy, robust and delicate at the same time. This balance occurs because traditional Peruvian food relies on spices and bold flavors, ranging from the crisp and clean to the heavy and deep. Each flavor counters or tames the other. While many people see Peru as a land of cloud-topped mountains and ruins of ancient civilizations, Peru's true treasure is its rich culinary heritage. Ingredients and cooking techniques from Africa, Europe, and East Asia come together in a delightful melange that is utterly unique the world over. But what kind of food do Peruvians eat? And what restaurants should you visit? 😊


Content

This Kaggle dataset contains information scraped from Google Places and Tripadvisor using Selenium, Requests, BeautifulSoup and rvest. More information about the web scraping is available in this GitHub repository. The content covers many (at the moment not all) restaurant reviews in Lima, Peru, written between 2010 and 2021. In total there are more than 8,791 restaurants and more than 1,160,666 reviews, described by 20 features of several kinds: geospatial, text, date, categorical and numeric!

The dataset has two general sections. The first, Restaurants, contains general and geospatial information. The second, Reviews, contains the interactions between users and restaurants, which makes it possible to see how satisfied a client was with a service. A possible third section, Users, may be added in two months.

The collection methodology is explained below:

-The sample: The scraped reviews are the most recent reviews of all available restaurants in the province of Lima.

-Set of items: On one side, the users; on the other, the restaurants.

-Set of variables: There are two general tables. See the information below.

The following diagram and tables summarise everything.

Table 1: Restaurants

Variable | Description
Id | Id of the restaurant
Name | Name of the restaurant
Tag | Category of the restaurant
x, y | Geospatial coordinates (exact location) of the restaurant
District | District where the restaurant is located
Direction | Address of the restaurant
Stars | Mean star rating of the restaurant over all time
N_reviews | Number of reviews of the restaurant over all time
Min_Price | Minimum price on the restaurant's menu
Max_Price | Maximum price on the restaurant's menu
Platform | Platform from which the information was downloaded

Table 2: Reviews

Variable | Description
Id_review | Id of the review
Id_nick | Id of the user. With this it is possible to get the profile link
Date | Date when the review was written
Service | Id of the restaurant. Connection with Table 1
Review | Content of the review. This describes the satisfaction of the user
Title | Title of the review. Only available for Tripadvisor
Score | Score given in the review
Likes | Number of votes the review received
Platform | Platform from which the information was downloaded
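
The Service column in Table 2 points at the Id column in Table 1, so the two tables can be joined on that pair. A minimal sketch of the join in plain Python, computing the mean review Score per District, could look like this. The sample rows and the function name `mean_score_by_district` are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows that only reuse the column names from Tables 1 and 2.
restaurants = [
    {"Id": "r1", "Name": "Cevicheria A", "District": "Miraflores"},
    {"Id": "r2", "Name": "Polleria B", "District": "Barranco"},
]
reviews = [
    {"Id_review": "v1", "Service": "r1", "Score": 5},
    {"Id_review": "v2", "Service": "r1", "Score": 3},
    {"Id_review": "v3", "Service": "r2", "Score": 4},
    {"Id_review": "v4", "Service": "r9", "Score": 1},  # no matching restaurant
]


def mean_score_by_district(restaurants, reviews):
    """Join reviews to restaurants on Service == Id, then average Score per District."""
    district_of = {r["Id"]: r["District"] for r in restaurants}
    scores = defaultdict(list)
    for review in reviews:
        district = district_of.get(review["Service"])
        if district is None:
            continue  # skip reviews whose restaurant is not in Table 1
        scores[district].append(review["Score"])
    return {district: sum(s) / len(s) for district, s in scores.items()}


district_means = mean_score_by_district(restaurants, reviews)
```

With the real files, the same lookup-then-group pattern applies after reading the rows with `csv.DictReader`; remember that values read from CSV are strings, so Score would need converting to a number first.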

There is also auxiliary information related to sentiment and emotion. These probabilities were obtained with a Spanish NrcLexicon; however, the results are not good. They still serve as a reference, and you can propose a fine-tuning here. In addition, there is the probability of a review getting a specific number of stars, which was obtained with a simple logistic regression. I have also included the information about the Spanish NrcLexicon and the Geospatial Borders. The authors and more information can be found there and there.

Table 3: Models

Variable | Description
Id_review | Id of the review. Connection with Table 2
Positive | Probability that the review shows positive sentiment
Negative | Probability that the review shows negative sentiment
Anger | Probability that the review shows the anger emotion
Anticipation | Probability that the review shows the anticipation emotion
Disgust | Probability that the review shows the disgust emotion
Fear | Probability that the review shows the fear emotion
Joy | Probability that the review shows the joy emotion
Sadness | Probability that the review shows the sadness emotion
Surprise | Probability that the review shows the surprise emotion
Stars_1 | Probability that the review gets 1 star
Stars_2 | Probability that the review gets 2 stars
Stars_3 | Probability that the review gets 3 stars
Stars_4 | Probability that the review gets 4 stars
Stars_5 | Probability that the review gets 5 stars
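
The five Stars_k columns can be collapsed into a single prediction per review. The sketch below, which assumes the five values behave like a probability distribution over 1 to 5 stars, returns both the most likely star and the expected (probability-weighted) star. The function name `star_summary` and the example row are hypothetical.

```python
def star_summary(row):
    """Return (most likely star, expected star) from the Stars_1..Stars_5 columns."""
    stars = range(1, 6)
    probs = [float(row[f"Stars_{k}"]) for k in stars]
    total = sum(probs)
    if total <= 0:
        raise ValueError("star probabilities must sum to a positive value")
    probs = [p / total for p in probs]  # renormalise in case of rounding error
    most_likely = max(stars, key=lambda k: probs[k - 1])
    expected = sum(k * p for k, p in zip(stars, probs))
    return most_likely, expected


# Hypothetical row shaped like Table 3 (not taken from the dataset).
example = {"Stars_1": 0.05, "Stars_2": 0.05, "Stars_3": 0.10,
           "Stars_4": 0.30, "Stars_5": 0.50}
most_likely, expected = star_summary(example)
```

Comparing the expected star against the actual Score in Table 2 is one simple way to check how far the logistic regression is from the real ratings.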

[Figure: the entity relationship diagram]

Usage

Text classification: The main topic for this type of dataset. Vectorize the reviews and define a predictive model. Identify the strong and weak points of each restaurant.
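
As one possible illustration of "vectorize and predict", the sketch below turns each review into bag-of-words counts and trains a small multinomial Naive Bayes classifier with add-one smoothing. This is a swapped-in example model, not the method used to build the dataset; the class name, the training sentences and the labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of (Spanish) letters as tokens."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n / n_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical training reviews, not taken from the dataset.
texts = [
    "comida excelente y servicio rapido",
    "excelente sabor muy rico",
    "servicio lento y comida fria",
    "pesimo servicio comida mala",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(texts, labels)
```

On the real data the labels could be derived from the Score column (for example, 4 to 5 stars versus 1 to 2 stars), and a library implementation with TF-IDF features would normally replace this hand-written model.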

Find patterns: Compare districts (or restaurants) over time. What are the common words in reviews of an excellent restaurant? Why are these restaurants better?
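
A first pass at the "common words" question can be done with simple word counts. The sketch below counts the most frequent words in reviews at or above a given Score; the stopword list is a tiny illustrative one, and the sample reviews are made up rather than taken from the dataset.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"el", "la", "y", "de", "es", "muy", "the", "and", "is", "a"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def top_words(reviews, min_score, n=3):
    """Return the n most common words among reviews with Score >= min_score."""
    counts = Counter()
    for review in reviews:
        if review["Score"] >= min_score:
            counts.update(tokenize(review["Review"]))
    return counts.most_common(n)


# Hypothetical rows that only reuse the Review and Score columns of Table 2.
sample_reviews = [
    {"Review": "El ceviche es excelente y muy fresco", "Score": 5},
    {"Review": "Excelente ceviche, excelente servicio", "Score": 5},
    {"Review": "La comida estaba fría", "Score": 2},
]
best_words = top_words(sample_reviews, min_score=5, n=2)
```

Running the same function with different thresholds, or per District after joining with Table 1, gives a quick way to compare vocabularies between groups.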

Dimensionality reduction: Detect similarities and then cluster the reviews.

Acknowledgements

Thanks to Kaggle and its community. In general, thanks to the learners and teachers in machine learning, deep learning, natural language processing and computer vision.

Inspiration

Natural language processing is a great tool. One application that I'm interested in is detecting bullying messages in any social network. I know that many notebooks and papers exist, but I'd like to build a bot that detects all possible cases, and surely they exist!
