The following is the data validation report that the seller has chosen to provide:
Data Description
Introduction
Ever wondered what people are saying about certain countries? Whether they are portrayed in a positive or negative light? Which words and phrases are most commonly used to describe a country? This dataset contains tweets in which a country is mentioned in the hashtags (e.g. #HongKong, #NewZealand). It covers around 150 countries. I've added a field called polarity, which holds the sentiment computed from the text field. Feel free to explore! Feedback is much appreciated!
Content
Each row represents a tweet. Tweet creation dates range from 12/07/2020 to 25/07/2020 (DD/MM/YYYY). I will update the dataset on a monthly cadence.
- The country can be derived from the file_name field. (This field is very Tableau-friendly when it comes to plotting maps.)
- The date on which the tweet was created is in the created_at field.
- The search query sent to the Twitter search engine is in the search_query field.
- The full text of the tweet is in the text field.
- The sentiment is in the polarity field. (I've used the VADER model from NLTK to compute this.)
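For readers who want to reproduce or extend the polarity field, here is a minimal sketch of scoring text with NLTK's VADER model. It assumes polarity corresponds to VADER's compound score, which the description does not confirm. The helper names vader_compound and label_polarity are hypothetical, and the ±0.05 cut-offs are the commonly cited convention rather than anything used in this dataset.

```python
# Sketch only: the dataset says polarity comes from NLTK's VADER model,
# but which VADER score is stored (here: "compound") is an assumption.


def vader_compound(text):
    """Return VADER's compound score, a float in [-1, 1], for the text."""
    # Imported lazily so the labelling helper below works without NLTK.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # lexicon VADER depends on
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]


def label_polarity(score, threshold=0.05):
    """Bucket a score into positive / negative / neutral.

    The +/-0.05 threshold is a common convention, not part of the dataset.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

When scoring many tweets, it would be more efficient to create one SentimentIntensityAnalyzer and reuse it rather than building one per call as this sketch does.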
Notes
There may be some duplicate tweet IDs among tweets created before 22/07/2020. I have since fixed this bug.
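If the duplicates matter for your analysis, one way to drop them is to keep only the first row seen for each tweet ID. This is a rough sketch: the column name id is an assumption (the description does not name the tweet ID field), drop_duplicate_tweets is a hypothetical helper, and the sample rows are made up for illustration.

```python
def drop_duplicate_tweets(rows, id_field="id"):
    """Keep the first row for each tweet ID, preserving the original order.

    `id_field` is an assumed column name; adjust it to match the data.
    """
    seen = set()
    unique = []
    for row in rows:
        tweet_id = row[id_field]
        if tweet_id in seen:
            continue  # a row with this ID was already kept
        seen.add(tweet_id)
        unique.append(row)
    return unique


# Made-up rows, shaped like what csv.DictReader would yield.
sample_rows = [
    {"id": "1", "text": "Lovely day in #NewZealand"},
    {"id": "2", "text": "Protests continue in #HongKong"},
    {"id": "1", "text": "Lovely day in #NewZealand"},
]
deduplicated = drop_duplicate_tweets(sample_rows)
```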
Acknowledgements
Thanks to the tweepy package for making the data extraction via Twitter API so easy.
Shameless Plug
Feel free to check out my blog if you want to learn how I built the data lake on AWS, or for other data shenanigans.
Here's an app I built using a live version of this data.
