The following is the data validation report that the seller chose to provide:
Data description
[READ THIS FIRST! DATASET FOR ACADEMIC/LEARNING/NON-COMMERCIAL PURPOSES]
Context
The 2020 US election is particularly interesting to study because it took place in the middle of a pandemic. My teammates and I built a Twitter crawler using the Twitter API and Tweepy for my Artificial Intelligence coursework. We chose Donald Trump as the subject of interest because President Trump was known for his Twitter activity.
I decided to deploy the crawler from the day after voting day in order to conduct a sentiment analysis.
The tweet text in this dataset is suitable for sentiment analysis.
Content
This raw dataset was crawled using the Tweepy library and the Twitter API. 2,500 tweets were gathered every 15 minutes. There are 247,500 rows and 13 columns in total, for 3,217,500 cells of data. Data cleaning is required before performing any analysis.
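The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the description; the variable names are illustrative, and the batch count assumes every crawl returned exactly 2,500 tweets, which the description does not state.

```python
# Figures quoted in the dataset description.
ROWS = 247_500
COLUMNS = 13
TWEETS_PER_BATCH = 2_500

# Total cells = rows x columns.
total_cells = ROWS * COLUMNS

# Number of crawler batches implied by the row count, ASSUMING every
# batch returned exactly 2,500 tweets (not confirmed by the description).
batches, remainder = divmod(ROWS, TWEETS_PER_BATCH)

print(total_cells)         # 3217500
print(batches, remainder)  # 99 0
```

Under that assumption the row count corresponds to 99 batches, which is consistent with the crawler not running in every 15-minute slot of the date range.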
Dataset date range: 4 November 2020 to 11 November 2020.
Tweets containing "Trump", "DonalTrump", or "realDonalTrump" were captured.
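When cleaning the data, it may be useful to re-check locally that each row really mentions one of the search terms. The following is a minimal sketch, not the crawler's actual logic (which is not shown here): the function name `matches_keywords` is hypothetical, and case-insensitive substring matching is an assumption.

```python
# Keywords exactly as listed in the description (spelling preserved).
KEYWORDS = ("Trump", "DonalTrump", "realDonalTrump")


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text, ignoring case.

    ASSUMPTION: plain case-insensitive substring matching; the original
    crawler relied on the Twitter API's own search behaviour.
    """
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


print(matches_keywords("Votes are still being counted, says TRUMP"))  # True
print(matches_keywords("Election results are coming in"))             # False
```

Because "Trump" is a substring of the other two terms, this particular check reduces to testing whether the text contains "trump" in any letter case.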
(User = the user associated with the given row)
username: Twitter user handle
accDesc: Description on the user's profile
location: Location of the tweet
following: Total number of accounts the user is following
followers: Total number of followers of the user
totaltweets: Total number of tweets created by the user
usercreated: Date the user registered their Twitter account
tweetcreated: Date the tweet was created
favouritecount: Number of favourites (hearts) on the tweet (equivalent to a like on Facebook)
retweetcount: Total number of retweets of the tweet (equivalent to a share on Facebook)
text: Text body of the tweet
tweetsource: Device used to create the tweet
hashtags: Hashtags of the tweet, in JSON format
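As a starting point for the data cleaning mentioned earlier, the sketch below reads rows with the standard library and checks them against the 13 columns listed above. It rests on several assumptions that the description does not confirm: the data is a CSV file with a header row, the count columns hold plain integers, and `hashtags` holds a JSON array. The helper names `to_int` and `load_rows` and the sample row are hypothetical.

```python
import csv
import io
import json

EXPECTED_COLUMNS = [
    "username", "accDesc", "location", "following", "followers",
    "totaltweets", "usercreated", "tweetcreated", "favouritecount",
    "retweetcount", "text", "tweetsource", "hashtags",
]
COUNT_COLUMNS = [
    "following", "followers", "totaltweets",
    "favouritecount", "retweetcount",
]


def to_int(value):
    """Convert a raw cell to int, or None if it is empty or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def load_rows(handle):
    """Read rows from an open CSV file and apply light type conversion."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for name in COUNT_COLUMNS:
            row[name] = to_int(row[name])
        try:
            # ASSUMPTION: hashtags is a JSON array; fall back to an empty list.
            row["hashtags"] = json.loads(row["hashtags"])
        except (json.JSONDecodeError, TypeError):
            row["hashtags"] = []
        rows.append(row)
    return rows


# Made-up sample row, for illustration only.
header = ",".join(EXPECTED_COLUMNS) + "\n"
line = ('alice,bio,NYC,10,20,300,2015-01-01,2020-11-04,5,2,'
        'hello Trump,Twitter Web App,"[""vote""]"\n')
rows = load_rows(io.StringIO(header + line))
print(rows[0]["followers"], rows[0]["hashtags"])  # 20 ['vote']
```

Malformed counts become `None` and unparseable hashtags become an empty list, so problem rows can be inspected later instead of stopping the load.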
Acknowledgements and Disclaimers
Banner and thumbnail courtesy of visuals from unsplash.com
Many thanks to my teammates Jiacheng Loh and ChenZhen Li for their efforts.
Please do not use this dataset for any malicious purposes; I am not responsible for any damage done.
This dataset was gathered for the purpose of learning and not for commercial purposes.
The data were publicly available, therefore I assume they are open to all.
Limitations
The data were gathered at intervals of at least 15 minutes, so tweet creation dates are unevenly distributed and the dataset may not include all tweets created within the date range.
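One way to see this unevenness is to count tweets per 15-minute bucket and look for empty or sparse slots. The sketch below is illustrative only: the function name `bucket_counts` is hypothetical, the timestamp format `%Y-%m-%d %H:%M:%S` is an assumption about how the creation dates are stored, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S", minutes=15):
    """Count tweets per fixed-width time bucket.

    `minutes` should divide 60 evenly so buckets align with the hour.
    """
    counts = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, fmt)
        # Floor the timestamp to the start of its bucket.
        floored = moment - timedelta(
            minutes=moment.minute % minutes,
            seconds=moment.second,
        )
        counts[floored] += 1
    return counts


# Made-up timestamps, for illustration only.
sample = [
    "2020-11-04 10:01:30",
    "2020-11-04 10:14:59",
    "2020-11-04 10:15:00",
    "2020-11-04 11:45:10",
]
counts = bucket_counts(sample)
for start in sorted(counts):
    print(start, counts[start])
```

In this sample, the 10:00 bucket holds two tweets, 10:15 and 11:45 hold one each, and every bucket in between is empty, which is the kind of gap the limitation describes.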
