The following is the data validation report that the seller chose to provide:
Data Description
Context
This dataset was created as part of a university project on sentiment analysis on multi-source social media platforms using PySpark.
There are two datasets: one consists of tweets from Twitter with sentiment labels, and the other consists of comments from Reddit with their sentiment labels.
1. Twitter Dataset
2. Reddit Dataset
All of these tweets and comments were extracted using the respective API clients, Tweepy (Twitter) and PRAW (Reddit). The tweets and comments were made about Narendra Modi and other leaders, and also reflect people's opinions on the next Prime Minister of the nation (in the context of the general elections held in India in 2019). All tweets and comments from Twitter and Reddit were cleaned using Python's re module and NLP techniques, and each was assigned a sentiment label ranging from -1 to 1:
- 0 indicates a neutral tweet/comment
- 1 indicates a positive tweet/comment
- -1 indicates a negative tweet/comment
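The exact cleaning rules used in the project are not documented here, so the following is only a minimal sketch of what regex-based cleaning with Python's re module might look like. The patterns, the `clean_text` and `label_name` helpers, and the `LABEL_NAMES` mapping are all illustrative assumptions, not the project's actual code.

```python
import re

# Hypothetical cleaning patterns (assumptions, not the project's documented rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"[@#]\w+")             # @mentions and #hashtags
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # anything except lowercase letters and whitespace
WHITESPACE_RE = re.compile(r"\s+")              # runs of whitespace

# Label convention stated in the description: -1 negative, 0 neutral, 1 positive.
LABEL_NAMES = {-1: "negative", 0: "neutral", 1: "positive"}


def clean_text(text: str) -> str:
    """Lowercase the text, then strip links, mentions, hashtags, and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def label_name(label: int) -> str:
    """Map a numeric sentiment label to a readable name."""
    return LABEL_NAMES[label]
```

For example, `clean_text("Great rally!!! https://example.com @user")` would reduce the text to `great rally` under these assumed rules.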
Content
The Twitter.csv dataset has around 163K tweets along with their sentiment labels. The Reddit.csv dataset has around 37K comments along with their sentiment labels. Each dataset has two columns: the first column contains the cleaned tweets or comments, and the second column contains the sentiment label.
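As a rough illustration of the two-column layout, the sketch below reads rows with Python's standard csv module and counts how many rows carry each label. The helper names `read_labeled_rows` and `label_counts` are hypothetical, and the assumptions that the files have a header row and that labels may be stored as either `1` or `1.0` are guesses about the file format, not documented facts.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_labeled_rows(
    lines: Iterable[str], has_header: bool = True
) -> Iterator[Tuple[str, int]]:
    """Yield (text, label) pairs from a two-column CSV source.

    Assumption: column 0 is the cleaned text and column 1 is the label.
    Rows with a missing or empty label are skipped.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # discard the (assumed) header row
    for row in reader:
        if len(row) < 2 or not row[1].strip():
            continue
        # int(float(...)) accepts both "1" and "1.0" style labels.
        yield row[0], int(float(row[1]))


def label_counts(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Count how many rows carry each sentiment label."""
    return Counter(label for _, label in rows)
```

To try it on a file, one could open it with `open("Twitter.csv", newline="", encoding="utf-8")` and pass the file object to `read_labeled_rows`; the file name and encoding here are assumptions as well.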
Acknowledgements
This dataset was created with the help of my fellow teammates, who worked hard to gather more data using the Tweepy and Reddit APIs. My project coordinator encouraged us to collect as much data as possible and was the main motivation behind implementing sentiment analysis on multi-source social media platforms rather than on a single platform such as Twitter.
