The following data validation report was provided by the seller:
Data Description
Context
Personality classification from text is a well-known NLP task. However, the difficulty of annotating large datasets has restricted the scale at which most of these studies have been performed. In this dataset, we sampled a subset of Twitter users who self-report their MBTI personality types to curate a corpus of more than 1.5 million tweets from 8,328 users, along with 36,466 edges depicting follower-followee relationships among them. This dataset can be used in a wide variety of machine learning research and development activities involving Twitter content.
Content
The dataset contains a set of 8,328 Twitter users who have reported their MBTI personality types in their profile descriptions.
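The description does not say how self-reported types were detected in profile descriptions. As a minimal sketch of one possible approach, assuming a plain regular-expression match, the helpers below find any of the 16 four-letter MBTI codes. The function names and the "exactly one type mentioned" filtering rule are hypothetical illustrations, not the curators' documented procedure.

```python
import re
from typing import List, Optional

# The 16 MBTI types combine four dichotomies: E/I, S/N, T/F, J/P.
# Word boundaries stop matches inside longer alphanumeric tokens.
MBTI_PATTERN = re.compile(r"\b([EI][SN][TF][JP])\b", re.IGNORECASE)


def find_mbti_types(description: str) -> List[str]:
    """Return distinct MBTI codes in a profile description, uppercased, in first-seen order."""
    found: List[str] = []
    for match in MBTI_PATTERN.finditer(description):
        code = match.group(1).upper()
        if code not in found:
            found.append(code)
    return found


def self_reported_type(description: str) -> Optional[str]:
    """Hypothetical rule: accept a user only if exactly one MBTI code is mentioned."""
    codes = find_mbti_types(description)
    return codes[0] if len(codes) == 1 else None
```

Rejecting profiles that mention several types (for example "INTP or ENTP?") trades recall for cleaner labels; a real pipeline would likely need further checks, since a bare regex cannot tell a self-report from a mention of someone else's type.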
Features in the Dataset
- Identity Features: ‘id’, ‘id_str’, ‘name’, ‘screen_name’, ‘location’, ‘description’, ‘verified’, ‘personality_type’ (target variable). These features identify and describe each user.
- Behavior Features: ‘followers_count’, ‘friends_count’, ‘listed_count’, ‘favourites_count’, ‘statuses_count’, ‘number_of_quoted_statuses’, ‘number_of_retweeted_statuses’, ‘total_favorite_count’, ‘total_retweet_count’, ‘total_hashtag_count’, ‘total_url_count’, ‘total_mentions_count’, ‘total_media_count’, ‘number_of_tweets_scraped’, ‘average_tweet_length’, ‘average_retweet_count’, ‘average_favorite_count’, ‘average_hashtag_count’, ‘average_url_count’, ‘average_mentions_count’, ‘average_media_count’. These features are specific to the platform and describe how a user behaves on Twitter.
- Linguistic Features: Up to the 200 most recent tweets per user, collected between January and March 2020. These tweets carry the linguistic content that characterizes a user’s writing style, thinking patterns, and emotional intelligence.
- Network Features: Ordered pairs of user IDs modeled as a directed unweighted graph, where an edge is directed from a follower node to a followee node.
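Because each edge is an ordered (follower, followee) pair, the network loads naturally as an adjacency list. The sketch below assumes the edge file has already been parsed into integer ID pairs (its exact layout is not specified here); `build_follow_graph` and `degrees` are hypothetical helper names. Since the edges only connect users inside the sample, the resulting degrees count in-sample follow relationships and will typically be smaller than ‘followers_count’ and ‘friends_count’, which cover all of Twitter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed input: (follower_id, followee_id) pairs, already parsed from the edge file.
Edge = Tuple[int, int]


def build_follow_graph(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Adjacency list mapping each follower to the set of users it follows.

    Storing followees in a set collapses repeated pairs, matching the unweighted model.
    """
    graph: Dict[int, Set[int]] = defaultdict(set)
    for follower, followee in edges:
        graph[follower].add(followee)
        # Register followees too, so users with no outgoing edges still appear as nodes.
        graph.setdefault(followee, set())
    return dict(graph)


def degrees(graph: Dict[int, Set[int]]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (in_degree, out_degree) for every node.

    in-degree:  number of in-sample users who follow this user
    out-degree: number of in-sample users this user follows
    """
    out_degree = {node: len(followees) for node, followees in graph.items()}
    in_degree = {node: 0 for node in graph}
    for followees in graph.values():
        for followee in followees:
            in_degree[followee] += 1
    return in_degree, out_degree
```

For example, the edges `(1, 2), (1, 3), (2, 3)` give user 3 an in-degree of 2 and user 1 an out-degree of 2. A library such as NetworkX offers the same structure as a directed graph if further graph algorithms are needed.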
Labels
| Personality type | Number of samples |
|---|---|
| INFJ | 917 |
| ISTP | 209 |
| ISTJ | 342 |
| INFP | 899 |
| ISFP | 232 |
| INTJ | 905 |
| INTP | 712 |
| ISFJ | 420 |
| ENFJ | 723 |
| ESTP | 147 |
| ESTJ | 221 |
| ENFP | 900 |
| ESFP | 202 |
| ENTJ | 677 |
| ENTP | 586 |
| ESFJ | 236 |
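The classes are noticeably imbalanced: INFJ (917 users) is roughly six times as frequent as ESTP (147 users). As a sketch, the snippet below totals the counts from the table, derives inverse-frequency class weights using the same formula as scikit-learn's `class_weight="balanced"` (n_samples / (n_classes × count)), and tallies the four MBTI dichotomies separately. Reframing the 16-way task as four binary tasks is a common option but an assumption here, not something this dataset prescribes; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict

# Counts copied from the Labels table.
LABEL_COUNTS: Dict[str, int] = {
    "INFJ": 917, "ISTP": 209, "ISTJ": 342, "INFP": 899,
    "ISFP": 232, "INTJ": 905, "INTP": 712, "ISFJ": 420,
    "ENFJ": 723, "ESTP": 147, "ESTJ": 221, "ENFP": 900,
    "ESFP": 202, "ENTJ": 677, "ENTP": 586, "ESFJ": 236,
}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """'Balanced' weights: n_samples / (n_classes * class_count); rarer classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


def axis_counts(counts: Dict[str, int]) -> Dict[str, Counter]:
    """Split each four-letter type into its dichotomies and tally users per pole."""
    axes: Dict[str, Counter] = {"E/I": Counter(), "S/N": Counter(), "T/F": Counter(), "J/P": Counter()}
    axis_names = list(axes)  # position i of a type code belongs to axis_names[i]
    for label, count in counts.items():
        for position, letter in enumerate(label):
            axes[axis_names[position]][letter] += count
    return axes
```

Summing the table gives 8,328, matching the stated number of users. The axis split shows the imbalance is uneven across dichotomies: for example, 6,319 intuitive (N) users versus 2,009 sensing (S) users, compared with 4,636 introverted (I) versus 3,692 extraverted (E).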
