Data Description
Context
Personality classification from text is a well-known NLP task. However, the difficulty of annotating large datasets has restricted the scale of most such studies. For this dataset, we sampled Twitter users who self-report their MBTI personality types and curated a corpus of more than 1.5 million tweets from 8,328 users, along with 36,466 edges representing follower-followee relationships among them. The dataset can support a wide range of machine learning research and development work involving Twitter content.
Content
The dataset contains a set of 8,328 Twitter users who have reported their MBTI personality types in their profile descriptions.
Features in the Dataset
- Identity Features: 'id', 'id_str', 'name', 'screen_name', 'location', 'description', 'verified', 'personality_type' (target variable). These features uniquely describe a user.
- Behavior Features: 'followers_count', 'friends_count', 'listed_count', 'favourites_count', 'statuses_count', 'number_of_quoted_statuses', 'number_of_retweeted_statuses', 'total_favorite_count', 'total_retweet_count', 'total_hashtag_count', 'total_url_count', 'total_mentions_count', 'total_media_count', 'number_of_tweets_scraped', 'average_tweet_length', 'average_retweet_count', 'average_favorite_count', 'average_hashtag_count', 'average_url_count', 'average_mentions_count', 'average_media_count'. These features are specific to the social media platform and describe how a user behaves on Twitter.
- Linguistic Features: Up to 200 of each user’s most recent tweets, collected during January-March 2020. These features carry the linguistic content that characterizes a user’s writing style, thinking pattern, and emotional intelligence.
- Network Features: Ordered pairs of user IDs modeled as a directed unweighted graph, where an edge is directed from a follower node to a followee node.
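
The network features can be loaded into a simple adjacency structure. The following is a minimal sketch, assuming the edges are available as (follower ID, followee ID) pairs; the function name `build_graph` and the sample IDs are hypothetical and do not come from the dataset files.

```python
from collections import defaultdict

def build_graph(edges):
    """Build adjacency maps from (follower_id, followee_id) pairs.

    Each pair is a directed, unweighted edge pointing from the
    follower to the followee, as described in the list above.
    """
    followees = defaultdict(set)  # user -> accounts that user follows
    followers = defaultdict(set)  # user -> accounts that follow that user
    for follower_id, followee_id in edges:
        followees[follower_id].add(followee_id)
        followers[followee_id].add(follower_id)
    return followees, followers

# Hypothetical toy edges; the real dataset has 36,466 of them.
sample_edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
followees, followers = build_graph(sample_edges)

# In-degree = number of followers a user has within the sampled graph.
in_degree = {user: len(group) for user, group in followers.items()}
```

Keeping both directions makes it cheap to ask either "whom does this user follow?" or "who follows this user?" without rescanning the edge list.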
Labels
| Personality type | Number of samples |
|---|---|
| INFJ | 917 |
| ISTP | 209 |
| ISTJ | 342 |
| INFP | 899 |
| ISFP | 232 |
| INTJ | 905 |
| INTP | 712 |
| ISFJ | 420 |
| ENFJ | 723 |
| ESTP | 147 |
| ESTJ | 221 |
| ENFP | 900 |
| ESFP | 202 |
| ENTJ | 677 |
| ENTP | 586 |
| ESFJ | 236 |
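
The label counts above are imbalanced, which matters when training a classifier. The sketch below copies the counts from the table and derives inverse-frequency class weights plus per-letter totals for each MBTI position. The weighting formula (total / (number of classes × class count)) is a common heuristic chosen here for illustration, not something the dataset authors prescribe.

```python
# Label counts copied from the table above.
label_counts = {
    "INFJ": 917, "ISTP": 209, "ISTJ": 342, "INFP": 899,
    "ISFP": 232, "INTJ": 905, "INTP": 712, "ISFJ": 420,
    "ENFJ": 723, "ESTP": 147, "ESTJ": 221, "ENFP": 900,
    "ESFP": 202, "ENTJ": 677, "ENTP": 586, "ESFJ": 236,
}

total_users = sum(label_counts.values())
num_classes = len(label_counts)

# Inverse-frequency weights: rarer types receive larger weights.
class_weights = {
    label: total_users / (num_classes * count)
    for label, count in label_counts.items()
}

# Count users per letter at each of the four MBTI positions
# (I/E, N/S, T/F, J/P) to view the task as four binary problems.
letter_counts = [{} for _ in range(4)]
for label, count in label_counts.items():
    for position, letter in enumerate(label):
        letter_counts[position][letter] = (
            letter_counts[position].get(letter, 0) + count
        )

for label in sorted(class_weights, key=class_weights.get, reverse=True):
    print(f"{label}: weight {class_weights[label]:.3f}")
```

Treating the target as four binary letters rather than 16 classes is one way to soften the imbalance, since each letter pools several types.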





