晓彤

Twitter MBTI Personality Types Dataset

psychology, multiclass classification, social networks

5

Sold: 0
73.15MB

Data ID: D17222556276310237

Published: 2024/07/29

The following is the data verification report that the seller has chosen to provide:

Data Description

Context

Personality classification from text is a well-known NLP task. However, the difficulty of annotating large datasets has restricted the scale at which most of these studies have been performed. For this dataset, we sampled a subset of Twitter users who self-report their MBTI personality types and curated a corpus of more than 1.5 million tweets from 8,328 users, together with 36,466 edges depicting follower-followee relationships among them. The dataset can be used in a wide variety of machine learning research and development activities dealing with Twitter content.

Content

The dataset contains a set of 8,328 Twitter users who have reported their MBTI personality types in their profile descriptions.

Features in the Dataset

  • Identity Features: 'id', 'id_str', 'name', 'screen_name', 'location', 'description', 'verified', 'personality_type' (target variable). These features uniquely describe a user.
  • Behavior Features: 'followers_count', 'friends_count', 'listed_count', 'favourites_count', 'statuses_count', 'number_of_quoted_statuses', 'number_of_retweeted_statuses', 'total_favorite_count', 'total_retweet_count', 'total_hashtag_count', 'total_url_count', 'total_mentions_count', 'total_media_count', 'number_of_tweets_scraped', 'average_tweet_length', 'average_retweet_count', 'average_favorite_count', 'average_hashtag_count', 'average_url_count', 'average_mentions_count', 'average_media_count'. These features are specific to the social media platform and describe how a user behaves on Twitter.
  • Linguistic Features: Up to 200 of each user's most recent tweets, collected during January-March 2020. These features contain the linguistic content that characterizes a user’s writing style, thinking pattern, and emotional intelligence.
  • Network Features: Ordered pairs of user IDs modeled as a directed, unweighted graph, where each edge points from a follower node to a followee node.
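The edge direction described under Network Features can be illustrated with a minimal sketch. It assumes the edges are available as (follower_id, followee_id) pairs; the helper name `build_follow_graph` and the toy IDs are hypothetical and do not come from the dataset, whose file layout is not described here.

```python
from collections import defaultdict


def build_follow_graph(edges):
    """Build adjacency maps from (follower_id, followee_id) pairs.

    Each pair is read as a directed edge follower -> followee,
    matching the direction stated in the dataset description.
    """
    followees = defaultdict(set)  # user -> accounts that user follows
    followers = defaultdict(set)  # user -> accounts following that user
    for follower_id, followee_id in edges:
        followees[follower_id].add(followee_id)
        followers[followee_id].add(follower_id)
    return followees, followers


# Toy IDs made up for illustration; not taken from the dataset.
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
followees, followers = build_follow_graph(toy_edges)

# Out-degree: how many sampled users a user follows.
out_degree = {user: len(targets) for user, targets in followees.items()}
# In-degree: how many sampled users follow a user.
in_degree = {user: len(sources) for user, sources in followers.items()}
```

Because the graph only covers the sampled users, these degrees count relationships inside the sample and will generally differ from the profile-level 'followers_count' and 'friends_count' values.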

Labels

Personality type    Number of samples
INFJ                917
ISTP                209
ISTJ                342
INFP                899
ISFP                232
INTJ                905
INTP                712
ISFJ                420
ENFJ                723
ESTP                147
ESTJ                221
ENFP                900
ESFP                202
ENTJ                677
ENTP                586
ESFJ                236
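
The label counts above are uneven across the 16 types, which matters for multiclass classification. The following sketch uses only the counts from the table to compute the total, per-letter totals, and inverse-frequency class weights. The weighting formula total / (number of classes × class count) is one common heuristic chosen here for illustration; the dataset description does not prescribe it.

```python
from collections import Counter

# Sample counts per personality type, copied from the table above.
label_counts = {
    "INFJ": 917, "ISTP": 209, "ISTJ": 342, "INFP": 899,
    "ISFP": 232, "INTJ": 905, "INTP": 712, "ISFJ": 420,
    "ENFJ": 723, "ESTP": 147, "ESTJ": 221, "ENFP": 900,
    "ESFP": 202, "ENTJ": 677, "ENTP": 586, "ESFJ": 236,
}

total = sum(label_counts.values())  # 8328, matching the stated user count

# Aggregate counts per MBTI letter (I/E, N/S, T/F, J/P).
letter_counts = Counter()
for mbti_type, count in label_counts.items():
    for letter in mbti_type:
        letter_counts[letter] += count

# Inverse-frequency weights (an assumed heuristic, not part of the dataset):
# rarer types receive larger weights.
n_classes = len(label_counts)
class_weights = {
    mbti_type: total / (n_classes * count)
    for mbti_type, count in label_counts.items()
}

rarest = min(label_counts, key=label_counts.get)
most_common = max(label_counts, key=label_counts.get)
print(f"total={total}, rarest={rarest}, most common={most_common}")
```

Weights of this kind could be passed to a classifier's loss function, or the four letter pairs could be modeled as separate binary tasks; either choice is left to the user of the dataset.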