一七

us-senate-analysis

Exploratory Data Ana…, Politics, Government, United States, Multiclass Classific…, Feature Extraction

Sold: 0
22.15MB

Data ID: D17168946973421812

Published: 2024/05/28

Data description

About Dataset

The dataset contains four tables:

  • senators.csv: Data extracted from https://en.wikipedia.org/wiki/List_of_current_United_States_senators and combined with data from the Twitter API. It contains each senator's name, age, occupation, and location, plus a pair of coordinates for their state so that the network can be plotted geospatially. It also includes general Twitter data: screen_name, followers, follows, id…
    UPDATE: the gender and race of the senators were added to support analysis of social representation. The table can also serve as a dataset for studying model fairness.

  • relationships.csv: Contains the relationship for every possible pair of senators, in three columns: person1, person2, relationship. The relationship value is one of: following (mutual following), not following (neither follows the other), or following only by personx (x = 1 or 2).

  • dataset_with_topics.csv: A list of 250k tweets by US senators from 2008 to 2023. IT IS NOT A COMPLETE COLLECTION. It also contains the results of three BERTopic clusterings, each with a different minimum cluster size.

  • topic_info_150: Information on the topics found with a minimum cluster size of 150, along with a manual labeling of those topics. You can join this table with the previous one to perform tasks such as multiclass classification.
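The relationship column of relationships.csv encodes direction in its label, so building a follow network usually means expanding each row into directed edges first. The sketch below shows one way to do that with the standard library only. The exact label strings, and the reading that "following only by person1" means person1 follows person2, are assumptions; check them against the actual file before relying on this.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed label strings; the real values in relationships.csv may be spelled differently.
MUTUAL = "following"
NONE = "not following"
ONLY_1 = "following only by person1"  # assumed meaning: person1 follows person2
ONLY_2 = "following only by person2"  # assumed meaning: person2 follows person1


def to_directed_edges(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Expand (person1, person2, relationship) rows into (follower, followed) edges."""
    edges: List[Tuple[str, str]] = []
    for row in rows:
        p1, p2, rel = row["person1"], row["person2"], row["relationship"]
        if rel == MUTUAL:
            edges.extend([(p1, p2), (p2, p1)])
        elif rel == ONLY_1:
            edges.append((p1, p2))
        elif rel == ONLY_2:
            edges.append((p2, p1))
        elif rel != NONE:
            # Fail loudly on labels this sketch did not anticipate.
            raise ValueError(f"unexpected relationship label: {rel!r}")
    return edges


# Made-up rows for illustration only (not real data).
sample = [
    {"person1": "A", "person2": "B", "relationship": MUTUAL},
    {"person1": "A", "person2": "C", "relationship": ONLY_1},
    {"person1": "B", "person2": "C", "relationship": NONE},
]
edges = to_directed_edges(sample)
```

With the real file, the rows could come from `csv.DictReader` opened on relationships.csv, provided the header matches the three column names listed above.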

Note that the four tables are all related, so you can merge them all.
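As an illustration of such a merge, the sketch below attaches the manual topic labels to tweets, which gives (text, label) pairs suitable for multiclass classification. The column names `topic_150`, `Topic`, and `label` are hypothetical placeholders, since the listing does not give the real headers; the rows are made up. It is a plain left join, so tweets whose topic id has no entry in the topic table (for example BERTopic's outlier topic -1, if it was left unlabeled) get `None`.

```python
from typing import Any, Dict, Iterable, List


def attach_labels(
    tweets: Iterable[Dict[str, Any]],
    topic_info: Iterable[Dict[str, Any]],
    tweet_key: str = "topic_150",  # hypothetical column in dataset_with_topics.csv
    info_key: str = "Topic",       # hypothetical column in topic_info_150
    label_col: str = "label",      # hypothetical manual-label column
) -> List[Dict[str, Any]]:
    """Left-join tweets to topic labels on the topic id."""
    lookup = {row[info_key]: row[label_col] for row in topic_info}
    # Tweets whose topic id is missing from the lookup keep a label of None.
    return [{**tweet, "label": lookup.get(tweet[tweet_key])} for tweet in tweets]


# Made-up rows for illustration only (not real data).
tweets = [
    {"text": "tweet about a health bill", "topic_150": 0},
    {"text": "tweet about a farm visit", "topic_150": 1},
    {"text": "tweet that fit no cluster", "topic_150": -1},
]
topic_info = [
    {"Topic": 0, "label": "healthcare"},
    {"Topic": 1, "label": "agriculture"},
]
labeled = attach_labels(tweets, topic_info)
training_pairs = [(row["text"], row["label"]) for row in labeled if row["label"] is not None]
```

Dropping the unlabeled rows before training, as `training_pairs` does, is one design choice; keeping them as an explicit "other" class is an equally reasonable alternative.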
