悠悠

Misinformation & Fake News text dataset 79k

politics, classification, text, news, social networks

5

Sold: 0
84.58MB

Data ID: D17222435136938636

Published: 2024/07/29

The following is the data verification report that the seller chose to provide:

Data description

Misinformation, fake news & propaganda data set

A dataset containing 79k articles of misinformation, fake news and propaganda.

  • 34975 'true' articles. --> MisinfoSuperset_TRUE.csv
  • 43642 articles of misinfo, fake news or propaganda --> MisinfoSuperset_FAKE.csv

The 'true' articles come from a variety of sources, such as Reuters, the New York Times, the Washington Post and more.

The 'fake' articles are sourced from:

  1. American right-wing extremist websites (such as Redflag Newsdesk, Breitbart, Truth Broadcast Network)
  2. A previously published dataset described in the following article: Ahmed H, Traore I, Saad S. (2017) “Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques.” In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham (pp. 127-138).
  3. Disinformation and propaganda cases collected by the EUvsDisinfo project, which was started in 2015 and identifies and fact-checks disinformation cases originating from pro-Kremlin media that are spread across the EU.

All information except the actual text has been removed from the articles, which are split into one set containing all the fake news / misinformation and one containing all the true articles.
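As a rough starting point, the two files can be read into one labelled list using only the Python standard library. This is a minimal sketch, not the author's loading code: the column name `text` is an assumption (check the CSV header first), and `load_labeled` is a hypothetical helper name.

```python
import csv

# Articles can be long; raise the per-field limit above the csv default.
csv.field_size_limit(10**9)

def load_labeled(true_path, fake_path, text_col="text"):
    """Return a list of (text, label) pairs; label 1 = 'true', 0 = 'fake'.

    text_col is an assumed column name; adjust it to match the CSV header.
    """
    rows = []
    for path, label in ((true_path, 1), (fake_path, 0)):
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                text = (record.get(text_col) or "").strip()
                if text:  # skip rows with no article text
                    rows.append((text, label))
    return rows

# Example call (file names as given in the description):
# articles = load_labeled("MisinfoSuperset_TRUE.csv", "MisinfoSuperset_FAKE.csv")
```

Keeping the label next to the text makes it straightforward to feed the result into either a classifier or an unsupervised method later on.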

// For those interested only in Russian propaganda (rather than misinformation in general), I have added the Russian propaganda in a separate CSV called 'EXTRA_RussianPropagandaSubset.csv'.

--

Note: While this might immediately seem like a great classification task, I would suggest also considering clustering / topic modelling. Why clustering? Because a clustering model can match a newly written article to a previously debunked lie or misinformation narrative. That lets us debunk a new article immediately (or at least link it to an actual fact-checked statement) without either relying on an algorithm's verdict as the argument or waiting for confirmation from a fact-checking organisation.
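The matching idea can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: it uses TF-IDF vectors with cosine similarity to find the nearest previously debunked text, which is nearest-neighbour matching rather than a full clustering or topic model. The function names and the toy strings are invented for illustration and are not taken from the dataset; a library such as scikit-learn would be the more usual choice on the full 79k articles.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def normalize(vec):
    """Scale a sparse vector to unit length (empty if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_index(debunked_texts):
    """Compute smoothed IDF weights and unit TF-IDF vectors for the corpus."""
    docs = [Counter(tokenize(t)) for t in debunked_texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [normalize({t: c * idf[t] for t, c in doc.items()}) for doc in docs]
    return idf, vectors

def closest_narrative(article, idf, vectors):
    """Return (index, cosine similarity) of the most similar debunked text.

    Assumes the index holds at least one text.
    """
    counts = Counter(tokenize(article))
    query = normalize({t: c * idf[t] for t, c in counts.items() if t in idf})
    scores = [sum(w * vec.get(t, 0.0) for t, w in query.items()) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy strings for illustration only (not taken from the dataset).
debunked = [
    "Vaccines contain microchips that track people",
    "The moon landing was staged in a film studio",
    "The election was decided by rigged voting machines",
]
idf, vectors = build_index(debunked)
match, score = closest_narrative(
    "New claims say the moon landing was filmed in a studio", idf, vectors
)
```

In practice a similarity threshold would be needed so that articles unrelated to any known narrative are not force-matched to the closest one.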

An example disinformation project using this dataset can be found at https://stevenpeutz.com/disinformation/

Enjoy! You have chosen an incredibly important topic for your project!
