Misinformation, fake news & propaganda data set

A dataset of roughly 79k articles (78,617 in total), consisting of 'true' articles and articles of misinformation, fake news and propaganda.

34,975 'true' articles --> MisinfoSuperset_TRUE.csv
43,642 articles of misinformation, fake news or propaganda --> MisinfoSuperset_FAKE.csv

The 'true' articles come from a variety of sources, such as Reuters, the New York Times, the Washington Post and more.

The 'fake' articles are sourced from:

American right-wing extremist websites (such as Redflag Newsdesk, Breitbart, Truth Broadcast Network)
A previously published dataset described in the following article: Ahmed H, Traore I, Saad S. (2017) “Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques.” In: Traore I., Woungang I., Awad A. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2017. Lecture Notes in Computer Science, vol 10618. Springer, Cham (pp. 127-138).
Disinformation and propaganda cases collected by the EUvsDisinfo project. Started in 2015, the project identifies and fact-checks disinformation cases that originate from pro-Kremlin media and are spread across the EU.

All information except the actual article text has been removed, and the articles are split into two sets: one with all the fake news / misinformation and one with all the true articles.
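Because the two classes live in separate files, a classifier needs them recombined with labels. The sketch below is one minimal way to do that with only the standard library. It assumes each CSV has a header row with the article body in a column named `text`; that column name is an assumption, so check the actual header first. The function names `read_texts` and `load_labelled` are hypothetical helpers, not part of the dataset.

```python
import csv
import random

# Assumption: the article body sits in a column named "text".
TEXT_COLUMN = "text"

# Raise the csv module's default field size limit (131072 characters)
# in case some articles are longer than that.
csv.field_size_limit(10**8)


def read_texts(lines, text_column=TEXT_COLUMN):
    """Yield non-empty article texts from an iterable of CSV lines."""
    for record in csv.DictReader(lines):
        text = (record.get(text_column) or "").strip()
        if text:  # skip rows with an empty or missing text field
            yield text


def load_labelled(true_path, fake_path, text_column=TEXT_COLUMN, seed=0):
    """Return a shuffled list of (text, label) pairs:
    label 0 for 'true' articles, 1 for misinformation / fake news / propaganda."""
    rows = []
    for path, label in ((true_path, 0), (fake_path, 1)):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend((text, label) for text in read_texts(f, text_column))
    random.Random(seed).shuffle(rows)  # fixed seed for a reproducible order
    return rows
```

Usage: `data = load_labelled("MisinfoSuperset_TRUE.csv", "MisinfoSuperset_FAKE.csv")`. Dropping empty rows means the resulting counts may differ slightly from the totals listed above.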

For those interested only in Russian propaganda (rather than misinformation in general), I have added the Russian propaganda in a separate CSV called 'EXTRA_RussianPropagandaSubset.csv'.

Note: while this might immediately look like a great classification task, I would suggest also considering clustering / topic modelling. Why clustering? A clustering model can match a newly written article to a previously debunked lie or misinformation narrative. That way we can debunk a new article immediately (or at least link it to an actual fact-checked statement) without relying on an algorithm's verdict as the argument, and without waiting for a fact-checking organisation to confirm it.
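To make the matching idea concrete, here is a minimal pure-Python sketch. It swaps in a simple technique, TF-IDF vectors with cosine-similarity nearest-neighbour lookup, as a stand-in for a full clustering or topic model: each debunked narrative is a reference text, and a new article is linked to the most similar one if the similarity clears a threshold. This is not the method used in the project linked below; the function names and the `min_score=0.2` threshold are hypothetical choices for illustration. In a fuller setup the reference texts could instead be cluster centroids built from the FAKE articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(documents):
    """Smoothed inverse document frequency for every token in the reference set."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; tokens unseen in the reference set are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def match_narrative(new_article, debunked, min_score=0.2):
    """Return (index, score) of the most similar debunked narrative,
    or None if there are none or nothing reaches min_score (hypothetical threshold)."""
    if not debunked:
        return None
    idf = fit_idf(debunked)
    vectors = [tfidf(d, idf) for d in debunked]
    query = tfidf(new_article, idf)
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best, scores[best]) if scores[best] >= min_score else None
```

Returning `None` below the threshold matters for this use case: an article that resembles no known narrative should be left for a human fact-checker rather than forced onto the nearest debunked claim.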

An example disinformation project using this dataset can be found at https://stevenpeutz.com/disinformation/

Enjoy! You have chosen an incredibly important topic for your project!
