Hillary Clinton's Emails

老下头

government, news, email and messaging

￥27

Sold: 0

857.79MB

Dataset ID: D17174914957807317

Published: 2024/06/04

Context

It's Datasets December, and this is the third dataset I've uploaded to Kaggle this month.

From Wikipedia:

"During her tenure as United States Secretary of State, Hillary Clinton drew controversy by using a private email server for official public communications rather than using official State Department email accounts maintained on secure federal servers. An FBI examination of Clinton's server found over 100 emails containing classified information, including 65 emails deemed "Secret" and 22 deemed "Top Secret". An additional 2,093 emails not marked classified were retroactively classified by the State Department."

Content

There's a good amount of email data here: it's not as dense as the Enron dataset, but it's much more numerous. Note that Clinton deleted a subset of these emails before turning them over to the State Department, so this isn't a perfect sample of someone's emails. A good portion of the content is also redacted.

I have also included the raw PDF files from the State Department, which could be used for OCR training. There's also a CSV that maps the names used in the database to real human names, since the people behind them are not always easy to identify.
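As a rough illustration of how a name-mapping table like the one described above might be used, here is a minimal sketch that resolves sender aliases to real names with a SQL join. The table names (`persons`, `aliases`, `emails`), their columns, and the sample rows are all hypothetical placeholders built in an in-memory database; they are not taken from the dataset's actual schema, which should be inspected before adapting this.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a HYPOTHETICAL schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE aliases (alias TEXT PRIMARY KEY, person_id INTEGER);
        CREATE TABLE emails (id INTEGER PRIMARY KEY, sender_alias TEXT, subject TEXT);

        INSERT INTO persons VALUES (1, 'Person One'), (2, 'Person Two');
        INSERT INTO aliases VALUES ('p1', 1), ('one, person', 1), ('p2', 2);
        INSERT INTO emails VALUES
            (1, 'P1', 'subject a'),
            (2, ' One, Person ', 'subject b'),
            (3, 'nobody', 'subject c');
        """
    )
    return conn


def resolve_senders(conn):
    """Return (email_id, sender_alias, real_name) for every email.

    Aliases are matched case-insensitively after trimming whitespace;
    senders with no known alias are reported as '(unknown)'.
    """
    query = """
        SELECT e.id, e.sender_alias, COALESCE(p.name, '(unknown)') AS real_name
        FROM emails AS e
        LEFT JOIN aliases AS a ON a.alias = LOWER(TRIM(e.sender_alias))
        LEFT JOIN persons AS p ON p.id = a.person_id
        ORDER BY e.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in resolve_senders(build_demo_db()):
        print(row)
```

A `LEFT JOIN` is used so that emails whose sender has no entry in the mapping still appear in the result instead of being silently dropped.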

Acknowledgements

Most of the hard work for this one was done by Martin Burch at the WSJ, who created a series of Python scripts to download the data from the US State Department and load it into a SQLite database.

Inspiration

This dataset is really cool and I think there's a lot of really interesting stuff that could be done with it. There are sections of the data that are redacted, and I don't know if the OCR done by the State Department properly marks what's redacted. Maybe someone could use deep learning to identify redacted data and to
