This dataset contains emails for spam or ham classification. It comes from the "Enron-Spam datasets" and combines the six sets, Enron1 through Enron6, in the pre-processed form published by the original authors. There are two files:
- email_origin.csv: Original pre-processed emails with labels.
  Columns:
  - label: Int type, 1 for spam and 0 for ham
  - origin: String type, original pre-processed email
- email_text.csv: Email bodies processed by me, with labels.
  Columns:
  - label: Int type, 1 for spam and 0 for ham
  - text: String type, processed email body
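The column layout above can be read with a few lines of Python. This is only a minimal sketch: it assumes each CSV starts with a header row naming the columns listed above, and the helper names `load_labeled_emails` and `class_counts` are made up for illustration. The sample is built in memory so the snippet runs without the real files.

```python
import csv
import io
from collections import Counter


def load_labeled_emails(source, text_column="text"):
    """Read (label, text) pairs from an open CSV file object.

    Assumes a header row containing "label" and `text_column`
    ("text" for email_text.csv, "origin" for email_origin.csv).
    """
    rows = []
    for row in csv.DictReader(source):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((label, row[text_column]))
    return rows


def class_counts(rows):
    """Count ham (label 0) and spam (label 1) emails."""
    counts = Counter(label for label, _ in rows)
    return {"ham": counts[0], "spam": counts[1]}


# Tiny in-memory stand-in for email_text.csv (not real data from the dataset).
sample = io.StringIO(
    "label,text\n"
    "1,win money now\n"
    "0,meeting at noon\n"
)
rows = load_labeled_emails(sample)
print(class_counts(rows))
```

To read the real files, open them with `open("email_text.csv", newline="", encoding="utf-8")` and pass the file object in; the encoding is an assumption. For email_origin.csv, pass `text_column="origin"`. Very long emails may exceed the `csv` module's default field size limit, which can be raised with `csv.field_size_limit`.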
How I processed the emails (from email_origin to email_text):
More datasets for spam or ham classification:
- Emails for spam or ham classification (Trec 2007)
- Emails for spam or ham classification (Trec 2006)
- Emails for spam or ham classification (Trec 2005)
- Emails for spam or ham classification (SpamAssassin)
Source:
http://nlp.cs.aueb.gr/software_and_datasets/Enron-Spam/index.html
Emails for spam or ham classification (Enron 2006)
30.67MB