Data description
This dataset contains emails labeled as spam or ham for classification. It comes from the "2006 TREC Public Spam Corpora". There are three files:
- email_origin.csv: Original raw emails with labels. Columns:
  - label: Int type, 1 for spam and 0 for ham
  - origin: String type, the original raw email
- email_text.csv: Processed email bodies with labels. Columns:
  - label: Int type, 1 for spam and 0 for ham
  - text: String type, the processed email body
- trec06p.tgz: The original compressed archive as downloaded from the source.
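As a minimal sketch of how the processed file might be read, the snippet below loads a CSV with the `label` and `text` columns described above and counts the emails per class. It uses only the Python standard library. The helper names `load_labeled_emails` and `label_counts` are hypothetical, and the sketch assumes the file is comma-delimited, UTF-8 encoded, and starts with a header row naming the columns; none of this is confirmed by the dataset description.

```python
import csv
import sys
from collections import Counter


def load_labeled_emails(path):
    """Read a CSV with `label` and `text` columns into (label, text) pairs.

    Assumes a header row, comma delimiters, and UTF-8 encoding.
    """
    # Email bodies can exceed the csv module's default per-field size cap,
    # so raise it to a value that is safe on all platforms.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    rows = []
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows.append((int(row["label"]), row["text"]))
    return rows


def label_counts(rows):
    """Count emails per label (1 = spam, 0 = ham)."""
    return Counter(label for label, _ in rows)
```

Calling `label_counts(load_labeled_emails("email_text.csv"))` would then give the number of spam and ham emails. The same reader should work for email_origin.csv if `"text"` is swapped for `"origin"`.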
How I process email (from email_origin to email_text):
More datasets for spam or ham classification:
- Emails for spam or ham classification (Trec 2007)
- Emails for spam or ham classification (Trec 2005)
- Emails for spam or ham classification (Enron 2006)
- Emails for spam or ham classification SpamAssassin
Source: https://plg.uwaterloo.ca/~gvcormac/treccorpus06/about.html
Validation report
The following is the data validation report that the seller has chosen to provide:

Emails for spam or ham classification (Trec 2006)
146.5MB