小情绪*

Emails for spam or ham classification (Trec 2005)

Tags: classification, binary classification, email and messaging

Size: 606.74 MB

Dataset ID: D17170403404691014

Published: 2024/05/30

This dataset contains emails labeled for spam or ham classification, taken from the "2005 TREC Public Spam Corpus". It consists of three files:

  1. email_origin.csv: Original raw emails with labels.
    Columns:
    • label: Int type, 1 for spam and 0 for ham
    • origin: String type, original raw email
  2. email_text.csv: Processed email bodies with labels.
    Columns:
    • label: Int type, 1 for spam and 0 for ham
    • text: String type, processed email body
  3. trec05p-1.tgz: Original compressed archive as downloaded from the source.
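
As a rough illustration of the column layout above, the two CSV files could be read with Python's standard `csv` module along these lines. This is only a sketch under assumptions the listing does not confirm: that each file has a header row naming the columns `label` and `text` (or `origin`), that it is comma-delimited, and that it is UTF-8 encoded. The function names `load_labeled_emails` and `label_counts` are hypothetical helpers, not part of the dataset.

```python
import csv
import sys
from collections import Counter


def load_labeled_emails(path, text_column="text"):
    """Read (label, text) pairs from one of the dataset's CSV files.

    Assumes a header row with a `label` column and a `text` column
    (pass text_column="origin" for email_origin.csv).
    """
    # Raw emails can exceed the csv module's default field size limit,
    # so raise it to a value that is safe on all platforms.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    rows = []
    # errors="replace" is a precaution: the UTF-8 encoding is an assumption.
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for record in csv.DictReader(f):
            rows.append((int(record["label"]), record[text_column]))
    return rows


def label_counts(rows):
    """Count examples per label (0 = ham, 1 = spam)."""
    return Counter(label for label, _ in rows)
```

For example, `label_counts(load_labeled_emails("email_text.csv"))` would report how many ham and spam messages the processed file contains, which is a quick way to check class balance before training a classifier.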

How I processed the emails (from email_origin.csv to email_text.csv):

Email Processing

More datasets for spam or ham classification:

Emails for spam or ham classification (Trec 2007)

Emails for spam or ham classification (Trec 2006)

Emails for spam or ham classification (Enron 2006)

Emails for spam or ham classification (SpamAssassin)

Source:
https://plg.uwaterloo.ca/~gvcormac/treccorpus/about.html


