悦 影

Spam Email Classification Dataset

computer science, classification, text, binary classification, email and messaging

￥2

133.43MB

Dataset ID: D17220404589752050

Published: 2024/07/27

Introduction

This is a CSV file containing 83,446 email records, each labelled as either spam or not spam (ham). It was formed by combining the 2007 TREC Public Spam Corpus and the Enron-Spam Dataset.

Columns

label
- '1' indicates that the email is classified as spam.
- '0' denotes that the email is legitimate (ham).
text
- This column contains the actual content of the email messages.
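
As a quick illustration of the two-column layout described above, the following minimal sketch reads `label`/`text` records with Python's standard `csv` module and counts how many fall into each class. The sample rows are made up for the example, and the file name mentioned in the comment is only a placeholder; the column names are the only details taken from this page.

```python
import csv
import io
from collections import Counter

# Email bodies can be longer than the csv module's default field size limit,
# so raise it before reading the real file.
csv.field_size_limit(2**31 - 1)


def iter_records(handle):
    """Yield (label, text) pairs from a CSV with 'label' and 'text' columns."""
    for row in csv.DictReader(handle):
        yield int(row["label"]), row["text"]


def label_counts(handle):
    """Count the records per label (1 = spam, 0 = ham)."""
    return Counter(label for label, _ in iter_records(handle))


# Tiny in-memory stand-in for the real file (hypothetical sample rows).
sample = io.StringIO(
    "label,text\n"
    '1,"win a free prize now"\n'
    '0,"meeting moved to 3pm, see agenda"\n'
    '1,"cheap meds online"\n'
)
counts = label_counts(sample)
print("spam:", counts[1], "ham:", counts[0])

# For the downloaded file, something like the following should work
# (the file name here is a placeholder, not taken from the listing):
# with open("spam_dataset.csv", newline="", encoding="utf-8") as fh:
#     print(label_counts(fh))
```

Passing an open file handle rather than a path keeps the helper easy to try on small in-memory samples before pointing it at the full 133 MB file.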

Sources

2007 TREC Public Spam Corpus
- Original link: https://plg.uwaterloo.ca/~gvcormac/treccorpus07/
- Preprocessed download link: https://www.kaggle.com/datasets/bayes2003/emails-for-spam-or-ham-classification-trec-2007
Enron-Spam Dataset
- Original link: https://www2.aueb.gr/users/ion/data/enron-spam/
- Preprocessed download link: https://github.com/MWiechmann/enron_spam_data/

Code for combining and processing the two datasets: https://github.com/PuruSinghvi/Spam-Email-Classifier/blob/main/Combining%20Datasets.ipynb
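
For readers who only want the general idea of merging two labelled corpora, here is a rough sketch. It is not the notebook's actual procedure: the whitespace normalisation and exact-duplicate removal steps, the `combine` helper, and the sample records are all assumptions made for illustration.

```python
def combine(*sources):
    """Merge several iterables of (label, text) pairs into one list.

    Assumed cleaning steps (not taken from the linked notebook):
    collapse runs of whitespace, skip empty messages, and drop
    messages whose cleaned text has already been seen.
    """
    seen = set()
    combined = []
    for source in sources:
        for label, text in source:
            cleaned = " ".join(text.split())
            if not cleaned or cleaned in seen:
                continue
            seen.add(cleaned)
            combined.append((int(label), cleaned))
    return combined


# Hypothetical records standing in for the two source corpora.
trec_like = [(1, "Buy   now"), (0, "Lunch at noon?")]
enron_like = [("0", "Lunch at noon?"), ("1", "   "), ("1", "Cheap loans")]

merged = combine(trec_like, enron_like)
print(merged)
```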

Spam Email Classifier

A spam email classifier has been trained and built using this dataset. It can be found here: https://github.com/PuruSinghvi/Spam-Email-Classifier
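
To give a sense of how a binary spam classifier can be trained on `label`/`text` pairs like these, the sketch below implements a small multinomial naive Bayes model with add-one smoothing using only the standard library. This is a generic baseline chosen for illustration and is not necessarily the approach used in the linked repository; the class name, tokenizer, and training sentences are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            # Log prior for the class.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (made up); 1 = spam, 0 = ham, matching the label column.
texts = [
    "win free prize now",
    "free money offer",
    "meeting agenda attached",
    "lunch tomorrow at noon",
]
labels = [1, 1, 0, 0]

model = NaiveBayesSpamFilter().fit(texts, labels)
print(model.predict("free prize offer"))
print(model.predict("agenda for lunch meeting"))
```

On the full dataset one would normally hold out a test split and use a more careful tokenizer; the toy data here only shows the mechanics.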
