Spam Email Data original & CSV file (Spamassassin)

王七七


Tags: intermediate, classification, random forest, svm, email and messaging

Price: ￥14

Sold: 0

Size: 12.28MB

Dataset ID: D17171553333472283

Published: 2024/05/31

Dataset Description

Overview

The dataset contains three items: the spam folder holds the original spam emails; the ham folder holds the original non-spam (ham) emails; and spam_ham_data is the CSV file I produced by processing those two raw folders, which can be used directly for further feature engineering and model training. I have also attached the code that processes these email files; please take a look at it.

Please check my notebook, which shows how to convert the raw SpamAssassin files into a CSV file.

I hope it helps you understand how the CSV file was created.
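The notebook itself is not reproduced here. As a rough illustration of the conversion, the following is a minimal sketch, not the author's code, assuming a root directory with `ham/` and `spam/` subfolders that each hold one raw message per file. It fills the four columns described under Features or Columns. The function names `extract_subject_and_body`, `build_rows`, and `write_csv` are hypothetical, and the way `text/plain` and `text/html` parts are concatenated into Content is an assumption; the author's code may handle multipart or HTML messages differently.

```python
import csv
import email
from pathlib import Path

# Assumed layout: <root>/ham and <root>/spam, one raw RFC 822 message per file.
LABELS = {"ham": 0, "spam": 1}  # matches the Label column: 0 = ham, 1 = spam
COLUMNS = ["Email", "Label", "Subject", "Content"]


def extract_subject_and_body(raw: bytes) -> tuple[str, str]:
    """Return (subject, body) for one raw message using the stdlib email parser."""
    msg = email.message_from_bytes(raw)
    # Encoded-word subjects (=?utf-8?...?=) are left undecoded in this sketch.
    subject = str(msg.get("Subject", ""))
    texts = []
    for part in msg.walk():  # the message itself plus every nested MIME part
        if part.get_content_type() not in ("text/plain", "text/html"):
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "latin-1"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown or bogus charset name in the header
            texts.append(payload.decode("latin-1", errors="replace"))
    return subject, "\n".join(texts)


def build_rows(root: Path) -> list[dict]:
    """Read every message under root/ham and root/spam into one row per file."""
    rows = []
    for folder, label in LABELS.items():
        for path in sorted((root / folder).iterdir()):
            # Skip sub-directories and the corpus helper file "cmds" (assumed non-email).
            if not path.is_file() or path.name == "cmds":
                continue
            raw = path.read_bytes()
            subject, content = extract_subject_and_body(raw)
            rows.append({
                "Email": raw.decode("latin-1"),  # one-to-one bytes -> str mapping
                "Label": label,
                "Subject": subject,
                "Content": content,
            })
    return rows


def write_csv(rows: list[dict], out_path: Path) -> None:
    # errors="replace" guards against stray surrogates left by 8-bit headers.
    with out_path.open("w", newline="", encoding="utf-8", errors="replace") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Usage (paths are assumptions):
# write_csv(build_rows(Path(".")), Path("spam_ham_data.csv"))
```

Decoding the raw bytes as latin-1 for the Email column never fails, so every file yields a row even when its declared charset is wrong.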

Welcome

This data comes from the open-source Apache SpamAssassin Project.

Apache SpamAssassin is the #1 Open Source anti-spam platform giving system administrators a filter to classify email and block spam (unsolicited bulk email).

The original email data is available at https://spamassassin.apache.org/old/publiccorpus/.

> I used 2003_easy_ham, 2003_hard_ham, and 2003_spam, and merged 2003_easy_ham and 2003_hard_ham into a single folder, ham.
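The merge step could look roughly like this minimal sketch. It assumes the two ham sets were extracted into folders named as above; the helper name `merge_into_ham`, the prefixing scheme, and skipping a `cmds` helper file (if present) are illustrative assumptions, not the author's procedure.

```python
import shutil
from pathlib import Path


def merge_into_ham(sources: list[Path], dest: Path) -> int:
    """Copy every email file from each source folder into dest; return the count copied."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sources:
        for path in sorted(src.iterdir()):
            # Skip sub-directories and the corpus helper file "cmds" (assumed non-email).
            if not path.is_file() or path.name == "cmds":
                continue
            # Prefix with the source folder name so identically named files cannot collide.
            target = dest / f"{src.name}_{path.name}"
            shutil.copy2(path, target)
            copied += 1
    return copied


# Usage (folder locations are assumptions):
# merge_into_ham([Path("2003_easy_ham"), Path("2003_hard_ham")], Path("ham"))
```

Copying rather than moving leaves the original folders intact, so the merge can be rerun safely.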

Features or Columns

Email: the raw content read from the original email files. Use it to generate more features!
Label: 0 means ham, 1 means spam.
Subject: the subject of an email.
Content: the main body of an email.
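As one illustration of generating more features from the Email column, here is a minimal sketch. The function `extra_features` and every feature in it are hypothetical examples chosen for this sketch, not features the author computed.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def extra_features(email_text: str, subject: str) -> dict:
    """Hypothetical hand-crafted features from the raw Email and Subject columns."""
    text = email_text.replace("\r\n", "\n")
    # RFC 822: headers end at the first blank line; everything after is the body.
    header, _, body = text.partition("\n\n")
    letters = [c for c in subject if c.isalpha()]
    return {
        # Physical header lines; folded continuation lines count separately.
        "num_header_lines": len(header.splitlines()),
        "num_urls": len(URL_RE.findall(body)),
        "has_html": int("<html" in body.lower()),
        "num_exclamations": body.count("!"),
        # Share of upper-case letters in the subject (0.0 when it has no letters).
        "subject_upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }
```

With pandas, for example, after filling missing subjects with `df["Subject"] = df["Subject"].fillna("")`, the call `df.join(df.apply(lambda r: pd.Series(extra_features(r["Email"], r["Subject"])), axis=1))` would attach these as new columns.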

Modeling

As mentioned above, you can use the CSV file directly for modeling. In the code I demonstrate several machine-learning modeling workflows, which already achieve relatively good results. You can certainly improve on the cross-validation scores of this baseline: I haven't done any fine-tuning yet.
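The author's modeling code is not reproduced here; the listing's tags mention random forest and SVM. To show what a cross-validated baseline score means without any third-party dependency, the sketch below swaps in a different, simpler technique: a multinomial naive Bayes classifier with add-one smoothing, evaluated with k-fold cross-validation. The names `NaiveBayes`, `tokenize`, and `cross_val_accuracy` are hypothetical, and feeding it Subject plus Content text is an assumption. In practice, scikit-learn's `cross_val_score` with a `TfidfVectorizer` and `RandomForestClassifier` or `LinearSVC` would be the more usual route.

```python
import math
import random
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        return self

    def predict(self, doc: str) -> int:
        v = len(self.vocab)
        # Tokens never seen in training are ignored, as is common for this model.
        tokens = [t for t in tokenize(doc) if t in self.vocab]

        def log_posterior(c: int) -> float:
            denom = self.totals[c] + v
            return self.log_prior[c] + sum(math.log((self.counts[c][t] + 1) / denom) for t in tokens)

        return max(self.classes, key=log_posterior)


def cross_val_accuracy(docs: list[str], labels: list[int], k: int = 5, seed: int = 0) -> list[float]:
    """Accuracy on each of k shuffled folds; requires len(docs) >= k."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayes().fit([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(docs[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return scores
```

Averaging the per-fold list gives a single cross-validation score that can be compared against tuned models.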


