The following is the data validation report that the seller has chosen to provide:
Data Description
Email Thread Summary Dataset
Overview:
The Email Thread Dataset consists of two main files: email_thread_details and email_thread_summaries. Together, these files pair detailed email thread information with human-generated summaries.
Email Thread Details:
Description:
The email_thread_details file describes the individual emails in each thread: the subject, timestamp, sender, recipients, and content of each message.
Columns:
- thread_id: A unique identifier for each email thread.
- subject: Subject of the email thread.
- timestamp: Timestamp indicating when the message was sent.
- from: Sender of the email.
- to: List of recipients of the email.
- body: Content of the email message.
Additional Information:
The "to" column is available in both CSV and Pickle (pkl) formats. The Pickle format allows recipient information to be read directly as a column of lists of strings.
Email Thread Summaries:
Description:
The email_thread_summaries file contains concise summaries crafted by human annotators for each email thread, offering a high-level overview of the content.
Columns:
- thread_id: A unique identifier for each email thread.
- summary: A concise summary of the email thread.
Dataset Structure:
The dataset is organized into threads and emails. There are a total of 4,167 threads and 21,684 emails, providing a rich source of information for analysis and research purposes.
- Threads: 4,167 threads
- Emails: 21,684 emails
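The thread and email totals above imply a little over five emails per thread on average. A minimal sketch of that arithmetic, plus one way to count emails per thread from rows keyed by thread_id, might look like this. The helper name emails_per_thread and the sample rows are hypothetical and not part of the dataset.

```python
from collections import Counter

# Totals quoted in the dataset description.
TOTAL_THREADS = 4167
TOTAL_EMAILS = 21684

# Mean number of emails per thread implied by those totals.
mean_emails = round(TOTAL_EMAILS / TOTAL_THREADS, 2)
print(mean_emails)


def emails_per_thread(email_rows):
    """Count how many email rows share each thread_id."""
    return Counter(row["thread_id"] for row in email_rows)


# Made-up rows standing in for email_thread_details records.
sample_rows = [{"thread_id": 1}, {"thread_id": 1}, {"thread_id": 2}]
counts = emails_per_thread(sample_rows)
print(counts)
```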
Language:
- Languages: English (en)
Use Cases:
- Natural Language Processing (NLP) Research:
  - Analyze email thread contents and human-generated summaries for advancements in NLP tasks.
- Text Summarization Models:
  - Train and evaluate text summarization models using the provided email threads and summaries.
- Email Analytics:
  - Gain insights into communication patterns, sender-receiver relationships, and content analysis.
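For the summarization use case, one possible way to turn the two files into training pairs is to group emails by thread_id, order them by timestamp, and join their bodies into a single source text paired with the thread's summary. The sketch below assumes records shaped like the columns described above. The function name build_summarization_pairs, the "From:" prefix, and the sample records are illustrative choices, not something the dataset prescribes.

```python
from collections import defaultdict


def build_summarization_pairs(emails, summaries):
    """Pair each thread's concatenated emails with its human-written summary."""
    # Group email records by their thread.
    by_thread = defaultdict(list)
    for email in emails:
        by_thread[email["thread_id"]].append(email)

    summary_by_id = {row["thread_id"]: row["summary"] for row in summaries}

    pairs = []
    for thread_id, thread_emails in by_thread.items():
        if thread_id not in summary_by_id:
            continue  # skip threads without a summary
        # Order messages chronologically before concatenating them.
        thread_emails.sort(key=lambda e: e["timestamp"])
        source = "\n\n".join(
            f'From: {e["from"]}\n{e["body"]}' for e in thread_emails
        )
        pairs.append(
            {"thread_id": thread_id, "source": source, "target": summary_by_id[thread_id]}
        )
    return pairs


# Made-up records for illustration only.
emails = [
    {"thread_id": 7, "timestamp": 2000, "from": "bob", "body": "Sounds good."},
    {"thread_id": 7, "timestamp": 1000, "from": "alice", "body": "Lunch on Friday?"},
]
summaries = [{"thread_id": 7, "summary": "Alice and Bob agree on lunch."}]

pairs = build_summarization_pairs(emails, summaries)
print(pairs[0]["source"])
```

Sorting by timestamp keeps the conversation in reading order, which is usually what a summarization model needs; other separators or header fields could be used instead.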
File Formats:
- CSV Files:
  - Easily importable into various data analysis tools.
- Pickle (pkl) Files:
  - Facilitates direct reading of the "to" column as a column of lists of strings.
- JSON Files:
  - Offers compatibility with JSON data structures, providing an additional option for users who prefer or require this widely-used format in their analytical workflows.
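Because CSV stores every cell as text, the "to" column has to be parsed back into a list when it is read from email_thread_details.csv. The sketch below assumes the cell holds a Python-style list literal such as ['bob', 'carol']; the exact serialization is not stated in this description, so treat that as an assumption. The helper name parse_recipients and the sample row are hypothetical.

```python
import ast
import csv
import io


def parse_recipients(cell):
    """Turn a stringified list from a CSV cell into a list of strings.

    Assumption: the cell looks like "['bob', 'carol']". If it is not a
    list literal, the whole cell is treated as a single recipient.
    """
    cell = cell.strip()
    if not cell:
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return [cell]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


# Tiny in-memory stand-in for email_thread_details.csv (made-up row).
sample_csv = io.StringIO(
    "thread_id,subject,timestamp,from,to,body\n"
    "1,Hello,1000,alice,\"['bob', 'carol']\",Hi there\n"
)
rows = list(csv.DictReader(sample_csv))
recipients = parse_recipients(rows[0]["to"])
print(recipients)
```

ast.literal_eval is used rather than eval because it only accepts literals and will not execute arbitrary expressions found in a cell.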
JSON File Features Description:
email_thread_details.json:
[
  {
    "thread_id": [unique identifier],
    "subject": "[email thread subject]",
    "timestamp": [timestamp in milliseconds],
    "from": "[sender's name and identifier]",
    "to": [
      "[recipient 1]",
      "[recipient 2]",
      "[recipient 3]",
      ...
    ],
    "body": "[email content]"
  },
  ...
]
email_thread_summaries.json:
[
  {
    "thread_id": [unique identifier],
    "summary": "[summary content]"
  },
  ...
]
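Since the JSON timestamp is given in milliseconds, it needs to be divided by 1000 before being handed to most date utilities. A minimal sketch of reading one record and converting its timestamp follows. It assumes the value counts milliseconds since the Unix epoch and interprets it in UTC; neither the epoch nor the time zone is stated in this description. The helper name ms_to_datetime and the sample record are hypothetical.

```python
import json
from datetime import datetime, timezone

# Made-up record following the details layout described above.
sample_json = """
[
  {
    "thread_id": 1,
    "subject": "Hello",
    "timestamp": 1000000000000,
    "from": "alice",
    "to": ["bob", "carol"],
    "body": "Hi there"
  }
]
"""


def ms_to_datetime(timestamp_ms):
    """Convert a millisecond timestamp to a timezone-aware datetime (UTC assumed)."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


records = json.loads(sample_json)
first = records[0]
sent_at = ms_to_datetime(first["timestamp"])

print(first["to"])          # "to" is already a list of strings in JSON
print(sent_at.isoformat())
```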
File Structure:
Dataset
├── CSV
│   ├── email_thread_details.csv
│   └── email_thread_summaries.csv
├── Pickle
│   ├── email_thread_details.pkl
│   └── email_thread_summaries.pkl
└── JSON
    ├── email_thread_details.json
    └── email_thread_summaries.json
License:
This dataset is provided under the MIT License.
Disclaimer:
The dataset has been anonymized and sanitized to ensure privacy and confidentiality.
