
Email Thread Summary Dataset

Tags: software, nlp, email and messaging, online communities, transformers, summarization

Size: 23.58 MB

Data ID: D17222213482769403

Published: 2024/07/29


Data Description

Email Thread Summary Dataset

Overview:

The Email Thread Summary Dataset consists of two main files: email_thread_details and email_thread_summaries. Together they provide the contents of each email thread alongside a human-written summary of that thread.

Email Thread Details:

Description:

The email_thread_details file describes the individual emails in each thread: the subject, timestamp, sender, recipients, and message body.

Columns:

  • thread_id: A unique identifier for each email thread.
  • subject: Subject of the email thread.
  • timestamp: Timestamp indicating when the message was sent.
  • from: Sender of the email.
  • to: List of recipients of the email.
  • body: Content of the email message.

Additional Information:

The file is available in both CSV and Pickle (pkl) formats. The Pickle version allows the "to" column to be read directly as a column of lists of strings.
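
When the CSV file is used instead of the Pickle file, the "to" cell has to be converted back into a list by hand. The sketch below assumes (the description does not state this) that each CSV cell holds a Python-style list literal such as "['a@example.com', 'b@example.com']". The helper name parse_recipients is hypothetical, and any cell that does not parse as a list is kept as a single recipient.

```python
import ast


def parse_recipients(cell: str) -> list[str]:
    """Turn one CSV "to" cell into a list of recipient strings.

    Assumption: the cell is a Python-style list literal. Anything that
    cannot be parsed that way is treated as one recipient.
    """
    text = cell.strip()
    if not text:
        return []
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Not a literal (for example a bare address): keep it as-is.
        return [text]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]
```

ast.literal_eval is used rather than eval because it only accepts literals, so a malformed or hostile cell cannot execute code.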

Email Thread Summaries:

Description:

The email_thread_summaries file contains concise summaries crafted by human annotators for each email thread, offering a high-level overview of the content.

Columns:

  • thread_id: A unique identifier for each email thread.
  • summary: A concise summary of the email thread.
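
Because both files share thread_id, the emails of a thread can be paired with its summary. The following is a minimal sketch of that join, assuming each detail record is one email represented as a dict keyed by the column names above, and that timestamps are mutually comparable (for example, all numeric). The function names are hypothetical.

```python
from collections import defaultdict


def group_emails_by_thread(detail_rows):
    """Group email records by thread_id, oldest email first."""
    threads = defaultdict(list)
    for row in detail_rows:
        threads[row["thread_id"]].append(row)
    for emails in threads.values():
        # Assumption: timestamps sort chronologically when compared directly.
        emails.sort(key=lambda row: row["timestamp"])
    return dict(threads)


def attach_summaries(threads, summary_rows):
    """Pair each thread's emails with its human-written summary.

    Threads without a matching summary are skipped.
    """
    summaries = {row["thread_id"]: row["summary"] for row in summary_rows}
    return [
        {"thread_id": thread_id, "emails": emails, "summary": summaries[thread_id]}
        for thread_id, emails in threads.items()
        if thread_id in summaries
    ]
```

The resulting list of (emails, summary) records is a convenient starting point for summarization experiments, since each item carries both the source text and its reference summary.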

Dataset Structure:

The dataset is organized into threads, each made up of individual emails:

  • Threads: 4,167 threads
  • Emails: 21,684 emails
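
The published totals imply an average of about 5.2 emails per thread. A loaded copy can be checked against these figures with a short sketch like the one below, which assumes one thread_id value per email; thread_statistics is a hypothetical helper name.

```python
from collections import Counter


def thread_statistics(thread_ids):
    """Return (thread count, email count, mean emails per thread).

    Assumption: thread_ids holds one entry per email.
    """
    per_thread = Counter(thread_ids)
    n_threads = len(per_thread)
    n_emails = sum(per_thread.values())
    mean = n_emails / n_threads if n_threads else 0.0
    return n_threads, n_emails, mean


# Ratio implied by the totals stated in the dataset description.
print(round(21_684 / 4_167, 2))  # prints 5.2
```

If the counts returned for a download differ from 4,167 and 21,684, the file may be truncated or the one-row-per-email assumption may not hold.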

Language:

  • Languages: English (en)

Use Cases:

  1. Natural Language Processing (NLP) Research:
    • Analyze email thread contents and human-generated summaries for advancements in NLP tasks.
  2. Text Summarization Models:
    • Train and evaluate text summarization models using the provided email threads and summaries.
  3. Email Analytics:
    • Gain insights into communication patterns, sender-receiver relationships, and content analysis.
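
As an illustration of the email analytics use case, sender-receiver relationships can be tallied directly from the "from" and "to" columns. This is a minimal sketch, assuming each email is a dict whose "to" value is already a list of strings (as in the Pickle and JSON files); sender_recipient_counts is a hypothetical helper name.

```python
from collections import Counter


def sender_recipient_counts(emails):
    """Count how many emails each sender addressed to each recipient."""
    pairs = Counter()
    for email in emails:
        for recipient in email["to"]:
            pairs[(email["from"], recipient)] += 1
    return pairs
```

Calling pairs.most_common(10) on the result lists the ten most frequent sender-recipient pairs, which can serve as edge weights in a communication graph.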

File Formats:

  • CSV Files:
    • Easily importable into various data analysis tools.
  • Pickle (pkl) Files:
    • Allow the "to" column to be read directly as a column of lists of strings.
  • JSON Files:
    • Provide an alternative for analytical workflows that prefer or require JSON.

    • JSON File Features Description

      [
          {
              "thread_id": [unique identifier],
              "subject": "[email thread subject]",
              "timestamp": [timestamp in milliseconds],
              "from": "[sender's name and identifier]",
              "to": [
                  "[recipient 1]",
                  "[recipient 2]",
                  "[recipient 3]",
                  ...
              ],
              "body": "[email content]"
          },
          ...
      ]

      [
          {
              "thread_id": [unique identifier],
              "summary": "[summary content]"
          },
          ...
      ]
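
Since the JSON timestamp is given in milliseconds, it usually needs converting before use. The sketch below parses the details structure and adds a datetime to each record. It assumes the value counts milliseconds since the Unix epoch in UTC, which the description does not confirm; the "sent_at" key and both function names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms):
    """Convert milliseconds since the Unix epoch (assumed) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def load_details(json_text):
    """Parse the details JSON text and add a "sent_at" datetime to each record."""
    records = json.loads(json_text)
    for record in records:
        record["sent_at"] = ms_to_datetime(record["timestamp"])
    return records
```

For the real file, the text can be obtained with Path("Dataset/JSON/email_thread_details.json").read_text(encoding="utf-8") and passed to load_details.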

Files Structure:

- Dataset
  ├── CSV
  │   ├── email_thread_details.csv
  │   └── email_thread_summaries.csv
  ├── Pickle
  │   ├── email_thread_details.pkl
  │   └── email_thread_summaries.pkl
  └── JSON
      ├── email_thread_details.json
      └── email_thread_summaries.json
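
Given this layout, the path to any of the six files follows from the file name and its extension. A small sketch of that mapping is shown here; dataset_file, FORMAT_DIRS, and TABLES are hypothetical names, and the root directory is whatever location the archive was extracted to.

```python
from pathlib import Path

# Directory name used for each file extension in the layout above.
FORMAT_DIRS = {"csv": "CSV", "pkl": "Pickle", "json": "JSON"}
TABLES = ("email_thread_details", "email_thread_summaries")


def dataset_file(root, table, extension):
    """Build the path to one dataset file, e.g. Dataset/CSV/email_thread_details.csv."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    if extension not in FORMAT_DIRS:
        raise ValueError(f"unknown extension: {extension!r}")
    return Path(root) / FORMAT_DIRS[extension] / f"{table}.{extension}"
```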

License:

This dataset is provided under the MIT License.

Disclaimer:

The dataset has been anonymized and sanitized to ensure privacy and confidentiality.
