悦 影

DAIGT Proper Train Dataset

Tags: education, law, text

Price: ￥3

Size: 118.63MB

Data ID: D17220848248242548

Published: 2024/07/27

Version 2 updated on 11/2/2023:

Since there is no proper training dataset for the LLM - Detect AI Generated Text competition, I decided to create one.

Ingredients (please upvote the included datasets!):

Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
Text generated with ChatGPT by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
Official training essays from the competition
Essays I generated with various LLMs

New version includes:

EssayID if available
Generation prompt if available
Random 10-fold split, stratified by source dataset
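A random 10-fold split stratified by source dataset means that each source (for example the Persuade essays or each LLM's generations) is spread as evenly as possible across the 10 folds. This keeps every validation fold from being dominated by one generator. The sketch below shows one way to assign such folds using only the Python standard library. It is not the author's code: the input format (a plain list with one source label per essay), the function name `stratified_folds`, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(sources, n_folds=10, seed=42):
    """Assign a fold id to each essay so every source is spread evenly across folds.

    `sources` is a hypothetical input: one source-dataset label per essay.
    Returns fold ids in the same order as `sources`.
    """
    rng = random.Random(seed)

    # Group essay indices by their source dataset.
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)

    folds = [None] * len(sources)
    offset = 0
    for src in sorted(by_source):
        indices = by_source[src]
        rng.shuffle(indices)  # random order within each source
        # Round-robin over folds, so per-source fold counts differ by at most 1.
        for pos, idx in enumerate(indices):
            folds[idx] = (offset + pos) % n_folds
        # Carry the offset forward so small sources do not all start in fold 0.
        offset = (offset + len(indices)) % n_folds
    return folds


if __name__ == "__main__":
    sources = ["persuade"] * 30 + ["chatgpt"] * 20 + ["llama"] * 7
    folds = stratified_folds(sources)
    for src in sorted(set(sources)):
        counts = [sum(1 for f, s in zip(folds, sources) if s == src and f == k)
                  for k in range(10)]
        print(src, counts)
```

Training on folds 1 to 9 and validating on fold 0 (and so on) then gives validation sets with roughly the same source mix as the full dataset. scikit-learn's `StratifiedKFold`, with the source label as the stratification target, is a common library alternative.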

Version 3 updated on 11/3/2023:

Additional 2400+ AI examples generated with Mistral 7B Instruct and a new prompt (let's see how it works!)

Version 4 updated on 11/5/2023:

Additional 2000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)
