The following is the data verification report provided by the seller:
Data description
Version 2 updated on 11/2/2023:
Since there is no proper training dataset for the LLM - Detect AI Generated Text competition, I decided to create one.
Ingredients (please upvote the included datasets!):
- Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
- Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
- Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
- Text generated with ChatGPT by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
- Official train essays
- Essays I generated with various LLMs
New in this version:
- EssayID if available
- Generation prompt if available
- Random 10-fold split, stratified by source dataset
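The description does not say how the stratified split was produced. The sketch below is one plausible way to build a random 10-fold split stratified by source, not the author's actual code. The function name `assign_stratified_folds`, the seed, and the source labels ("persuade", "mistral7b", "claude") are hypothetical, chosen only for illustration. Rows from each source are shuffled and then dealt round-robin across the folds, so every fold gets a near-equal share of every source.

```python
import random
from collections import defaultdict


def assign_stratified_folds(sources, n_folds=10, seed=42):
    """Return a fold index in [0, n_folds) for each row (hypothetical helper).

    Rows are grouped by their source label. Each group is shuffled and dealt
    round-robin into folds, so per-source counts differ by at most one
    between any two folds.
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)

    folds = [None] * len(sources)
    offset = 0
    for src in sorted(by_source):  # fixed order keeps the result reproducible
        idxs = by_source[src]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[idx] = (offset + pos) % n_folds
        # Rotate the starting fold so leftovers from small sources
        # do not all land in the lowest-numbered folds.
        offset = (offset + len(idxs)) % n_folds
    return folds


# Usage with made-up source labels: hold out fold 0 for validation.
sources = ["persuade"] * 20 + ["mistral7b"] * 10 + ["claude"] * 5
folds = assign_stratified_folds(sources, n_folds=10, seed=42)
train_idx = [i for i, f in enumerate(folds) if f != 0]
valid_idx = [i for i, f in enumerate(folds) if f == 0]
```

If scikit-learn is available, `sklearn.model_selection.StratifiedKFold` with the source labels passed as `y` gives a comparable split. The stdlib version above just makes the mechanics explicit.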
Version 3 updated on 11/3/2023:
- An additional 2,400+ AI-generated examples, created with Mistral 7B Instruct and a new prompt (let's see how it works!)
Version 4 updated on 11/5/2023:
- An additional 2,000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)

DAIGT Proper Train Dataset
118.63 MB