Data description
Please use version 2 (there were some issues with v1 that I fixed)!
New release of the DAIGT train dataset! Improvements:
- new models: Cohere Command, Google PaLM, GPT4 (from Radek!)
- new prompts, including source texts from the original essays!
- mapping of essay text to the original prompt from the Persuade corpus
- filtering by the famous "RDizzl3_seven"
Essay counts by source:
persuade_corpus 25996
chat_gpt_moth 2421
llama2_chat 2421
mistral7binstruct_v2 2421
mistral7binstruct_v1 2421
original_moth 2421
train_essays 1378
llama_70b_v1 1172
falcon_180b_v1 1055
darragh_claude_v7 1000
darragh_claude_v6 1000
radek_500 500
NousResearch/Llama-2-7b-chat-hf 400
mistralai/Mistral-7B-Instruct-v0.1 400
cohere-command 350
palm-text-bison1 349
radekgpt4 200
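A per-source tally like the one above, and the "RDizzl3_seven" filter, could be reproduced roughly as follows. This is only a sketch: the column names `text`, `source` and `RDizzl3_seven`, the True/False encoding of the flag, and the sample rows are all assumptions made for illustration, not taken from the dataset file, so check the real CSV header before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the dataset CSV. The column names and the
# True/False encoding of the flag are assumptions, not confirmed here.
SAMPLE_CSV = """text,source,RDizzl3_seven
"Essay A",persuade_corpus,True
"Essay B",persuade_corpus,False
"Essay C",radekgpt4,True
"Essay D",cohere-command,True
"""


def count_by_source(rows, only_seven=False):
    """Count essays per source, optionally keeping only flagged rows."""
    counts = Counter()
    for row in rows:
        flagged = row["RDizzl3_seven"].strip().lower() == "true"
        if only_seven and not flagged:
            continue
        counts[row["source"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
all_counts = count_by_source(rows)
seven_counts = count_by_source(rows, only_seven=True)

# Print sources from most to least frequent, one per line.
for source, n in all_counts.most_common():
    print(f"{source} {n}")
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match its header.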
Sources (please upvote the original datasets!):
- Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
- Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
- Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
- Text generated with ChatGPT and GPT4 by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
- 2000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)
- LLM-generated essay using PaLM from Google Gen-AI by @kingki19 (https://www.kaggle.com/datasets/kingki19/llm-generated-essay-using-palm-from-google-gen-ai)
- Official train essays
- Essays I generated with various LLMs
License: MIT for the data I generated. Check source datasets for the other sources mentioned above.
Validation report
The following is the data validation report that the seller has chosen to provide:

DAIGT V2 Train Dataset
97.19MB