The following is the data validation report that the seller has chosen to provide:
Data description
New release of the DAIGT train dataset! Improvements:
- Everything that was already in the V3 dataset, plus a little bit of extra magic!
- 8000+ texts I generated with Llama-based models fine-tuned on the Persuade corpus 🔥🔥🔥
Sources (please upvote the original datasets!):
- Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
- Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
- Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
- Text generated with ChatGPT and GPT4 by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
- 2000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)
- LLM-generated essay using PaLM from Google Gen-AI by @kingki19 (https://www.kaggle.com/datasets/kingki19/llm-generated-essay-using-palm-from-google-gen-ai)
- Official train essays
- Essays I generated with various LLMs
License: MIT for the data I generated. Check source datasets for the other sources mentioned above.

DAIGT-V4-TRAIN-DATASET
48.9MB