The following is the data validation report that the seller has chosen to provide:
Data description
New release of the DAIGT train dataset! New models: 'text-ada-001', 'text-babbage-001', 'text-curie-001', 'text-davinci-001', 'text-davinci-002', 'text-davinci-003'
These OpenAI models are being deprecated, so I made sure to generate some essays with them and share them here. I also added the following public datasets (please upvote!):
- https://www.kaggle.com/datasets/phanisrikanth/daigt-essays-from-intel-neural-chat-7b
- https://www.kaggle.com/datasets/carlmcbrideellis/llm-mistral-7b-instruct-texts
- https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b
- https://www.kaggle.com/datasets/snassimr/gpt4-rephrased-llm-daigt-dataset
All merged with my previous dataset for convenience (https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset)
Enjoy ❤️
Version 2 update:
- removed NaNs and duplicated/short generations
- applied the cleaning procedure from @nbroad's notebook - give it an upvote please!
- added a 'model' column to indicate the model family used in generations
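
As a rough illustration of the first bullet in the update list (dropping NaNs, duplicates, and short generations), here is a minimal standard-library sketch. It is not the author's actual code and does not reproduce @nbroad's cleaning procedure: the function name `basic_clean`, the `text` key, and the 50-character threshold are all assumptions made for the example.

```python
import math

def basic_clean(rows, text_key="text", min_chars=50):
    """Drop rows with missing text, short text, or duplicated text.

    rows: list of dicts, e.g. as produced by csv.DictReader.
    text_key and min_chars are hypothetical; the real column name and
    threshold used for the dataset are not stated in the description.
    """
    seen = set()
    cleaned = []
    for row in rows:
        text = row.get(text_key)
        # Skip missing values (None or a float NaN).
        if text is None or (isinstance(text, float) and math.isnan(text)):
            continue
        text = text.strip()
        # Skip generations shorter than the assumed threshold.
        if len(text) < min_chars:
            continue
        # Skip exact duplicates, keeping the first occurrence.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append({**row, text_key: text})
    return cleaned

# Tiny made-up sample, only to exercise the function.
sample = [
    {"text": "A" * 60, "model": "text-ada-001"},
    {"text": "A" * 60, "model": "text-ada-001"},      # duplicate
    {"text": None, "model": "text-curie-001"},        # missing
    {"text": "too short", "model": "text-davinci-003"},
    {"text": "B" * 80, "model": "text-davinci-002"},
]
cleaned = basic_clean(sample)
print(len(cleaned), [row["model"] for row in cleaned])
```

The 'model' value is carried through unchanged, so the cleaned rows can still be grouped by model family afterwards.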

daigt-v3-train-dataset
82.67MB