The following is the data validation report that the seller chose to provide:
Data description
**See also:** a new dataset of an additional 4900 LLM-generated texts: **[LLM: Mistral-7B Instruct texts](https://www.kaggle.com/datasets/carlmcbrideellis/llm-mistral-7b-instruct-texts)**
Version 4: Adds the data from "LLM-generated essay using PaLM from Google Gen-AI", kindly generated by Kingki19 / Muhammad Rizqi. File: train_essays_RDizzl3_seven_v2.csv
- Human texts: 14247
- LLM texts: 3004
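The human/LLM counts above can be checked against the CSV file with a short script. The following is only a minimal sketch: the column name `label` and the helper names `count_labels` and `count_labels_in_file` are assumptions made for illustration, since the description does not list the file's columns.

```python
import csv
from collections import Counter


def count_labels(lines, label_column="label"):
    """Count rows per label value.

    `lines` is any iterable of CSV text lines (an open file works).
    `label_column` is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)


def count_labels_in_file(path, label_column="label"):
    """Open a CSV file and count its rows per label value."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_labels(f, label_column)
```

Calling `count_labels_in_file("train_essays_RDizzl3_seven_v2.csv")` would then return one count per label value, which can be compared with the figures listed for Version 4.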
Version 3: "The RDizzl3 Seven". File: train_essays_RDizzl3_seven_v1.csv
- "Car-free cities"
- "Does the electoral college work?"
- "Exploring Venus"
- "The Face on Mars"
- "Facial action coding system"
- "A Cowboy Who Rode the Waves"
- "Driverless cars"
How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"
Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1,638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts, namely:
- "Car-free cities"
- "Does the electoral college work?"
- "Exploring Venus"
- "The Face on Mars"
- "Facial action coding system"
- "Seeking multiple opinions"
- "Phones and driving"
This dataset is a derivative of the datasets
- LLM Generated Essays for the Detect AI Comp! by Radek Osmulski
- persuade corpus 2.0 provided by Nicholas Broad
- daigt data - llama 70b and falcon180b by Nicholas Broad
- Hello, Claude! 1000 essays from Anthropic... by Darragh
as well as the original competition training dataset
Version 1: This dataset is composed of 13,712 human texts and 1,165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
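The version notes give human and LLM text counts for Versions 1, 2, and 4 (none are stated for Version 3). As a rough illustration of how the class balance shifts between versions, the sketch below computes the LLM-generated share from those stated counts. The names `VERSION_COUNTS` and `llm_share` are hypothetical and used only for this example.

```python
# (human texts, LLM texts) as stated in the version notes.
# Version 3 is omitted because its counts are not given.
VERSION_COUNTS = {
    "Version 1": (13712, 1165),
    "Version 2": (13712, 1638),
    "Version 4": (14247, 3004),
}


def llm_share(human, llm):
    """Return the fraction of texts that are LLM-generated."""
    return llm / (human + llm)


for version, (human, llm) in VERSION_COUNTS.items():
    total = human + llm
    print(f"{version}: {total} texts, {llm_share(human, llm):.1%} LLM-generated")
```

Even in Version 4, human texts remain the clear majority, which may matter when the file is used to train a classifier.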

LLM: 7 prompt training dataset
41.39MB