筱雨

verify-tagAutomated Essay Scoring 2.0 - Llama 3 8B corrected

educationcomputer sciencenlp

6

已售 0
68.68MB

数据标识:D17219832816385672

发布时间:2024/07/26

以下为卖家选择提供的数据验证报告:

数据描述

This dataset contains the whole training dataset provided for the competition: Learning Agency Lab - Automated Essay Scoring 2.0. Augmented with a "cleaned_text" column. The "cleaned_text" column corresponds to a rewritten text by llama 3 8B correcting the text for punctuation and miss spelling. It was obtained using the following prompt:

""" Rewrite the exact same text provided by the user correcting only punctuation, capitalisation, typo or spelling mistakes. Output only the text and nothing else. """ 

It has shown to improve heuristic readability metrics by improving tokenization split from traditional tokenizer like spacy or nltk.

Here are additional parameters used for the generation:

temperature=0, max_tokens=4096, top_p=1, 

I used groq and their python SDK to generate the dataset.

data icon
Automated Essay Scoring 2.0 - Llama 3 8B corrected
6
已售 0
68.68MB
申请报告