Data Description
The PERSUADE 2.0 corpus builds on the PERSUADE 1.0 corpus by providing a holistic essay score for each persuasive essay in the PERSUADE 1.0 corpus, as well as proficiency scores for each argumentative and discourse element found in the initial corpus. This version also contains all essays (whereas 1.0 linked only the training set for the Kaggle competition).
In total, the PERSUADE 2.0 corpus comprises over 25,000 argumentative essays produced by 6th-12th grade students in the United States, written in response to 15 prompts across two writing tasks: independent and source-based writing. The PERSUADE 2.0 corpus provides detailed individual and demographic information for each writer, as well as the initial annotations for the argumentative and discourse elements found in PERSUADE 1.0.
V2: Added sources in sources.csv. Links to full texts and GPT-4 summaries provided.
LICENSE UNKNOWN. Data pulled from here
persuade_2.0_human_scores_demo_id_github.csv
has full texts, holistic scores, word counts, prompt, task, source texts, gender, grade level, English language learner status, race/ethnicity, economic status, and disability status.
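A quick sketch of how the human-scores file might be explored with pandas. The rows and the column names `holistic_essay_score`, `grade_level`, and `ell_status` below are assumptions for illustration; check the actual header of persuade_2.0_human_scores_demo_id_github.csv (loaded with `pd.read_csv`) before relying on them.

```python
import pandas as pd

# Toy frame standing in for persuade_2.0_human_scores_demo_id_github.csv.
# Column names here are assumed, not verified against the real file.
scores = pd.DataFrame({
    "holistic_essay_score": [4, 3, 5, 2],
    "grade_level": [8, 8, 11, 11],
    "ell_status": ["No", "Yes", "No", "No"],
})

# Mean holistic score per grade level.
by_grade = scores.groupby("grade_level")["holistic_essay_score"].mean()
print(by_grade)
```

The same `groupby` pattern works for any of the demographic columns (gender, race/ethnicity, disability status) once the real column names are confirmed.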
persuade_corpus_1.0.csv
has the following columns:
essay_id_comp: The essay ID
competition_set: Whether the essay was part of the training or the test set in the Feedback Prize competition
full_text: The full text of the essay
discourse_id: ID for the discourse element
discourse_start: Character position in the essay where the discourse element starts
discourse_end: Character position in the essay where the discourse element ends
discourse_text: The text of the discourse element
discourse_type: Human annotation for the discourse element
discourse_type_num: Number for discourse element in essay
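Given the columns above, each annotated span can be recovered by slicing `full_text` with the character offsets. A minimal sketch, using one toy row in place of the real persuade_corpus_1.0.csv (which would be loaded with `pd.read_csv`); the row values are illustrative, not taken from the corpus:

```python
import pandas as pd

# One toy row mimicking the persuade_corpus_1.0.csv schema described above.
df = pd.DataFrame([{
    "essay_id_comp": "E001",
    "competition_set": "train",
    "full_text": "Phones should be banned in class. They distract students.",
    "discourse_id": "D001",
    "discourse_start": 0,
    "discourse_end": 34,
    "discourse_text": "Phones should be banned in class. ",
    "discourse_type": "Position",
    "discourse_type_num": 1,
}])

# Slice the essay text with the start/end offsets; if the offsets are
# consistent, the slice should equal discourse_text.
row = df.iloc[0]
span = row["full_text"][int(row["discourse_start"]):int(row["discourse_end"])]
print(repr(span))
```

Checking `span == row["discourse_text"]` across the file is a cheap way to validate that the offsets line up with the stored discourse text.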
Each essay in the PERSUADE corpus was human annotated for argumentative and discourse elements as well as relationships between argumentative elements. The corpus was annotated using a double-blind rating process with 100 percent adjudication such that each essay was independently reviewed by two expert raters and adjudicated by a third expert rater.
The annotation rubric was developed to identify and evaluate discourse elements commonly found in argumentative writing. The rubric was developed in-house and went through multiple revisions based on feedback from two teacher panels as well as feedback from a research advisory board comprising experts in the fields of writing, discourse processing, linguistics, and machine learning. The discourse elements chosen for this rubric come from Nussbaum, Kardash, and Graham (2005) and Stapleton and Wu (2015). Both annotation schemes are adapted or simplified versions of the Toulmin argumentative framework (1958). Elements scored and brief descriptions for the elements are provided below.
Lead. An introduction that begins with a statistic, a quotation, a description, or some other device to grab the reader’s attention and point toward the thesis
Position. An opinion or conclusion on the main question
Claim. A claim that supports the position
Counterclaim. A claim that refutes another claim or gives an opposing reason to the position
Rebuttal. A claim that refutes a counterclaim
Evidence. Ideas or examples that support claims, counterclaims, rebuttals, or the position
Concluding Summary. A concluding statement that restates the position and claims
Unannotated. Segments that were not discourse elements
