殿下万岁

persuade corpus 2.0

text, social issues and advocacy

3

Sold: 0
52.56MB

Data ID: D17222476823354612

Published: 2024/07/29

The following is the data verification report the seller chose to provide:

Data description

The PERSUADE 2.0 corpus builds on the PERSUADE 1.0 corpus by adding a holistic essay score for each persuasive essay in PERSUADE 1.0, as well as proficiency scores for each argumentative and discourse element found in the initial corpus. This version also contains all essays (as compared to 1.0, which linked only the training set for the Kaggle competition).

In total, the PERSUADE 2.0 corpus comprises over 25,000 argumentative essays produced by 6th-12th grade students in the United States for 15 prompts on two writing tasks: independent and source-based writing. The PERSUADE 2.0 corpus provides detailed individual and demographic information for each writer, as well as the initial annotations for the argumentative and discourse elements found in PERSUADE 1.0.

V2: Added sources in sources.csv. Links to full text and GPT-4 summaries are provided.

LICENSE UNKNOWN Data pulled from here

persuade_2.0_human_scores_demo_id_github.csv has the full texts, holistic score, word count, prompt, task, source texts, gender, grade level, English language learner status, race/ethnicity, economic status, and disability status.
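
As a rough illustration of how this file might be summarized, the sketch below averages holistic scores per grade level using only the Python standard library. The column names grade_level and holistic_essay_score are assumptions, not confirmed by this listing, and the three sample rows are invented so that the example runs without the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Tiny in-memory stand-in for persuade_2.0_human_scores_demo_id_github.csv.
# The column names (grade_level, holistic_essay_score) are assumptions;
# check the real header before relying on them.
SAMPLE = """essay_id_comp,grade_level,holistic_essay_score
A1,8,3
A2,8,5
A3,10,4
"""


def mean_score_by_grade(handle):
    """Return {grade_level: mean holistic score} for rows read from a CSV handle."""
    scores = defaultdict(list)
    for row in csv.DictReader(handle):
        scores[row["grade_level"]].append(float(row["holistic_essay_score"]))
    return {grade: mean(values) for grade, values in scores.items()}


result = mean_score_by_grade(io.StringIO(SAMPLE))
print(result)
```

To try this on the real data, pass a handle opened with open(path, newline="", encoding="utf-8") in place of the in-memory sample, after adjusting the column names to match the actual header.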

persuade_corpus_1.0.csv has the following columns:

essay_id_comp: The essay ID

competition_set: Whether the essay was part of the training or the test set in the Feedback Prize competition

full_text: The full text of the essay

discourse_id: ID for the discourse element

discourse_start: Character position in the essay where the discourse element starts

discourse_end: Character position in the essay where the discourse element ends

discourse_text: The text of the discourse element

discourse_type: Human annotation for the discourse element

discourse_type_num: Number for discourse element in essay
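
The character offsets described above can be used to recover each discourse element from full_text. The following is a minimal sketch under two assumptions that this listing does not confirm: that discourse_start is 0-based and that discourse_end is exclusive, as in Python slicing. The two sample rows are invented and carry only the columns the sketch needs.

```python
import csv
import io

# Two made-up rows in the layout of persuade_corpus_1.0.csv
# (only the columns used here).
SAMPLE = '''essay_id_comp,full_text,discourse_start,discourse_end,discourse_type
E1,"Phones distract. Ban them.",0,16,Claim
E1,"Phones distract. Ban them.",17,26,Position
'''


def extract_spans(handle):
    """Yield (essay_id, discourse_type, text) by slicing full_text with the offsets."""
    for row in csv.DictReader(handle):
        # Parse via float first in case offsets are stored as "16.0".
        start = int(float(row["discourse_start"]))
        end = int(float(row["discourse_end"]))
        # Assumption: 0-based start, exclusive end (Python slice semantics).
        yield row["essay_id_comp"], row["discourse_type"], row["full_text"][start:end]


spans = list(extract_spans(io.StringIO(SAMPLE)))
for essay_id, label, text in spans:
    print(essay_id, label, repr(text))
```

On the real file, comparing each slice against the discourse_text column is a quick way to check whether the assumed offset convention holds.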

Each essay in the PERSUADE corpus was human annotated for argumentative and discourse elements as well as relationships between argumentative elements. The corpus was annotated using a double-blind rating process with 100 percent adjudication such that each essay was independently reviewed by two expert raters and adjudicated by a third expert rater.

The annotation rubric was developed to identify and evaluate discourse elements commonly found in argumentative writing. The rubric was developed in-house and went through multiple revisions based on feedback from two teacher panels as well as feedback from a research advisory board comprising experts in the fields of writing, discourse processing, linguistics, and machine learning. The discourse elements chosen for this rubric come from Nussbaum, Kardash, and Graham (2005) and Stapleton and Wu (2015). Both annotation schemes are adapted or simplified versions of the Toulmin argumentative framework (1958). Elements scored and brief descriptions for the elements are provided below.

Lead. An introduction that begins with a statistic, a quotation, a description, or some other device to grab the reader’s attention and point toward the thesis

Position. An opinion or conclusion on the main question

Claim. A claim that supports the position

Counterclaim. A claim that refutes another claim or gives an opposing reason to the position

Rebuttal. A claim that refutes a counterclaim

Evidence. Ideas or examples that support claims, counterclaims, rebuttals, or the position

Concluding Summary. A concluding statement that restates the position and claims

Unannotated. Segments that were not discourse elements
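
To make the label set concrete, the sketch below tallies how often each element type occurs in one essay and rejects any label outside the rubric. The exact spelling of the labels in the discourse_type column is an assumption, and the annotation sequence is hypothetical.

```python
from collections import Counter

# The eight labels listed in the rubric above. Their exact spelling in the
# discourse_type column is an assumption.
RUBRIC_LABELS = {
    "Lead", "Position", "Claim", "Counterclaim", "Rebuttal",
    "Evidence", "Concluding Summary", "Unannotated",
}


def tally_elements(labels):
    """Count rubric labels in a list; raise ValueError on anything outside the rubric."""
    unknown = set(labels) - RUBRIC_LABELS
    if unknown:
        raise ValueError(f"labels not in rubric: {sorted(unknown)}")
    return Counter(labels)


# Hypothetical annotation sequence for one essay.
essay_labels = ["Lead", "Position", "Claim", "Evidence", "Claim", "Evidence",
                "Concluding Summary"]
counts = tally_elements(essay_labels)
print(counts.most_common())
```

A Counter returns 0 for labels that never occur, so an essay with no Rebuttal can still be queried with counts["Rebuttal"].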
