筱雨

General Language Understanding Evaluation (GLUE)

social science, nlp, text mining, text, text pre-processing

7

Sold: 0
83.62MB

Data ID: D17222459832209968

Published: 2024/07/29

The following is the data verification report the seller chose to provide:

Data description

General Language Understanding Evaluation (GLUE)

The Famous General Language Understanding Evaluation benchmark


Source

> - Huggingface Hub: link

About this dataset

> GLUE, the General Language Understanding Evaluation benchmark, is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Tasks

ax A manually curated evaluation dataset for fine-grained analysis of system performance on a broad range of linguistic phenomena. This dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. Use a model trained on MultiNLI to produce predictions for this dataset.

cola The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.

mnli The Multi-Genre Natural Language Inference Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. The authors of the benchmark use the standard test set, for which they obtained private labels from the corpus authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. They also use and recommend the SNLI corpus as 550k examples of auxiliary training data.
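The matched and mismatched sections are scored separately. A minimal sketch of that reporting, assuming plain accuracy as the score (the listing does not name a metric) and using made-up toy labels rather than rows from the dataset; the `accuracy` helper is hypothetical:

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty split")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels for illustration only; they are not taken from MNLI.
matched_gold = ["entailment", "neutral", "contradiction", "neutral"]
matched_pred = ["entailment", "neutral", "neutral", "neutral"]
mismatched_gold = ["contradiction", "entailment"]
mismatched_pred = ["contradiction", "neutral"]

# Report one score per section instead of pooling the two.
scores = {
    "matched": accuracy(matched_gold, matched_pred),
    "mismatched": accuracy(mismatched_gold, mismatched_pred),
}
print(scores)
```

Keeping the two numbers apart shows how much a model loses when the premise genre was not seen during training.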

mnli_matched The matched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mnli_mismatched The mismatched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mrpc The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

qnli The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). The authors of the benchmark convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.
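The conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark authors' code: the sentence splitter, the token-overlap measure, the `min_overlap` threshold, and the label strings are all hypothetical choices made for the example.

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, used as a crude lexical-overlap measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter on sentence-final punctuation (assumption for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def make_pairs(question: str, paragraph: str, answer: str,
               min_overlap: int = 1) -> List[Tuple[str, str, str]]:
    """Pair the question with each context sentence, dropping low-overlap pairs."""
    question_tokens = tokens(question)
    pairs = []
    for sentence in split_sentences(paragraph):
        overlap = len(question_tokens & tokens(sentence))
        if overlap < min_overlap:
            continue  # filtered out: too little lexical overlap with the question
        contains_answer = answer.lower() in sentence.lower()
        label = "entailment" if contains_answer else "not_entailment"
        pairs.append((question, sentence, label))
    return pairs


pairs = make_pairs(
    question="Where is the Eiffel Tower located?",
    paragraph=("The Eiffel Tower is located in Paris. "
               "The tower was completed in 1889. "
               "Many tourists visit every year."),
    answer="Paris",
)
for pair in pairs:
    print(pair)
```

In this toy run the third sentence shares no tokens with the question and is filtered out, while the second survives the filter but does not contain the answer.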

qqp The Quora Question Pairs dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.

rte The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. The authors of the benchmark combined the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009). Examples are constructed based on news and Wikipedia text. The authors of the benchmark convert all datasets to a two-class split, where for three-class datasets they collapse neutral and contradiction into not entailment, for consistency.
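The two-class conversion amounts to a small label mapping. A minimal sketch, assuming the labels are stored as the strings shown here (the actual encoding in the CSV files may differ) and using a hypothetical helper name:

```python
def collapse_to_two_class(label: str) -> str:
    """Map a three-class NLI label onto the two-class RTE scheme."""
    if label == "entailment":
        return "entailment"
    if label in {"neutral", "contradiction"}:
        # Both non-entailment classes are merged into a single class.
        return "not_entailment"
    raise ValueError(f"unexpected label: {label!r}")


three_class = ["entailment", "neutral", "contradiction", "entailment"]
two_class = [collapse_to_two_class(label) for label in three_class]
print(two_class)
```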

sst2 The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. It uses the two-way (positive/negative) class split, with only sentence-level labels.

stsb The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5.

wnli The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, the authors of the benchmark construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence. They use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, it will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. The authors of the benchmark call the converted dataset WNLI (Winograd NLI).
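The pronoun-substitution step can be illustrated with a short sketch. It assumes the pronoun occurs exactly once as a whole word and that the correct referent is known; the function name, the label strings, and the example sentence are hypothetical and not taken from the dataset files.

```python
import re
from typing import List, Sequence, Tuple


def build_wnli_style_pairs(sentence: str, pronoun: str,
                           candidates: Sequence[str],
                           correct: str) -> List[Tuple[str, str, str]]:
    """Create one (original, substituted, label) triple per candidate referent."""
    pattern = re.compile(rf"\b{re.escape(pronoun)}\b")
    if len(pattern.findall(sentence)) != 1:
        raise ValueError("expected exactly one occurrence of the pronoun")
    pairs = []
    for candidate in candidates:
        # Replace the ambiguous pronoun with this candidate referent.
        substituted = pattern.sub(lambda _match: candidate, sentence, count=1)
        label = "entailment" if candidate == correct else "not_entailment"
        pairs.append((sentence, substituted, label))
    return pairs


pairs = build_wnli_style_pairs(
    sentence="The trophy does not fit in the suitcase because it is too big.",
    pronoun="it",
    candidates=["the trophy", "the suitcase"],
    correct="the trophy",
)
for original, substituted, label in pairs:
    print(label, "|", substituted)
```

Each candidate yields its own example, which is why every pair is evaluated separately rather than as one multiple-choice question.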

How to use the dataset

> The NLI Dataset is a large collection of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent. The dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. To use this dataset, you will need to train a model on the MultiNLI dataset and use it to produce predictions for the NLI Dataset.
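Before any training, the CSV files listed in the Columns section need to be read. A minimal sketch using only the standard library: it assumes the files are comma-separated with a header row, and it runs on an in-memory sample with made-up rows so that it does not depend on the download. The `read_rows` helper and the sample label values are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def read_rows(handle: TextIO, required_columns: Iterable[str]) -> List[Dict[str, str]]:
    """Read a CSV with a header row and check that the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Made-up rows shaped like the mrpc_train.csv column listing.
sample = io.StringIO(
    "sentence1,sentence2,label\n"
    "A man is playing a guitar.,A person plays a guitar.,1\n"
    "The cat sat on the mat.,Stocks fell sharply on Monday.,0\n"
)
rows = read_rows(sample, ["sentence1", "sentence2", "label"])
print(len(rows), rows[0]["label"])

# With the real file, the same helper would be called on an open handle, e.g.
# with open("mrpc_train.csv", newline="", encoding="utf-8") as handle:
#     rows = read_rows(handle, ["sentence1", "sentence2", "label"])
```

Checking the header up front is useful here because the column names differ between tasks (for example `premise`/`hypothesis` for MNLI and `question1`/`question2` for QQP).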

Research Ideas

> - Train a model to classify semantically equivalent sentences.
> - The dataset can be used to train a model to identify paraphrases.
> - The dataset can be used to train a model to identify the entailment relation between two sentences.
> - And much more.

Acknowledgements

> ### License
>
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: mrpc_train.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the two sentences are semantically equivalent. (string)

File: rte_train.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating entailment or not entailment. (string)

File: sst2_test.csv

Column name Description
label The sentiment label for the sentence (positive or negative). (string)
sentence1 The sentence. (string)

File: cola_validation.csv

Column name Description
sentence1 The sentence. (string)
label The label for the sentence, indicating whether it is a grammatical English sentence. (string)

File: mnli_train.csv

Column name Description
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)

File: qqp_train.csv

Column name Description
label The label for the pair, indicating whether the two questions are semantically equivalent. (string)
question1 The first question in the pair. (string)
question2 The second question in the pair. (string)

File: mnli_test_matched.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: mrpc_validation.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the two sentences are semantically equivalent. (string)

File: sst2_validation.csv

Column name Description
sentence1 The sentence. (string)
label The sentiment label for the sentence (positive or negative). (string)

File: wnli_test.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the sentence with the pronoun substituted is entailed by the original sentence. (string)

File: sst2_train.csv

Column name Description
sentence1 The sentence. (string)
label The sentiment label for the sentence (positive or negative). (string)

File: qqp_test.csv

Column name Description
question1 The first question in the pair. (string)
question2 The second question in the pair. (string)
label The label for the pair, indicating whether the two questions are semantically equivalent. (string)

File: stsb_validation.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The human-annotated similarity score for the pair. (string)

File: mnli_test_mismatched.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: wnli_validation.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the sentence with the pronoun substituted is entailed by the original sentence. (string)

File: rte_test.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating entailment or not entailment. (string)

File: stsb_train.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The human-annotated similarity score for the pair. (string)

File: mnli_matched_validation.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: qqp_validation.csv

Column name Description
question1 The first question in the pair. (string)
question2 The second question in the pair. (string)
label The label for the pair, indicating whether the two questions are semantically equivalent. (string)

File: mnli_validation_mismatched.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: rte_validation.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating entailment or not entailment. (string)

File: stsb_test.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The human-annotated similarity score for the pair. (string)

File: wnli_train.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the sentence with the pronoun substituted is entailed by the original sentence. (string)

File: qnli_test.csv

Column name Description
sentence1 The context sentence. (string)
label The label for the pair, indicating whether the context sentence contains the answer to the question. (string)
question The question. (string)

File: mnli_mismatched_test.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: qnli_train.csv

Column name Description
question The question. (string)
sentence1 The context sentence. (string)
label The label for the pair, indicating whether the context sentence contains the answer to the question. (string)

File: mnli_matched_test.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: mrpc_test.csv

Column name Description
sentence1 The first sentence in the pair. (string)
sentence2 The second sentence in the pair. (string)
label The label for the pair, indicating whether the two sentences are semantically equivalent. (string)

File: ax_test.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: mnli_validation_matched.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: mnli_mismatched_validation.csv

Column name Description
premise The premise sentence. (string)
hypothesis The hypothesis sentence. (string)
label The label for the pair, indicating whether the sentences are semantically equivalent (entailment), not semantically equivalent (contradiction), or neither (neutral). (string)

File: cola_train.csv

Column name Description
sentence1 The sentence. (string)
label The label for the sentence, indicating whether it is a grammatical English sentence. (string)

File: cola_test.csv

Column name Description
sentence1 The sentence. (string)
label The label for the sentence, indicating whether it is a grammatical English sentence. (string)

File: qnli_validation.csv

Column name Description
question The question. (string)
sentence1 The context sentence. (string)
label The label for the pair, indicating whether the context sentence contains the answer to the question. (string)