
HellaSwag (Commonsense NLI)

nlp, text mining, text, text classification, text pre-processing

5

Sold: 0
17.45MB

Data ID: D17222363749530002

Published: 2024/07/29

The following is the data validation report that the seller chose to provide:

Data description

HellaSwag (Commonsense NLI)

Can a Machine Really Finish Your Sentence?


Source

> Paper: link
> Huggingface Hub: link

About this dataset

> HellaSwag is a dataset that tests a machine's ability to complete sentences in a way that makes sense. The dataset contains over 10,000 examples of sentence completion, with four possible endings for each sentence. The task for the machine is to choose the ending that best completes the sentence.
>
> This task is difficult for a machine because it requires understanding not just the words in the sentence, but also the underlying meaning and context. For humans, this task is easy because we have years of experience understanding language and common sense. But for machines, it's a whole new challenge.
>
> HellaSwag is an important step towards building artificial intelligence systems that can communicate like humans. By testing how well machines can understand and generate language, we can better assess where they currently stand and what areas need improvement.

How to use the dataset

> To use the HellaSwag dataset, first download the data from Kaggle, unzip the archive, and open the train.csv file.
>
> The file contains the columns listed under Columns: ind, activity_label, ctx_a, ctx_b, endings, split, and split_type. The ctx_a and ctx_b columns contain the context for each example, while the endings column holds the four candidate endings. A label column, where present, indicates which of the endings is correct for each example.
>
> You can split the data into a training set and a test set using any standard splitting method (e.g., 80/20), or use the provided train.csv, validation.csv, and test.csv files as they are. You can then train any standard classification algorithm on the training set to predict which ending is correct for each example in the test set.
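The loading and splitting steps described above can be sketched with the Python standard library. This is a minimal sketch, not the dataset author's own procedure. It assumes the `endings` column is stored as a Python list literal inside a quoted CSV field, which the listing does not confirm. The two sample rows are made up for illustration, and the helper names `load_rows` and `split_rows` are hypothetical.

```python
import ast
import csv
import io
import random

# Two made-up rows in the column layout listed under "Columns".
# Real rows would come from train.csv.
SAMPLE_CSV = '''ind,activity_label,ctx_a,ctx_b,endings,split,split_type
0,Making tea,A woman fills a kettle with water.,She,"['puts it on the stove.', 'throws it at the wall.', 'reads the kettle a book.', 'paints the water blue.']",train,indomain
1,Washing a car,A man sprays a car with a hose.,He,"['eats the hose.', 'scrubs the hood with a sponge.', 'plants the car in soil.', 'folds the car in half.']",train,indomain
'''


def load_rows(handle):
    """Read rows and turn the `endings` string into a list of candidate endings."""
    rows = []
    for row in csv.DictReader(handle):
        # Assumption: `endings` is serialized as a Python list literal.
        row["endings"] = ast.literal_eval(row["endings"])
        # Join the two context parts into one prompt string.
        row["context"] = f'{row["ctx_a"]} {row["ctx_b"]}'.strip()
        rows.append(row)
    return rows


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into train and held-out parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = load_rows(io.StringIO(SAMPLE_CSV))
train_rows, held_out_rows = split_rows(rows)
```

To read the real file instead of the inline sample, pass an open file handle, for example `open("train.csv", newline="", encoding="utf-8")`, to `load_rows`. If the `endings` field turns out to use a different serialization, only the `ast.literal_eval` line needs to change.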

Research Ideas

> Using this dataset, you can:
> - Train a model that generates new endings for sentences, similar to the way a human would.
> - Build a model that better understands the context of a sentence by choosing the right ending based on that context.
> - Train a model that takes two sentences with different endings and chooses which one is more likely to be true, based on commonsense knowledge.
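Choosing an ending based on context can be framed as scoring each (context, ending) pair and picking the highest score. The sketch below illustrates that framing only. The word-overlap scorer is a deliberately naive stand-in, not a commonsense model, and would normally be replaced by a trained model's score. The function names are hypothetical, the example row is made up, and the label is assumed to be an integer index into the list of endings.

```python
def overlap_score(context, ending):
    """Placeholder scorer: count distinct lowercase tokens shared by context and ending."""
    return len(set(context.lower().split()) & set(ending.lower().split()))


def choose_ending(context, endings, score=overlap_score):
    """Return the index of the highest-scoring ending (the first one wins ties)."""
    scores = [score(context, ending) for ending in endings]
    return scores.index(max(scores))


def accuracy(examples, score=overlap_score):
    """Fraction of examples where the chosen index equals the gold label."""
    hits = sum(
        choose_ending(context, endings, score) == label
        for context, endings, label in examples
    )
    return hits / len(examples)


# Made-up example: (context, candidate endings, assumed integer label).
examples = [
    (
        "A man sprays a car with a hose. He",
        [
            "eats lunch.",
            "rinses the car with the hose.",
            "goes to sleep.",
            "sings loudly.",
        ],
        1,
    ),
]

predicted = choose_ending(examples[0][0], examples[0][1])
```

Because `score` is passed in as a parameter, any other scoring function with the same `(context, ending)` signature can be swapped in without changing the selection or accuracy code.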

Acknowledgements

> Paper: link
> Huggingface Hub: link

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
>
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

| Column name | Description |
| --- | --- |
| ind | The index of the sentence. (Integer) |
| activity_label | The label of the activity. (String) |
| ctx_a | The context sentence A. (String) |
| ctx_b | The context sentence B. (String) |
| endings | The endings of the sentence. (String) |
| split | The split of the dataset. (String) |
| split_type | The type of split. (String) |

File: train.csv

| Column name | Description |
| --- | --- |
| ind | The index of the sentence. (Integer) |
| activity_label | The label of the activity. (String) |
| ctx_a | The context sentence A. (String) |
| ctx_b | The context sentence B. (String) |
| endings | The endings of the sentence. (String) |
| split | The split of the dataset. (String) |
| split_type | The type of split. (String) |

File: test.csv

| Column name | Description |
| --- | --- |
| ind | The index of the sentence. (Integer) |
| activity_label | The label of the activity. (String) |
| ctx_a | The context sentence A. (String) |
| ctx_b | The context sentence B. (String) |
| endings | The endings of the sentence. (String) |
| split | The split of the dataset. (String) |
| split_type | The type of split. (String) |