The following is the data validation report the seller chose to provide:
Data description
HellaSwag (Commonsense NLI)
Can a Machine Really Finish Your Sentence?
Source
> Paper: link
> Hugging Face Hub: link
About this dataset
> HellaSwag is a dataset that tests a machine's ability to complete sentences in a way that makes sense. It contains over 10,000 examples. Each example gives a context followed by four possible endings, and the task is to choose the ending that best completes the context.
>
> The task is hard for machines because it requires understanding not just the words in the context but also its underlying meaning, along with the commonsense knowledge needed to tell a plausible continuation from an implausible one. For humans it is easy, because we draw on years of experience with language and everyday situations.
>
> HellaSwag is an important step toward artificial intelligence systems that can communicate like humans. Measuring how well machines understand language shows where they currently stand and which areas need improvement.
How to use the dataset
> To use HellaSwag, first download the data from Kaggle, unzip the archive, and open train.csv.
>
> train.csv contains the columns listed under Columns below, including ctx_a, ctx_b, and endings. ctx_a and ctx_b hold the two parts of each example's context, and endings holds its four candidate endings. A label column, where present, gives the index of the correct ending.
>
> The data already comes split into train.csv, validation.csv, and test.csv. The original benchmark withholds test-set labels, so validation.csv is the usual place to measure accuracy. If you need an extra held-out set, you can further split train.csv with any standard method (e.g., 80/20), then train a classifier on the training portion to predict which of the four endings is correct for each held-out example.
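A minimal standard-library sketch of the loading and splitting steps. It assumes the `endings` cell is a Python-style list literal and that an optional `label` column holds the index of the correct ending. The column tables confirm neither format, so inspect a few rows of your copy first. `parse_endings`, `load_examples`, `split_examples`, and the toy `SAMPLE_CSV` row are hypothetical names and made-up data for illustration.

```python
import ast
import csv
import io
import random
from typing import Dict, List, Optional, Sequence, TextIO, Tuple


def parse_endings(raw: str) -> List[str]:
    # ASSUMPTION: `endings` is stored as a Python-style list literal,
    # e.g. "['ending 1', 'ending 2', 'ending 3', 'ending 4']".
    endings = ast.literal_eval(raw)
    if not isinstance(endings, (list, tuple)):
        raise ValueError(f"unexpected endings format: {raw[:60]!r}")
    return [str(e) for e in endings]


def load_examples(f: TextIO) -> List[Dict]:
    """Read rows from an open HellaSwag CSV into simple dicts."""
    examples = []
    for row in csv.DictReader(f):
        # Join the two context fields into the single string to be completed.
        context = f"{row['ctx_a']} {row['ctx_b']}".strip()
        # ASSUMPTION: an optional `label` column holds the index (0-3) of the
        # correct ending; it may be missing or blank (e.g. for test rows).
        raw_label = (row.get("label") or "").strip()
        label: Optional[int] = int(float(raw_label)) if raw_label else None
        examples.append({
            "ind": row["ind"],
            "context": context,
            "endings": parse_endings(row["endings"]),
            "label": label,
        })
    return examples


def split_examples(examples: Sequence, test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of `examples` and split it into (train, held_out)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up toy row (not real HellaSwag data) in the assumed format.
SAMPLE_CSV = (
    "ind,activity_label,ctx_a,ctx_b,endings,split,split_type,label\n"
    "7,Toy activity,A person opens a jar.,They,"
    "\"['pour out the pickles.', 'eat the lid.', 'fly away.', 'sing to it.']\","
    "train,indomain,0\n"
)

if __name__ == "__main__":
    examples = load_examples(io.StringIO(SAMPLE_CSV))
    first = examples[0]
    print(first["context"], "->", first["endings"][first["label"]])
    train, held_out = split_examples(examples * 10)
    print(len(train), len(held_out))
```

On real data you would pass an open file instead, e.g. `with open("train.csv", newline="", encoding="utf-8") as f: examples = load_examples(f)`. Seeding a private `random.Random` keeps the split reproducible without touching the global random state.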
Research Ideas
> Using this dataset, you can:
> - Train a model that generates new endings for sentences, similar to the way a human would.
> - Build a model that better understands the context of a sentence, by choosing the right ending based on that context.
> - Train a model that, given a context and several candidate endings, chooses the one most likely to be true based on commonsense knowledge.
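Choosing the most plausible ending reduces to scoring each candidate and taking the best one. The sketch below assumes examples shaped as dicts with `context`, `endings`, and an integer `label`. `overlap_score` is a hypothetical placeholder based on word overlap, not a commonsense model; in practice you would swap in something like a language model's log-likelihood of the ending given the context.

```python
from typing import Callable, Dict, List, Sequence

# A scorer maps (context, ending) to a plausibility score; higher is better.
Scorer = Callable[[str, str], float]


def overlap_score(context: str, ending: str) -> float:
    """Hypothetical placeholder: fraction of ending words that appear in the context."""
    ctx_words = set(context.lower().split())
    end_words = ending.lower().split()
    if not end_words:
        return 0.0
    return sum(w in ctx_words for w in end_words) / len(end_words)


def predict(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the highest-scoring ending (ties go to the earliest)."""
    scores = [score(context, e) for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: List[Dict], score: Scorer) -> float:
    """Fraction of labelled examples whose predicted ending matches `label`."""
    labelled = [ex for ex in examples if ex.get("label") is not None]
    if not labelled:
        raise ValueError("no labelled examples to evaluate")
    correct = sum(
        predict(ex["context"], ex["endings"], score) == ex["label"]
        for ex in labelled
    )
    return correct / len(labelled)
```

Keeping the scorer as a plain function argument lets the same `predict`/`accuracy` loop evaluate a trivial baseline and a trained model side by side.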
Acknowledgements
> Paper: link
> Hugging Face Hub: link
>
> ### License
>
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: validation.csv
Column name | Description |
---|---|
ind | The index of the sentence. (Integer) |
activity_label | The label of the activity. (String) |
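ctx_a | The first part of the context, preceding the sentence to be completed. (String) |
ctx_b | The opening words of the sentence that the chosen ending completes. (String) |
endings | The four candidate endings for the context, stored as a single string. (String) |
split | The dataset split the row belongs to. (String) |
split_type | Whether the example is in-domain or zero-shot. (String) |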
File: train.csv
Column name | Description |
---|---|
ind | The index of the sentence. (Integer) |
activity_label | The label of the activity. (String) |
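ctx_a | The first part of the context, preceding the sentence to be completed. (String) |
ctx_b | The opening words of the sentence that the chosen ending completes. (String) |
endings | The four candidate endings for the context, stored as a single string. (String) |
split | The dataset split the row belongs to. (String) |
split_type | Whether the example is in-domain or zero-shot. (String) |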
File: test.csv
Column name | Description |
---|---|
ind | The index of the sentence. (Integer) |
activity_label | The label of the activity. (String) |
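ctx_a | The first part of the context, preceding the sentence to be completed. (String) |
ctx_b | The opening words of the sentence that the chosen ending completes. (String) |
endings | The four candidate endings for the context, stored as a single string. (String) |
split | The dataset split the row belongs to. (String) |
split_type | Whether the example is in-domain or zero-shot. (String) |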
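All three files document the same seven columns. A quick header check can confirm that a downloaded copy matches the tables above before you rely on it. The following is a minimal standard-library sketch; `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names, and extra columns (such as a `label` column) are deliberately ignored.

```python
import csv
import io
from typing import List, TextIO

# Columns documented above for validation.csv, train.csv and test.csv.
EXPECTED_COLUMNS = ["ind", "activity_label", "ctx_a", "ctx_b",
                    "endings", "split", "split_type"]


def missing_columns(f: TextIO) -> List[str]:
    """Return the documented columns that are absent from the CSV header of `f`."""
    header = next(csv.reader(f), [])
    # Strip whitespace and a possible UTF-8 byte-order mark from header names.
    present = {name.strip().lstrip("\ufeff") for name in header}
    return [c for c in EXPECTED_COLUMNS if c not in present]
```

For example, `missing_columns(open("train.csv", newline="", encoding="utf-8"))` should return an empty list if the file matches the documented schema.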
