XNLI - Multilingual NLI

A dataset for multilingual natural language inference tasks

By xnli (from Hugging Face)

About this dataset

> The xnli Multilingual Natural Language Inference Dataset is a collection of data curated for training and evaluating natural language inference (NLI) models in multiple languages. It provides a separate split for each language, covering Arabic, Bulgarian, Chinese, English, French, German, Greek, Hindi, Russian, Spanish, Swahili, Thai, Turkish, Urdu, and Vietnamese.
>
> To support NLI across languages, the dataset includes a separate CSV file for each language split. The languages range from widely spoken ones such as English and Spanish to lower-resource ones such as Swahili and Urdu.
>
> Each CSV file contains labeled examples for training and evaluating NLI models. Every example has two text fields: the premise, the initial sentence or text segment that forms the basis of the inference, and the hypothesis, a second sentence or text segment whose relationship to the premise is to be determined.
>
> Each example also carries a label describing how the hypothesis relates to its premise. The label takes one of three values: entailment (the hypothesis can be inferred from the premise), contradiction (the hypothesis contradicts the premise), or neutral (the premise neither entails nor contradicts the hypothesis).
>
> In addition, the dataset includes per-language test splits for evaluating NLI models in individual languages, such as English (en_test.csv) and Urdu (ur_test.csv).
>
> Researchers and practitioners building multilingual NLI models can use these labeled examples to train their models and to measure how well they recognize logical relationships between sentences across many languages.
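As a minimal sketch of reading one split file into premise/hypothesis/label records, the Python below uses only the standard library. The names `NLIExample`, `normalize_label`, and `read_split` are hypothetical, and it assumes that integer labels, if the CSV stores them, follow the Hugging Face `xnli` order 0 = entailment, 1 = neutral, 2 = contradiction; label names are accepted as well.

```python
import csv
import io
from typing import Iterable, NamedTuple

# Assumption: integer labels follow the Hugging Face `xnli` ClassLabel order.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str  # always one of LABEL_NAMES after normalization


def normalize_label(raw: str) -> str:
    """Map an integer code ("0"-"2") or a label name to a label name."""
    value = raw.strip().lower()
    if value.isdigit() and int(value) < len(LABEL_NAMES):
        return LABEL_NAMES[int(value)]
    if value in LABEL_NAMES:
        return value
    raise ValueError(f"unexpected label: {raw!r}")


def read_split(lines: Iterable[str]) -> list[NLIExample]:
    """Parse one split file (e.g. en_test.csv) into NLIExample records."""
    reader = csv.DictReader(lines)
    return [
        NLIExample(row["premise"], row["hypothesis"], normalize_label(row["label"]))
        for row in reader
    ]


if __name__ == "__main__":
    # In-memory sample; for a real file pass
    # open("en_test.csv", newline="", encoding="utf-8") instead of io.StringIO.
    sample = "premise,hypothesis,label\nA man is sleeping.,A person is asleep.,0\n"
    print(read_split(io.StringIO(sample)))
```

Normalizing to label names up front means downstream code does not need to know which encoding a particular export used.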

Research Ideas

> - Cross-lingual NLI Modeling: The xnli dataset makes it possible to train and test NLI models across multiple languages. Researchers can use it to develop cross-lingual models that recognize the logical relationship between premises and hypotheses in different languages.
> - Language Transfer Learning: Models can be fine-tuned on labeled examples in one language (commonly English) and then evaluated on the other languages, measuring how well their knowledge transfers across languages. This is especially relevant for improving natural language understanding in low-resource languages.
> - Multilingual Evaluation Benchmarks: The xnli dataset serves as a benchmark for NLI performance across languages, allowing researchers to compare models and techniques on diverse linguistic input and to track progress in multilingual understanding.
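For the transfer and benchmarking ideas, results are usually reported as accuracy per language. The sketch below is one possible way to organize that; it assumes files follow the `<lang>_<split>.csv` naming seen in en_test.csv and el_validation.csv, and `group_by_language`, `per_language_accuracy`, and the `always_neutral` predictor are hypothetical placeholders for a real fine-tuned model.

```python
import re
from typing import Callable, Mapping, Sequence

# Assumption: files are named "<lang>_<split>.csv", as in en_test.csv,
# ur_test.csv and el_validation.csv.
FILE_PATTERN = re.compile(r"^(?P<lang>[a-z]{2})_(?P<split>train|validation|test)\.csv$")

# An example is a (premise, hypothesis, gold_label) triple.
Example = tuple[str, str, str]


def group_by_language(filenames: Sequence[str], split: str) -> dict[str, str]:
    """Return {language code: file name} for every file of the requested split."""
    files = {}
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match and match["split"] == split:
            files[match["lang"]] = name
    return files


def per_language_accuracy(
    data: Mapping[str, Sequence[Example]],
    predict: Callable[[str, str], str],
) -> dict[str, float]:
    """Score predict(premise, hypothesis) separately on each language."""
    scores = {}
    for lang, examples in data.items():
        if not examples:  # skip languages with no loaded examples
            continue
        correct = sum(predict(p, h) == gold for p, h, gold in examples)
        scores[lang] = correct / len(examples)
    return scores


def always_neutral(premise: str, hypothesis: str) -> str:
    """Hypothetical placeholder predictor standing in for a trained model."""
    return "neutral"


if __name__ == "__main__":
    print(group_by_language(["en_test.csv", "ur_test.csv", "el_validation.csv"], "test"))
    toy = {
        "en": [("a", "b", "neutral"), ("c", "d", "entailment")],
        "ur": [("e", "f", "neutral")],
    }
    print(per_language_accuracy(toy, always_neutral))
```

Keeping the scores per language, rather than pooling all test files, makes it visible where transfer from the training language degrades.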

Acknowledgements

> If you use this dataset in your research, please credit the original authors.

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
>
> No Copyright: You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: el_validation.csv

Column name	Description
premise	The first sentence or text segment that serves as the basis for the natural language inference task. (Text)
hypothesis	The second sentence or text segment that is compared to the premise to determine the logical relationship between them. (Text)
label	The label indicating the logical relationship between the premise and hypothesis. It can be one of three categories - entailment, contradiction, or neutral. (Categorical)

File: en_test.csv

Column name	Description
premise	The first sentence or text segment that serves as the basis for the natural language inference task. (Text)
hypothesis	The second sentence or text segment that is compared to the premise to determine the logical relationship between them. (Text)
label	The label indicating the logical relationship between the premise and hypothesis. It can be one of three categories - entailment, contradiction, or neutral. (Categorical)

File: ur_test.csv

Column name	Description
premise	The first sentence or text segment that serves as the basis for the natural language inference task. (Text)
hypothesis	The second sentence or text segment that is compared to the premise to determine the logical relationship between them. (Text)
label	The label indicating the logical relationship between the premise and hypothesis. It can be one of three categories - entailment, contradiction, or neutral. (Categorical)
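All three files share the same three-column layout, so a quick sanity check before training can catch malformed exports. This is a minimal sketch with a hypothetical `check_split` helper: it requires the premise, hypothesis, and label columns to be present (tolerating extras such as an exported index column, which is an assumption about possible exports), rejects rows with empty text, and counts label values as they appear in the file.

```python
import csv
import io
from collections import Counter
from typing import Iterable

# Columns listed in the tables above; extra columns are tolerated.
REQUIRED_COLUMNS = ("premise", "hypothesis", "label")


def check_split(lines: Iterable[str]) -> Counter:
    """Check that the required columns exist and return per-label row counts."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = Counter()
    for row_no, row in enumerate(reader, start=1):
        # Short rows yield None for absent fields, so guard before strip().
        premise = (row["premise"] or "").strip()
        hypothesis = (row["hypothesis"] or "").strip()
        if not premise or not hypothesis:
            raise ValueError(f"empty premise or hypothesis in data row {row_no}")
        counts[(row["label"] or "").strip()] += 1
    return counts


if __name__ == "__main__":
    sample = "premise,hypothesis,label\na,b,0\nc,d,2\ne,f,0\n"
    print(check_split(io.StringIO(sample)))
```

The label counts also give a quick view of class balance in each split.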

Acknowledgements

> If you use this dataset in your research, please credit xnli (from Hugging Face).
