The following is the data validation report that the seller chose to provide:
Data description
MultiNLI (Multi-Genre Natural Language Inference)
Crowdsourced collection of 433k sentence pairs annotated with textual entailment
By Huggingface Hub [source]
About this dataset
> The Multi-Genre Natural Language Inference (MultiNLI) corpus is a resource for machine learning researchers working on natural language understanding. It contains 433,000 sentence pairs, each annotated with textual entailment information and drawn from multiple genres of spoken and written text. Because it spans genres, MultiNLI supports cross-genre evaluation: a model can be tested on genres that do or do not appear in its training data, which makes it possible to study how linguistic patterns and inference performance vary across sources.
How to use the dataset
> ### How to Use the MultiNLI Corpus
> The MultiNLI Corpus is a resource for machine learning researchers working on natural language inference and understanding. This dataset contains 433,000 sentence pairs annotated with textual entailment information, genre, and label. Follow these steps to use it for research:
>
> 1. Identify the columns you require. The columns available in this dataset are premise, premise_binary_parse, premise_parse, hypothesis, hypothesis_binary_parse, hypothesis_parse, genre and label.
> 2. Select a subset or the entirety of the data from train.csv for training, or from the validation matched/mismatched files for testing.
> 3. Pre-process your sentences by tokenization (splitting text into tokens such as words), then run them through a parser that produces linguistic representations such as dependency trees or binary parse trees for each sentence. These can serve as model features in place of labour-intensive manual feature engineering.
> 4. Build your model using a deep learning architecture suited to NLP tasks, such as attentive RNNs, which learn contextual representations from raw text by applying local context at each step as they process the input. Then train, evaluate, and tune hyperparameters until the desired results are achieved.
>
> Used appropriately with current models, this resource can support progress toward reliable natural language inference and offer insight into real-world applications involving choice comprehension…
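Steps 1 and 2 above can be sketched in Python with pandas. This is a minimal sketch under stated assumptions, not the dataset authors' method: the small in-memory CSV stands in for `train.csv`, its rows and label strings are invented for illustration (the label encoding in the real files may differ), and the helper name `load_pairs` is hypothetical. Only the column names come from this description.

```python
import io

import pandas as pd

# Step 1: the columns needed for a basic premise/hypothesis classifier.
WANTED_COLUMNS = ("premise", "hypothesis", "genre", "label")

# Invented stand-in for train.csv so the sketch runs without the real file.
SAMPLE_CSV = """premise,hypothesis,genre,label
A man is playing a guitar.,A person is making music.,fiction,entailment
The office opens at nine.,The office never opens.,government,contradiction
She called her brother.,Her brother was at home.,telephone,neutral
"""


def load_pairs(source, columns=WANTED_COLUMNS, genres=None):
    """Read only the requested columns and optionally keep selected genres."""
    frame = pd.read_csv(source, usecols=list(columns))
    if genres is not None:
        frame = frame[frame["genre"].isin(genres)]
    return frame.reset_index(drop=True)


# Step 2: take a subset of the data, here the pairs from a single genre.
pairs = load_pairs(io.StringIO(SAMPLE_CSV), genres=["telephone"])
print(pairs)
```

With the real files, the same function could be pointed at a path such as `train.csv` instead of the in-memory buffer; reading only the needed columns avoids loading the long parse strings when they are not used.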
Research Ideas
> - Investigating the effects of out-of-domain and cross-genre evaluation on natural language processing tasks such as sentiment analysis, text classification, and summarization.
> - Exploring unsupervised methods of identifying textual entailment relationships between sentences.
> - Developing genre- or context-aware semantic inference systems that identify relationships across different types of language usage (spoken vs. written).
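For the cross-genre evaluation idea, one simple starting point is to report accuracy separately for each value of the `genre` column. The sketch below is only an illustration: the function name `accuracy_by_genre` is hypothetical, and the genres, labels, and predictions are invented toy values rather than results from any model.

```python
from collections import defaultdict


def accuracy_by_genre(genres, gold_labels, predicted_labels):
    """Return {genre: accuracy} over aligned sequences of equal length."""
    if not (len(genres) == len(gold_labels) == len(predicted_labels)):
        raise ValueError("inputs must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for genre, gold, predicted in zip(genres, gold_labels, predicted_labels):
        total[genre] += 1
        correct[genre] += int(gold == predicted)
    return {genre: correct[genre] / total[genre] for genre in total}


# Invented toy data: genre and gold label per pair, plus model predictions.
genres = ["fiction", "fiction", "telephone", "telephone"]
gold = ["entailment", "neutral", "contradiction", "neutral"]
predicted = ["entailment", "entailment", "contradiction", "neutral"]
print(accuracy_by_genre(genres, gold, predicted))
```

Comparing these per-genre scores between the matched and mismatched validation files gives a rough picture of how well a model transfers to genres it was not trained on.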
Acknowledgements
> If you use this dataset in your research, please credit the original authors.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: train.csv
Column name | Description |
---|---|
premise | The premise of the sentence pair. (String) |
premise_binary_parse | The binary parse of the premise sentence. (String) |
premise_parse | The parse of the premise sentence. (String) |
hypothesis | The hypothesis of the sentence pair. (String) |
hypothesis_binary_parse | The binary parse of the hypothesis sentence. (String) |
hypothesis_parse | The parse of the hypothesis sentence. (String) |
genre | The genre of the sentence pair. (String) |
label | The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String) |
File: validation_matched.csv
Column name | Description |
---|---|
premise | The premise of the sentence pair. (String) |
premise_binary_parse | The binary parse of the premise sentence. (String) |
premise_parse | The parse of the premise sentence. (String) |
hypothesis | The hypothesis of the sentence pair. (String) |
hypothesis_binary_parse | The binary parse of the hypothesis sentence. (String) |
hypothesis_parse | The parse of the hypothesis sentence. (String) |
genre | The genre of the sentence pair. (String) |
label | The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String) |
File: validation_mismatched.csv
Column name | Description |
---|---|
premise | The premise of the sentence pair. (String) |
premise_binary_parse | The binary parse of the premise sentence. (String) |
premise_parse | The parse of the premise sentence. (String) |
hypothesis | The hypothesis of the sentence pair. (String) |
hypothesis_binary_parse | The binary parse of the hypothesis sentence. (String) |
hypothesis_parse | The parse of the hypothesis sentence. (String) |
genre | The genre of the sentence pair. (String) |
label | The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String) |
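The `premise_binary_parse` and `hypothesis_binary_parse` columns hold bracketed strings. As a rough sketch, assuming they follow a space-separated format such as `( ( The cat ) ( sat . ) )` and that literal parentheses never appear as bare tokens, a binary parse can be turned into nested tuples and its tokens recovered. The example string is invented and the helper names `parse_binary` and `leaves` are hypothetical.

```python
def parse_binary(text):
    """Turn a bracketed binary parse string into nested tuples of tokens."""
    stack = [[]]
    for token in text.split():
        if token == "(":
            # Open a new constituent.
            stack.append([])
        elif token == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced ')' in parse string")
            # Close the current constituent and attach it to its parent.
            node = tuple(stack.pop())
            stack[-1].append(node)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in parse string")
    root = stack[0]
    return root[0] if len(root) == 1 else tuple(root)


def leaves(tree):
    """Return the tokens of a parsed tree in sentence order."""
    if isinstance(tree, str):
        return [tree]
    tokens = []
    for child in tree:
        tokens.extend(leaves(child))
    return tokens


example = "( ( The cat ) ( sat . ) )"
tree = parse_binary(example)
print(tree)
print(leaves(tree))
```

Reading tokens from the binary parse in this way is one option for obtaining a tokenized sentence without running a separate tokenizer, provided the format assumption holds for the files at hand.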
Acknowledgements
> If you use this dataset in your research, please credit Huggingface Hub.
