MultiNLI (Multi-Genre Natural Language Inference)

literature, clustering

Price: ¥4

Sold: 0

Size: 109.37 MB

Data ID: D17222561889447063

Published: 2024/07/29

Crowdsourced collection of 433k sentence pairs annotated with textual entailment

By Huggingface Hub [source]

About this dataset

> The Multi-Genre Natural Language Inference (MultiNLI) corpus is a resource for machine learning researchers working on natural language understanding. It contains 433,000 sentence pairs, each annotated with textual entailment information, drawn from a range of spoken and written genres. Because the genres differ, the corpus supports cross-genre evaluation: a model can be tested on genres it did not see during training, which makes it possible to study distinct linguistic patterns across sources and how well inference models generalize.

How to use the dataset

> ### How to Use the MultiNLI Corpus
> The MultiNLI corpus is a resource for machine learning researchers working on natural language inference and understanding. It contains 433,000 sentence pairs annotated with textual entailment information, genre, and label. Follow these steps to use it for research:
>
> 1. Identify the columns you need. The available columns are premise, premise_binary_parse, premise_parse, hypothesis, hypothesis_binary_parse, hypothesis_parse, genre, and label.
> 2. Select all or part of the data from train.csv for training, or from the validation matched/mismatched files for testing.
> 3. Pre-process your sentences by tokenization (splitting text into tokens, e.g. words), then run them through a parser to produce linguistic representations such as dependency trees or binary parse trees for each sentence pair. These can serve as model features in place of manual feature engineering, which is labour intensive.
> 4. Build your model using a deep learning architecture suited to NLP tasks, such as an attentive RNN, which learns contextual representations from raw text by applying local context at each step as it processes the input. Then train, evaluate, and tune hyperparameters until the desired results are achieved.
>
> Used with current models, this resource can support progress towards reliable natural language inference and offer further insight into real-world applications involving choice comprehension.
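Steps 1 and 2 can be sketched with Python's standard `csv` module. This is a minimal sketch under stated assumptions, not the publisher's own loader: the helper name `load_pairs`, the sample rows, and the label strings in them are hypothetical, and an in-memory string stands in for `train.csv` so the snippet runs without the download.

```python
import csv
import io

# Columns named in the dataset description; keep only the ones you need.
WANTED = ("premise", "hypothesis", "genre", "label")

def load_pairs(handle, columns=WANTED, limit=None):
    """Read sentence pairs from an open CSV handle, keeping selected columns.

    `limit` caps the number of rows returned (None reads everything).
    """
    reader = csv.DictReader(handle)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    rows = []
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        rows.append({c: row[c] for c in columns})
    return rows

# Hypothetical two-row sample standing in for train.csv.
SAMPLE = io.StringIO(
    "premise,hypothesis,genre,label\n"
    "A man is sleeping.,A person is asleep.,fiction,entailment\n"
    "A man is sleeping.,A man is running.,fiction,contradiction\n"
)
pairs = load_pairs(SAMPLE)
print(len(pairs), pairs[0]["label"])
```

For the real file, pass `open("train.csv", newline="", encoding="utf-8")` in place of the in-memory sample, and check how the label column is actually encoded before relying on string values.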

Research Ideas

> - Investigating the effects of out-of-domain and cross-genre evaluation on natural language processing tasks such as sentiment analysis, text classification, and summarization.
> - Exploring unsupervised methods of identifying textual entailment relationships between sentences.
> - Developing genre- or context-specific semantic inference systems that identify relationships across different types of language usage (spoken vs. written).
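For the cross-genre idea, a per-genre accuracy breakdown is one simple starting point. The sketch below is illustrative only: `accuracy_by_genre` is a hypothetical helper, and the gold rows, genre names, and predictions are made-up placeholders rather than real model output.

```python
from collections import defaultdict

def accuracy_by_genre(examples, predictions):
    """Return {genre: accuracy} for parallel lists of examples and predicted labels."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        genre = example["genre"]
        total[genre] += 1
        correct[genre] += int(predicted == example["label"])
    return {genre: correct[genre] / total[genre] for genre in total}

# Hypothetical gold rows and predictions, for illustration only.
gold = [
    {"genre": "fiction", "label": "entailment"},
    {"genre": "fiction", "label": "neutral"},
    {"genre": "letters", "label": "contradiction"},
]
preds = ["entailment", "contradiction", "contradiction"]
scores = accuracy_by_genre(gold, preds)
print(scores)
```

Comparing the scores for genres seen in training against those for unseen genres gives a rough picture of how much accuracy is lost out of domain.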

Acknowledgements

> If you use this dataset in your research, please credit the original authors.

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

Column name	Description
premise	The premise of the sentence pair. (String)
premise_binary_parse	The binary parse of the premise sentence. (String)
premise_parse	The parse of the premise sentence. (String)
hypothesis	The hypothesis of the sentence pair. (String)
hypothesis_binary_parse	The binary parse of the hypothesis sentence. (String)
hypothesis_parse	The parse of the hypothesis sentence. (String)
genre	The genre of the sentence pair. (String)
label	The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String)
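The binary-parse columns can double as a source of tokens, which saves a separate tokenization pass. The sketch below assumes the parse strings use whitespace-separated brackets, as in `( ( The cat ) ( sat . ) )`; that format and the helper name `tokens_from_binary_parse` are assumptions to verify against the actual file.

```python
def tokens_from_binary_parse(parse):
    """Return the leaf tokens of a bracketed binary parse string.

    Assumes every bracket is separated from its neighbours by whitespace.
    """
    return [token for token in parse.split() if token not in ("(", ")")]

# Hypothetical parse string, for illustration only.
example = "( ( The cat ) ( sat . ) )"
tokens = tokens_from_binary_parse(example)
print(tokens)
```

If brackets turn out to be attached to the words, a small regular expression that strips them would be needed instead.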

File: validation_matched.csv

Column name	Description
premise	The premise of the sentence pair. (String)
premise_binary_parse	The binary parse of the premise sentence. (String)
premise_parse	The parse of the premise sentence. (String)
hypothesis	The hypothesis of the sentence pair. (String)
hypothesis_binary_parse	The binary parse of the hypothesis sentence. (String)
hypothesis_parse	The parse of the hypothesis sentence. (String)
genre	The genre of the sentence pair. (String)
label	The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String)

File: validation_mismatched.csv

Column name	Description
premise	The premise of the sentence pair. (String)
premise_binary_parse	The binary parse of the premise sentence. (String)
premise_parse	The parse of the premise sentence. (String)
hypothesis	The hypothesis of the sentence pair. (String)
hypothesis_binary_parse	The binary parse of the hypothesis sentence. (String)
hypothesis_parse	The parse of the hypothesis sentence. (String)
genre	The genre of the sentence pair. (String)
label	The label indicating whether the premise entails, contradicts, or is neutral toward the hypothesis. (String)
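In MultiNLI, the matched validation set draws on the same genres as the training data, while the mismatched set draws on genres absent from training. A quick sanity check on the `genre` column can confirm this for the downloaded files. This is a minimal sketch: `unseen_genres` is a hypothetical helper, and the rows below are placeholders rather than real data.

```python
def unseen_genres(train_rows, eval_rows):
    """Return the sorted genres that appear in eval_rows but not in train_rows."""
    seen = {row["genre"] for row in train_rows}
    return sorted({row["genre"] for row in eval_rows} - seen)

# Hypothetical rows, for illustration only.
train_rows = [{"genre": "fiction"}, {"genre": "travel"}]
matched_rows = [{"genre": "travel"}, {"genre": "fiction"}]
mismatched_rows = [{"genre": "letters"}, {"genre": "fiction"}]

print(unseen_genres(train_rows, matched_rows))
print(unseen_genres(train_rows, mismatched_rows))
```

An empty result for the matched file and a non-empty one for the mismatched file is what the split names suggest; anything else is worth investigating before reporting cross-genre results.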

Acknowledgements

> If you use this dataset in your research, please credit the original authors and Huggingface Hub.
