老下头

Pairwise sentence complexity comparison

nlp, text

12

Sold: 0
1.07GB

Data ID: D17173882290584944

Published: 2024/06/03

The following data validation report was provided at the seller's discretion:

Data description

Dataset creation

The dataset was created by this notebook: https://www.kaggle.com/douglaskgaraujo/sentence-complexity-comparison-dataset

Context

This data is a pairwise comparison of sentences, together with information about their relative complexity. The original dataset comes from the CommonLit Readability Prize competition; interested readers are referred there (especially to the competition's discussion forums) for more information on the data itself.

Important notice! As per that competition's rules, the license is as follows:

  1. COMPETITION DATA. "Competition Data" means the data or datasets available from the Competition Website for the purpose of use in the Competition, including any prototype or executable code provided on the Competition Website. The Competition Data will contain private and public test sets. Which data belongs to which set will not be made available to participants.

A. Data Access and Use. Competition Use and Non-Commercial & Academic Research:
* You may access and use the Competition Data for non-commercial purposes only, including for participating in the Competition and on Kaggle.com forums, and for academic research and education.
* The Competition Sponsor reserves the right to disqualify any participant who uses the Competition Data other than as permitted by the Competition Website and these Rules.

B. Data Security. You agree to use reasonable and suitable measures to prevent persons who have not formally agreed to these Rules from gaining access to the Competition Data. You agree not to transmit, duplicate, publish, redistribute or otherwise provide or make available the Competition Data to any party not participating in the Competition. You agree to notify Kaggle immediately upon learning of any possible unauthorized transmission of or unauthorized access to the Competition Data and agree to work with Kaggle to rectify any unauthorized transmission or access.

C. External Data. You may use data other than the Competition Data (“External Data”) to develop and test your Submissions. However, you will ensure the External Data is publicly available and equally accessible to use by all participants of the Competition for purposes of the competition at no cost to the other participants. The ability to use External Data under this Section 7.C (External Data) does not limit your other obligations under these Competition Rules, including but not limited to Section 11 (Winners Obligations).

Content

This dataset is a pairwise comparison of each sentence in the CommonLit competition with 500 other randomly matched sentences. Sentences are divided into training and validation datasets before being matched randomly. The relative complexity of each sentence is measured, and the dataset includes features such as the distance between the two sentences' scores and a column indicating whether or not the first sentence's readability score is greater than or equal to the score of the second sentence.
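The construction described above (split first, then match each sentence with random partners from the same split, then derive pair features) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code: the function names, the `Pair` fields, the 80/20 split, the use of the absolute score gap as the "distance", and the toy scores are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One row of a hypothetical pairwise table (field names are assumptions)."""
    id_1: str
    id_2: str
    distance: float       # assumed: absolute gap between the two readability scores
    first_ge_second: int  # 1 if score of id_1 >= score of id_2, else 0


def split_ids(scores, valid_frac=0.2, seed=0):
    """Shuffle sentence ids and split them into train/validation lists."""
    ids = sorted(scores)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_valid = int(len(ids) * valid_frac)
    return ids[n_valid:], ids[:n_valid]


def make_pairs(ids, scores, k=500, seed=0):
    """Match every id with up to k distinct other ids drawn from the same split."""
    rng = random.Random(seed)
    rows = []
    for a in ids:
        others = [b for b in ids if b != a]
        for b in rng.sample(others, min(k, len(others))):
            rows.append(Pair(
                id_1=a,
                id_2=b,
                distance=abs(scores[a] - scores[b]),
                first_ge_second=int(scores[a] >= scores[b]),
            ))
    return rows


# Toy usage with made-up scores (not values from the dataset).
scores = {"s1": -0.34, "s2": -1.20, "s3": 0.51, "s4": -2.05, "s5": 0.10}
train_ids, valid_ids = split_ids(scores, valid_frac=0.4)
train_pairs = make_pairs(train_ids, scores, k=2)
```

Splitting before matching keeps every pair inside a single split, so no validation sentence ever appears alongside a training sentence; with the real data one would call `make_pairs` with `k=500` on each split separately.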

Acknowledgements

Thanks to the organisers of this competition for providing this dataset.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
