Down Shift

NLP SubjQA: Question Answering Dataset

education, psychology, nlp

29

Sold: 0
11MB

Data ID: D17171515138139655

Published: 2024/05/31

The following is the data verification report that the seller chose to provide:

Data description

Context

Subjectivity is the expression of internal opinions or beliefs that cannot be objectively observed or verified, and it has been shown to be important for sentiment analysis and word sense disambiguation. Furthermore, subjectivity is an important aspect of user-generated data. In spite of this, subjectivity has not been investigated in contexts where such data is widespread, such as question answering (QA). This new dataset allows us to investigate this relationship. Subjectivity is an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance than found in previous work on sentiment analysis. For instance, a subjective question may or may not be associated with a subjective answer.


Content

SubjQA is a question answering dataset that focuses on subjective (as opposed to factual) questions and answers. The dataset consists of roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants. Each question is paired with a review, and a span is highlighted as the answer to the question (some questions have no answer). Moreover, both questions and answer spans are assigned a subjectivity label by annotators. For example, "How much does this product weigh?" is a factual question (i.e., low subjectivity), while "Is this easy to use?" is a subjective question (i.e., high subjectivity).

In short, SubjQA provides a setting to study how well extractive QA systems perform at finding answers that are less factual, and to what extent modeling subjectivity can improve the performance of QA systems.

All files are in standard CSV format and contain the following columns:

- domain: The category/domain of the review (e.g., hotels, books, ...).
- question: The question (written based on a query opinion).
- review: The review (that mentions the neighboring opinion).
- human_ans_spans: The span labeled by annotators as the answer.
- human_ans_indices: The (character-level) start and end indices of the answer span highlighted by annotators.
- question_subj_level: The subjectivity level of the question (on a 1 to 5 scale with 1 being the most subjective).
- ques_subj_score: The subjectivity score of the question computed using the TextBlob package.
- is_ques_subjective: A boolean subjectivity label derived from question_subj_level (i.e., scores below 4 are considered subjective).
- answer_subj_level: The subjectivity level of the answer span (on a 1 to 5 scale with 1 being the most subjective).
- ans_subj_score: The subjectivity score of the answer span computed using the TextBlob package.
- is_ans_subjective: A boolean subjectivity label derived from answer_subj_level (i.e., scores below 4 are considered subjective).
- nn_mod: The modifier of the neighboring opinion (which appears in the review).
- nn_asp: The aspect of the neighboring opinion (which appears in the review).
- query_mod: The modifier of the query opinion (around which a question is manually written).
- query_asp: The aspect of the query opinion (around which a question is manually written).
- item_id: The id of the item/business discussed in the review.
- review_id: A unique id associated with the review.
- q_review_id: A unique id assigned to the question-review pair.
- q_reviews_id: A unique id assigned to all question-review pairs with a shared question.
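As a rough illustration of how these columns fit together, the sketch below reads a tiny in-memory CSV and recovers the answer span and the boolean subjectivity labels. The two sample rows are invented for illustration and are not taken from the dataset. The sketch also assumes that human_ans_indices is stored as a string such as "(16, 27)" with an exclusive end index, which should be checked against the actual files.

```python
import ast
import csv
import io

# Hypothetical two-row sample using a subset of the documented columns.
# The "(start, end)" string format for human_ans_indices is an assumption.
SAMPLE_CSV = '''domain,question,review,human_ans_spans,human_ans_indices,question_subj_level,answer_subj_level
electronics,Is this easy to use?,The keyboard is easy to use and light.,easy to use,"(16, 27)",1,1
electronics,How much does this product weigh?,It weighs two pounds.,two pounds,"(10, 20)",5,5
'''


def load_rows(text):
    """Parse CSV text and derive the answer span and subjectivity flags."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Character-level indices into the review (end assumed exclusive).
        start, end = ast.literal_eval(row["human_ans_indices"])
        rows.append({
            "question": row["question"],
            "answer": row["review"][start:end],
            "labeled_span": row["human_ans_spans"],
            # Levels below 4 count as subjective, as the column list states.
            "is_ques_subjective": int(row["question_subj_level"]) < 4,
            "is_ans_subjective": int(row["answer_subj_level"]) < 4,
        })
    return rows


rows = load_rows(SAMPLE_CSV)
for r in rows:
    print(r["question"], "->", r["answer"], r["is_ques_subjective"])
```

Comparing the sliced `answer` with `labeled_span` is a quick sanity check that the index convention assumed here matches the files.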

Acknowledgements

SubjQA: A Dataset for Subjectivity and Review Comprehension. Johannes Bjerva, Nikita Bhutani, Behzad Golshan, Wang-Chiew Tan, Isabelle Augenstein.

Inspiration

  1. Extract all opinions expressed in the reviews, using a pipeline in which each opinion is modeled as a (modifier, aspect) pair, i.e. a pair of spans where the former describes the latter, e.g. (good, hotel) and (terrible, acting).

  2. Use matrix factorization techniques to mine implication relationships between the expressed opinions. For instance, the system mines that "responsive keys" implies "good keyboard".

  3. Present the question and review pairs to annotators, who select the correct answer span and rate the subjectivity level of both the question and the highlighted answer span.
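To make the first two steps more concrete, here is a toy sketch that represents opinions as (modifier, aspect) pairs and scores how often one opinion co-occurs with another. It deliberately swaps in a simple conditional co-occurrence ratio as a stand-in and does not implement the matrix factorization described above. The reviews and the function name `implication_scores` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical opinions, already extracted per review as (modifier, aspect) pairs.
reviews = [
    {("responsive", "keys"), ("good", "keyboard")},
    {("responsive", "keys"), ("good", "keyboard"), ("bright", "screen")},
    {("good", "keyboard")},
    {("bright", "screen")},
]


def implication_scores(reviews):
    """Estimate P(b appears | a appears) for every co-occurring opinion pair.

    This is a plain co-occurrence ratio, not matrix factorization.
    """
    single = Counter()  # number of reviews mentioning each opinion
    joint = Counter()   # number of reviews mentioning both a and b
    for opinions in reviews:
        single.update(opinions)
        joint.update(permutations(opinions, 2))
    return {(a, b): joint[(a, b)] / single[a] for (a, b) in joint}


scores = implication_scores(reviews)
keys, keyboard = ("responsive", "keys"), ("good", "keyboard")
print(scores[(keys, keyboard)])  # how strongly "responsive keys" suggests "good keyboard"
print(scores[(keyboard, keys)])  # the reverse direction is weaker in this toy data
```

The score is asymmetric, which matches the directional nature of an implication: in this toy data every review with "responsive keys" also mentions "good keyboard", but not the other way round.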
