
Data Description
About Dataset
The Wiki Neutrality Corpus (WNC) consists of over 180,000 aligned sentence pairs, each giving a sentence before and after neutralization by English Wikipedia editors. The pairs come from revisions made between 2004 and 2019 in which editors gave an NPOV-related (neutral point of view) justification.
The dataset was introduced as part of the research paper: Automatically Neutralizing Subjective Bias in Text.
Loading data using pandas: `pd.read_csv('biased.full', sep='\t', names=["id", "src_tok", "tgt_tok", "src_raw", "tgt_raw", "src_POS_tags", "tgt_parse_tags"])`
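The pandas call above can be wrapped in a small helper. The following is a minimal sketch, not the authors' loader: `load_wnc` and `WNC_COLUMNS` are hypothetical names, and the extra `quoting`, `dtype`, and `keep_default_na` arguments are assumptions meant to keep quote characters and tokens such as "NA" in the sentences from being reinterpreted. To stay self-contained, it reads one in-memory row built from the example values in the column table instead of the real `biased.full` file.

```python
import csv
import io

import pandas as pd

# Column names as listed in the dataset description.
WNC_COLUMNS = [
    "id", "src_tok", "tgt_tok", "src_raw", "tgt_raw",
    "src_POS_tags", "tgt_parse_tags",
]


def load_wnc(path_or_buffer):
    """Read a WNC-style TSV into a DataFrame (hypothetical helper)."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        names=WNC_COLUMNS,
        quoting=csv.QUOTE_NONE,  # assumption: quote characters are literal text
        dtype=str,               # keep ids and tag strings as plain strings
        keep_default_na=False,   # do not convert tokens like "NA" to NaN
    )


# One sample row assembled from the example values in the column table.
sample_row = "\t".join([
    "532355971",
    "she did not do as promised exposing her as an un ##pr ##in ##ci ##pled politician .",
    "she did not do , leading to accusations of her being an un ##pr ##in ##ci ##pled politician",
    "she did not do as promised exposing her as an unprincipled politician.",
    "she did not do , leading to accusations of her being an unprincipled politician.",
    "PRON VERB ADV VERB ADP VERB VERB PRON ADP DET ADJ ADJ ADJ ADJ ADJ NOUN PUNCT",
    "nsubj aux neg ROOT mark advcl xcomp dobj prep det amod amod amod amod amod pobj punct",
]) + "\n"

df = load_wnc(io.StringIO(sample_row))
print(df.shape)
print(df.loc[0, "src_raw"])
```

With the real data, the same helper would be called as `load_wnc('biased.full')`.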
All data files are TSVs with the following columns:
Column | Description | Example |
---|---|---|
id | A unique identifier that can be used to link to a Wikipedia diff view. | 532355971 (links to https://en.wikipedia.org/w/index.php?diff=532355971) |
src_tok | Tokenized source text | she did not do as promised exposing her as an un ##pr ##in ##ci ##pled politician . |
tgt_tok | Tokenized target text | she did not do , leading to accusations of her being an un ##pr ##in ##ci ##pled politician |
src_raw | Raw source text | she did not do as promised exposing her as an unprincipled politician. |
tgt_raw | Raw target text | she did not do , leading to accusations of her being an unprincipled politician. |
src_POS_tags | Part-of-speech tags for source text | PRON VERB ADV VERB ADP VERB VERB PRON ADP DET ADJ ADJ ADJ ADJ ADJ NOUN PUNCT |
tgt_parse_tags | Syntactic parse tags for target text using the Stanford Parser | nsubj aux neg ROOT mark advcl xcomp dobj prep det amod amod amod amod amod pobj punct |
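The columns above can be combined in a few simple ways. The sketch below uses only the Python standard library, and its function names (`diff_url`, `merge_wordpieces`, `changed_spans`) are hypothetical. It assumes that `##` marks a continuation piece of the preceding token (as the `un ##pr ##in ##ci ##pled` example suggests) and that each token in `src_tok` has exactly one tag in `src_POS_tags`, which holds for the example row but is not guaranteed by this description.

```python
import difflib

# URL pattern taken from the `id` row of the column table.
DIFF_URL = "https://en.wikipedia.org/w/index.php?diff={rev_id}"


def diff_url(rev_id):
    """Build the Wikipedia diff URL for a row's id."""
    return DIFF_URL.format(rev_id=rev_id)


def merge_wordpieces(tok_text):
    """Join '##' continuation pieces back onto the preceding token."""
    words = []
    for piece in tok_text.split():
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
        else:
            words.append(piece)
    return words


def changed_spans(src_tok, tgt_tok):
    """Return (operation, source tokens, target tokens) for each edited span."""
    src, tgt = src_tok.split(), tgt_tok.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (op, src[i1:i2], tgt[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Example values copied from the column table.
src_tok = ("she did not do as promised exposing her as an "
           "un ##pr ##in ##ci ##pled politician .")
tgt_tok = ("she did not do , leading to accusations of her being an "
           "un ##pr ##in ##ci ##pled politician")
pos_tags = ("PRON VERB ADV VERB ADP VERB VERB PRON ADP DET "
            "ADJ ADJ ADJ ADJ ADJ NOUN PUNCT")

url = diff_url("532355971")
words = merge_wordpieces(src_tok)
# Pair every token with its part-of-speech tag (one tag per token assumed).
token_tags = list(zip(src_tok.split(), pos_tags.split()))
spans = changed_spans(src_tok, tgt_tok)

print(url)
print(words)
print(token_tags[:3])
for op, before, after in spans:
    print(op, before, "->", after)
```

Diffing the tokenized columns rather than the raw text keeps the comparison at the token level, so the edited spans line up with the positions in `src_POS_tags`.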
BibTeX Citation:
@misc{pryzant2019automatically,
title={Automatically Neutralizing Subjective Bias in Text},
author={Reid Pryzant and Richard Diehl Martinez and Nathan Dass and Sadao Kurohashi and Dan Jurafsky and Diyi Yang},
year={2019},
eprint={1911.09709},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
