Wiki Neutrality Corpus (WNC)

Eric🐟


Tabular, NLP, Text

￥5

Sold: 0

106.7MB

Dataset ID: D17169025274954562

Published: 2024/05/28

Data Description

About Dataset

The Wiki Neutrality Corpus (WNC) consists of over 180,000 aligned sentence pairs, each giving a sentence before and after neutralization by English Wikipedia editors. The pairs come from revisions made between 2004 and 2019 in which editors gave an NPOV-related (neutral point of view) justification.

The dataset was introduced in the research paper "Automatically Neutralizing Subjective Bias in Text".

Loading data using pandas:
import pandas as pd
df = pd.read_csv('biased.full', sep='\t', names=["id", "src_tok", "tgt_tok", "src_raw", "tgt_raw", "src_POS_tags", "tgt_parse_tags"])
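
The call above can be wrapped in a small helper. The following is a minimal sketch, not the dataset authors' loader: the function name `load_wnc`, the `quoting=csv.QUOTE_NONE` setting, and reading `id` as a string are assumptions made here. Disabling quote handling is a cautious choice in case sentences contain unbalanced double-quote characters. The one-row sample is hypothetical and only stands in for a line of `biased.full`.

```python
import csv
import io

import pandas as pd

COLUMNS = ["id", "src_tok", "tgt_tok", "src_raw", "tgt_raw",
           "src_POS_tags", "tgt_parse_tags"]


def load_wnc(path_or_buffer):
    """Read a headerless WNC-style TSV into a DataFrame (hypothetical helper)."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        names=COLUMNS,
        quoting=csv.QUOTE_NONE,  # assumption: treat '"' as ordinary text
        dtype={"id": str},       # assumption: keep revision ids as strings
    )


# Hypothetical sample row; a real call would be load_wnc("biased.full").
sample = "\t".join(
    ["0", 'a " b', "a b", 'a "b', "a b", "X X X", "y y y"]
) + "\n"
df = load_wnc(io.StringIO(sample))
print(df.shape)
print(df.loc[0, "src_tok"])
```

Passing a file path instead of the in-memory buffer should behave the same way, provided the file has no header row.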

All data files are TSVs with the following columns:

Columns	Description	Example
`id`	A unique identifier which can be used to link to a Wikipedia Diff view.	532355971 (links to https://en.wikipedia.org/w/index.php?diff=532355971)
`src_tok`	Tokenized source text	she did not do as promised exposing her as an un ##pr ##in ##ci ##pled politician .
`tgt_tok`	Tokenized target text	she did not do , leading to accusations of her being an un ##pr ##in ##ci ##pled politician
`src_raw`	Raw source text	she did not do as promised exposing her as an unprincipled politician.
`tgt_raw`	Raw target text	she did not do , leading to accusations of her being an unprincipled politician.
`src_POS_tags`	Part-of-speech tags for source text	PRON VERB ADV VERB ADP VERB VERB PRON ADP DET ADJ ADJ ADJ ADJ ADJ NOUN PUNCT
`tgt_parse_tags`	Syntactic parse tags for target text using the Stanford Parser	nsubj aux neg ROOT mark advcl xcomp dobj prep det amod amod amod amod amod pobj punct
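
As an illustration of how these columns fit together, here is a short sketch using the example row from the table. The helper names (`diff_url`, `merge_wordpieces`, `pair_tokens_with_tags`) are hypothetical and not part of the dataset. It also assumes that `##` marks a word-piece continuation and that, as in the example row, `src_POS_tags` carries one tag per token of `src_tok`.

```python
# Values copied from the example row in the column table.
ROW_ID = "532355971"
SRC_TOK = ("she did not do as promised exposing her as an "
           "un ##pr ##in ##ci ##pled politician .")
SRC_POS = ("PRON VERB ADV VERB ADP VERB VERB PRON ADP DET "
           "ADJ ADJ ADJ ADJ ADJ NOUN PUNCT")


def diff_url(revision_id):
    """Build the Wikipedia diff URL for a row id."""
    return f"https://en.wikipedia.org/w/index.php?diff={revision_id}"


def merge_wordpieces(tokenized):
    """Join '##'-prefixed pieces back onto the preceding token."""
    words = []
    for tok in tokenized.split():
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return words


def pair_tokens_with_tags(tokenized, tags):
    """Pair each token with its tag; fail loudly if the lengths differ."""
    toks, tag_list = tokenized.split(), tags.split()
    if len(toks) != len(tag_list):
        raise ValueError("token and tag counts differ")
    return list(zip(toks, tag_list))


print(diff_url(ROW_ID))
print(" ".join(merge_wordpieces(SRC_TOK)))
print(pair_tokens_with_tags(SRC_TOK, SRC_POS)[:3])
```

Raising on a length mismatch rather than silently truncating makes it easier to spot rows where the one-tag-per-token assumption does not hold.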

BibTeX Citation:
@misc{pryzant2019automatically,
title={Automatically Neutralizing Subjective Bias in Text},
author={Reid Pryzant and Richard Diehl Martinez and Nathan Dass and Sadao Kurohashi and Dan Jurafsky and Diyi Yang},
year={2019},
eprint={1911.09709},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

