Data Description
ProsocialDialog - Problematic Content Dialogue Dataset
Teach conversational agents to respond to problematic topics
By Huggingface Hub [source]
About this dataset
> ProsocialDialog is the first large-scale multi-turn English dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
How to use the dataset
> This guide explains how to use the data in this dataset to teach conversational agents normative responses to problematic content. A minimal loading sketch follows this list.
>
> - Understand the columns: Familiarize yourself with the columns provided so you know what information is available for your analysis. The dataset includes 'context', 'response', 'rots', 'safety_label', 'safety_annotations', 'safety_annotation_reasons', 'source', and 'etc'. Together these capture the conversation context, the response, the rules of thumb (RoTs), the safety label, the annotations and their rationales, and the source of each conversation.
> - Explore Safety Labels: Reviewing each value in the 'safety_label' column shows what kind of conversation is deemed appropriate or inappropriate. It also helps to examine the corresponding 'safety_annotations' and their free-form rationales, which explain why a given rating was assigned.
> - Learn from Rules of Thumb (RoTs): Examining the listed RoTs alongside the actual dialogues judged acceptable or unacceptable clarifies what a normative response to problematic content should look like in your own conversation settings.
> - Analyze Sources: The 'source' column records where each dialogue came from, for example first-party interviews versus third-party websites. Provenance helps explain why a given example was labeled a certain way, and factors such as trustworthiness should be considered when using the data to train models.
> - Take Action: After familiarizing yourself with these components, map out scenarios between two people in conversation and, for each applicable RoT, write directions that demonstrate socially acceptable behavior when confronted with non-normative behavior.
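The snippet below is a minimal sketch of that workflow using pandas. It assumes the CSV splits described under "Columns" (validation.csv, train.csv, test.csv) are available in the working directory; adjust the paths to match your setup.

```python
# Minimal sketch: load the validation split and inspect the safety labels.
# Assumes the CSV files described in this card are in the working directory.
import pandas as pd

df = pd.read_csv("validation.csv")

# Columns described in this card:
# ['context', 'response', 'rots', 'safety_label', 'safety_annotations',
#  'safety_annotation_reasons', 'source', 'etc', 'episode_done']
print(df.columns.tolist())

# Distribution of safety labels across the split
print(df["safety_label"].value_counts())

# Look at one example: the context, the prosocial response, the
# rule(s) of thumb, and the rationale behind its safety rating
row = df.iloc[0]
print(row["context"])
print(row["response"])
print(row["rots"])
print(row["safety_annotation_reasons"])
```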
Research Ideas
> - Designing Conversational Agents: This dataset can be used to build natural language processing (NLP) models that recognize and classify problematic content. The safety labels, rationales, and RoTs can be leveraged to teach conversational agents to respond to such content in a socially acceptable manner (a baseline classifier sketch follows this list).
> - Benchmark Systems: ProsocialDialog can serve as a benchmark for assessing how well existing conversational systems recognize, respond to, and help prevent problematic content interactions.
> - Automated Moderation: The dialogue safety labels and associated free-form rationales can be leveraged by technology platforms for automated moderation tasks, such as flagging or banning offensive messages or the users involved, when needed.
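As an illustration of the first idea (not the original authors' method), the sketch below trains a simple bag-of-words baseline that maps a dialogue context to its safety label. It assumes train.csv and validation.csv with the columns listed in this card and uses scikit-learn as an example toolkit.

```python
# Illustrative baseline: TF-IDF features + logistic regression over contexts.
# This is a sketch for exploring the safety labels, not a reference model.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv").dropna(subset=["context", "safety_label"])
valid = pd.read_csv("validation.csv").dropna(subset=["context", "safety_label"])

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
clf.fit(train["context"], train["safety_label"])

# Evaluate on the validation split
print(classification_report(valid["safety_label"], clf.predict(valid["context"])))
```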
Acknowledgements
> If you use this dataset in your research, please credit the original authors.
>
> Data Source
License
> License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
>
> No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: validation.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
File: train.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
File: test.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
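All three splits share the same schema, so a quick check like the sketch below can confirm the columns before training. It assumes the file names above and that rows within each file are ordered by dialogue turn, so completed dialogues can be roughly counted via the boolean `episode_done` flag; verify this ordering against the original release before relying on it.

```python
# Quick schema check across the three splits described above, plus a rough
# dialogue count based on the `episode_done` flag.
import pandas as pd

expected = [
    "context", "response", "rots", "safety_label", "safety_annotations",
    "safety_annotation_reasons", "source", "etc", "episode_done",
]

for split in ["train.csv", "validation.csv", "test.csv"]:
    df = pd.read_csv(split)
    missing = [c for c in expected if c not in df.columns]
    n_dialogues = int(df["episode_done"].astype(bool).sum())
    print(f"{split}: {len(df)} rows, ~{n_dialogues} completed dialogues, "
          f"missing columns: {missing or 'none'}")
```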
Acknowledgements
> If you use this dataset in your research, please credit the original authors and Huggingface Hub.
