Data Description
ProsocialDialog - Problematic Content Dialogue Dataset
Teach conversational agents to respond to problematic topics
By Huggingface Hub [source]
About this dataset
> ProsocialDialog is the first large-scale multi-turn English dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
How to use the dataset
> This guide explains how to use the data in this dataset to teach conversational agents normative responses to problematic content. A minimal loading sketch follows this list.
>
> - Understand the columns: Familiarize yourself with the columns provided so you know what information is available for your analysis. The dataset includes 'context', 'response', 'rots', 'safety_label', 'safety_annotations', 'safety_annotation_reasons', 'source', and 'etc'. Together these capture the conversation context, the response, the rules of thumb (RoTs), the safety label, the annotations and their rationales, and the source of each conversation.
> - Explore Safety Labels: Reviewing each value in the 'safety_label' column shows what kind of conversation is deemed appropriate or inappropriate. It also helps to examine the corresponding 'safety_annotations' and their free-form rationales, which explain why a given rating was assigned.
> - Learn from Rules of Thumb (RoTs): Examining the listed RoTs alongside the actual dialogues judged acceptable or unacceptable clarifies what a normative response to problematic content should look like in your own conversation settings.
> - Analyze Sources: The 'source' column records where each dialogue came from, for example first-party interviews versus third-party websites. Provenance helps explain why a given example was labeled a certain way, and factors such as trustworthiness should be considered when using the data to train models.
> - Take Action: After familiarizing yourself with these components, map out scenarios between two people in conversation and, for each applicable RoT, write directions that demonstrate socially acceptable behavior when confronted with non-normative behavior.
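The snippet below is a minimal sketch of that workflow using pandas. It assumes the CSV splits described under "Columns" (validation.csv, train.csv, test.csv) are available in the working directory; adjust the paths to match your setup.

```python
# Minimal sketch: load the validation split and inspect the safety labels.
# Assumes the CSV files described in this card are in the working directory.
import pandas as pd

df = pd.read_csv("validation.csv")

# Columns described in this card:
# ['context', 'response', 'rots', 'safety_label', 'safety_annotations',
#  'safety_annotation_reasons', 'source', 'etc', 'episode_done']
print(df.columns.tolist())

# Distribution of safety labels across the split
print(df["safety_label"].value_counts())

# Look at one example: the context, the prosocial response, the
# rule(s) of thumb, and the rationale behind its safety rating
row = df.iloc[0]
print(row["context"])
print(row["response"])
print(row["rots"])
print(row["safety_annotation_reasons"])
```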
Research Ideas
> - Designing Conversational Agents: This dataset can be used to build natural language processing (NLP) models that recognize and classify problematic content. The safety labels, rationales, and RoTs can be leveraged to teach conversational agents to respond to such content in a socially acceptable manner (a baseline classifier sketch follows this list).
> - Benchmark Systems: ProsocialDialog can serve as a benchmark for assessing how well existing conversational systems recognize, respond to, and help prevent problematic content interactions.
> - Automated Moderation: The dialogue safety labels and associated free-form rationales can be leveraged by technology platforms for automated moderation tasks, such as flagging or banning offensive messages or the users involved, when needed.
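As an illustration of the first idea (not the original authors' method), the sketch below trains a simple bag-of-words baseline that maps a dialogue context to its safety label. It assumes train.csv and validation.csv with the columns listed in this card and uses scikit-learn as an example toolkit.

```python
# Illustrative baseline: TF-IDF features + logistic regression over contexts.
# This is a sketch for exploring the safety labels, not a reference model.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv").dropna(subset=["context", "safety_label"])
valid = pd.read_csv("validation.csv").dropna(subset=["context", "safety_label"])

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
clf.fit(train["context"], train["safety_label"])

# Evaluate on the validation split
print(classification_report(valid["safety_label"], clf.predict(valid["context"])))
```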
Acknowledgements
> If you use this dataset in your research, please credit the original authors.
>
> Data Source
License
> License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
>
> No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: validation.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
File: train.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
File: test.csv
Column name | Description |
---|---|
context | The context of the conversation. (String) |
response | The response to the conversation. (String) |
rots | Rules of thumb associated with the conversation. (String) |
safety_label | The safety label associated with the conversation. (String) |
safety_annotations | Annotations associated with the conversation. (String) |
safety_annotation_reasons | Reasons for the safety annotations. (String) |
source | The source of the conversation. (String) |
etc | Any additional information associated with the conversation. (String) |
episode_done | Whether the conversation is complete or not. (Boolean) |
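All three splits share the same schema, so a quick check like the sketch below can confirm the columns before training. It assumes the file names above and that rows within each file are ordered by dialogue turn, so completed dialogues can be roughly counted via the boolean `episode_done` flag; verify this ordering against the original release before relying on it.

```python
# Quick schema check across the three splits described above, plus a rough
# dialogue count based on the `episode_done` flag.
import pandas as pd

expected = [
    "context", "response", "rots", "safety_label", "safety_annotations",
    "safety_annotation_reasons", "source", "etc", "episode_done",
]

for split in ["train.csv", "validation.csv", "test.csv"]:
    df = pd.read_csv(split)
    missing = [c for c in expected if c not in df.columns]
    n_dialogues = int(df["episode_done"].astype(bool).sum())
    print(f"{split}: {len(df)} rows, ~{n_dialogues} completed dialogues, "
          f"missing columns: {missing or 'none'}")
```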
Acknowledgements
> If you use this dataset in your research, please credit the original authors and Huggingface Hub.
