数据描述
Medical Conversation Corpus (100k+)
Generative Language Modeling for Medical Applications
By Huggingface Hub [source]
About this dataset
> This comprehensive and open-source dataset of 100k+ conversations and instructions that include medical terminologies is perfect for training Generative Language Models for various medical applications. With samples collected from human conversations, this dataset contains a variety of options and suggestions to assist in creating useful language models. From prescribed medications to home remedies such as yoga exercises, breathing exercises, and natural remedies—this collection has it all! Only if you trust the language model you build with the right data can you use it to make decisions that matter in real life. This data is sure to give your project the boost it needs with legitimate information power-packed into every sample!
More Datasets
> For more datasets, click here.
Featured Notebooks
> - 🚨 Your notebook can be here! 🚨!
How to use the dataset
> > - Download the dataset. The dataset can be downloaded by clicking on the “Download” button located at the top of this page and following the prompts. > - Unzip and save the file in a location of your choice on your computer or device. > - Open up the ‘train’ or ‘test’ CSV file, depending on whether you would like to use it for training or testing purposes respectively. Both contain conversations and instructions utilizing medical terminologies which can be used to train a generative language model for medical applications. > - Read through each conversation/instruction that is provided in each row outlined in data frame column labeled 'Conversation'. These conversations provide examples of transaction between doctors, patients, pharmacists etc., discussing topics such as health advice, natural home remedies and prescriptions etc., as well as conversation involving diagnosis, symptoms, medication side effects and health concerns pertaining to certain medical conditions etc.. > - Note that all conversations are written according to varying levels of complexity with an emphasis on effectiveness when communicating within a healthcare environment eiher directly with patients or amongst colleagues discussing about cases via Verbal/written exchanges utilizing Medical terminologies). > > 6 Utilize natural language processing (NLP) techniques such as BERT Embeddings Or word embeddings corresponding to different domains Of medicine that might help relate And sort these conversations With regard To specific categories Of interest identified By domain experts For further Research purposes eiher Mathematically & statistically Or for wider Understanding contexts In diverse languages Such As Chinese , Spanish , Portuguese & French Etc
Research Ideas
> - Natural language processing applications such as automated medical transcription. > - Feature extraction and detection of health-related keywords for predictive analytics in healthcare applications. > - Automated diagnostics utilizing the language models trained on this dataset to identify diseases and illnesses based on user inputs, either through symptoms or other risk factors (e.g., age, lifestyle etc.)
Acknowledgements
> If you use this dataset in your research, please credit the original authors. > Data Source > >
License
> > > License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication > No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: train.csv
Column name | Description |
---|---|
Conversation | The conversation between two or more people or an instruction utilizing medical terminologies. (String) |
File: test.csv
Column name | Description |
---|---|
Conversation | The conversation between two or more people or an instruction utilizing medical terminologies. (String) |
Acknowledgements
> If you use this dataset in your research, please credit the original authors. > If you use this dataset in your research, please credit Huggingface Hub.
验证报告
以下为卖家选择提供的数据验证报告:
