The following is the data validation report that the seller chose to provide:
Data Description
Mental Health Support Feature Analysis
Correlating Text Features and Mental Health Indicators
By [source]
About this dataset
> This dataset is a valuable source of information for exploring the psychological and linguistic features of mental health support discussions on Reddit in 2019. It consists of the text of posts extracted from a variety of subreddits, together with more than 256 features that may provide insight into the psychological and linguistic characteristics of these conversations.
>
> The indicators include readability measures such as the Automated Readability Index, Coleman-Liau Index, Flesch Reading Ease, Gunning Fog Index, LIX and Wiener Sachtextformel, as well as TF-IDF scores for key topics such as abuse, alcohol use, anxiety, depression symptoms and family matters. Count-based metrics are also provided: words and syllables per sentence, total characters per post, number of sentences per post, and the numbers of long, monosyllabic and polysyllabic words in each post.
>
> Sentiment analysis is another measurement available in this dataset. Negative, neutral and positive sentiment scores can be compared across posts discussing topics such as economic stress or isolation, alongside scores for specific issues such as substance use or gun control debates. The dataset also offers metrics on punctuation use and on personal pronouns in the first person (I), second person (you) and third person (him/her/they).
>
> Scores are also included for achievement-related language and adverb use, among others, supporting detailed discourse analyses of affective processes, of anxiety in discussions of religious topics, and even of sadness expressed in exchanges between people seeking relationship advice.
>
> Beyond providing a wealth of measures derived from many kinds of online mental health conversations, this dataset could be important for further research aimed at profiling populations emotionally affected by their digital footprints.
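A minimal sketch of how three of the readability scores relate to the per-post count columns (n_words, n_sents, n_syllables, n_long_words, n_polysyllable_words, n_monosyllable_words). It uses the standard published formulas; whether the dataset computed them exactly this way (for example, the first variant of the Wiener Sachtextformel, or "long word" meaning seven or more characters) is an assumption, and the function names are hypothetical.

```python
# Sketch: recompute readability scores from the count columns.
# Assumption: standard formulas; the dataset's exact variants are not stated.

def flesch_reading_ease(n_words: int, n_sents: int, n_syllables: int) -> float:
    """Flesch Reading Ease: higher values mean easier text."""
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syllables / n_words)


def lix(n_words: int, n_sents: int, n_long_words: int) -> float:
    """LIX: mean sentence length plus the percentage of long words."""
    return n_words / n_sents + 100 * n_long_words / n_words


def wiener_sachtextformel(n_words: int, n_sents: int, n_polysyllable_words: int,
                          n_long_words: int, n_monosyllable_words: int) -> float:
    """First Wiener Sachtextformel (assumed variant)."""
    ms = 100 * n_polysyllable_words / n_words  # % of words with 3+ syllables
    sl = n_words / n_sents                     # mean sentence length in words
    iw = 100 * n_long_words / n_words          # % of long words (assumed 7+ chars)
    es = 100 * n_monosyllable_words / n_words  # % of one-syllable words
    return 0.1935 * ms + 0.1672 * sl + 0.1297 * iw - 0.0327 * es - 0.875


# Worked example with made-up counts (not taken from the dataset).
print(round(flesch_reading_ease(100, 5, 150), 3))           # 59.635
print(round(lix(100, 5, 25), 3))                            # 45.0
print(round(wiener_sachtextformel(100, 5, 10, 25, 60), 4))  # 5.6845
```

Comparing these recomputed values against the stored flesch_reading_ease, lix and wiener_sachtextformel columns is one way to check which formula variants were actually used.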
How to use the dataset
> With this dataset you can analyze psychological and linguistic features of online mental health support discussions in order to identify high-risk behaviors. The data consists of post text plus more than 256 indicators of psychological and linguistic features.
>
> To get started, set up a local environment with the packages needed to work with the data. You can find this information on the Kaggle page for this dataset. Once that is done, you can start exploring the data.
>
> The first step is to look at each column header and understand what each feature measures. The features include readability scores such as the Automated Readability Index (ARI), Coleman-Liau Index (CLI), Flesch Reading Ease (flesch_reading_ease), Gunning Fog Index (GFI), LIX (lix) and Wiener Sachtextformel (wiener_sachtextformel); sentiment scores such as negative sentiment (sent_neg) and compound sentiment (sent_compound); and TF-IDF scores for words related to topics such as abuse, alcohol, anxiety, depression, family, fear, medication, problems, stress and suicide, which can be used according to the purpose of your project.
>
> Features collected from mental health support discussions on Reddit between 2019 and 2020, on topics such as abuse, substance use, economic issues and social isolation, can help identify risky behaviors among people who discuss their problems online, and give a deeper understanding of at-risk states by studying patterns and trends beyond the text itself. Intelligence agencies and similar organizations could benefit from this when monitoring suspicious situations. On one hand, it offers a toolkit for identifying high-risk behaviors; on the other, it gives criminal justice authorities opportunities to detect online discussions of illegal activities, such as drug dealing or weapons exchange, by analyzing this information or the post histories of users who discuss relevant topics.
>
> Furthermore, combining this toolkit with other types of analysis, such as speech pattern analysis, cognitive behavior analysis and text emotion or sentiment analysis, may help flag dangerous circumstances early, even before the person concerned notices the behavioral changes. All these aspects are part of daily activity, and consumers could benefit from access to such information, since just by reading posts they could quickly recognize or identify…
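A first look at the column headers and a numeric column can be sketched with Python's standard csv module. The helper names load_rows and describe are hypothetical, and the in-memory sample stands in for a real file; its values are made up.

```python
import csv
import io
import statistics


def load_rows(fileobj):
    """Read one feature CSV into a list of dicts, one dict per post."""
    return list(csv.DictReader(fileobj))


def describe(rows, column):
    """Count, mean and median of a numeric column, skipping empty or missing cells."""
    values = [float(r[column]) for r in rows if (r.get(column) or "").strip()]
    if not values:
        return {"count": 0, "mean": None, "median": None}
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Tiny in-memory stand-in for a real feature file; the values are made up.
sample = io.StringIO(
    "subreddit,author,date,flesch_reading_ease,sent_neg,sent_compound\n"
    "ptsd,user_a,2018-03-01,62.1,0.20,-0.45\n"
    "ptsd,user_b,2018-03-02,71.4,0.05,0.30\n"
    "ptsd,user_c,2018-04-10,55.0,,0.10\n"
)
rows = load_rows(sample)
print(list(rows[0].keys()))        # the column headers
print(describe(rows, "sent_neg"))  # the blank sent_neg cell is skipped
```

With the real data, pass `open("ptsd_2018_features_tfidf_256.csv", newline="", encoding="utf-8")` instead of `sample`; the UTF-8 encoding is an assumption.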
Research Ideas
> - Detecting high-risk behaviors in online mental health discussions: the TF-IDF scores can be explored to identify potentially concerning language that may signal higher-risk behavior.
> - Machine learning models for sentiment and emotion: the data can be used to train models that detect sentiment, emotion and other psychological features in written text.
> - Understanding language use across subreddits: by plotting the psychological features over time, researchers can analyze word-choice trends across mental health support subreddits and see how users' language has changed compared with previous years or other localities.
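The first and third ideas can be sketched with the standard library: averaging a feature per subreddit and month, and ranking posts by a TF-IDF score. This assumes the date column starts with an ISO-style "YYYY-MM" prefix, and "tfidf_suicide" is a hypothetical column name, since the TF-IDF column names are not listed in this report; the rows are made up.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(rows, feature):
    """Average a numeric feature per (subreddit, month).

    Assumes `date` begins with an ISO-style 'YYYY-MM' prefix; adjust the
    slice if the files use another date format.
    """
    buckets = defaultdict(list)
    for r in rows:
        raw = (r.get(feature) or "").strip()
        if not raw:
            continue  # skip posts with no value for this feature
        buckets[(r["subreddit"], r["date"][:7])].append(float(raw))
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


def top_by_feature(rows, feature, k=5):
    """Return the k rows with the highest value of `feature` (e.g. a TF-IDF column)."""
    scored = [r for r in rows if (r.get(feature) or "").strip()]
    return sorted(scored, key=lambda r: float(r[feature]), reverse=True)[:k]


# Made-up rows shaped like csv.DictReader output; "tfidf_suicide" is hypothetical.
rows = [
    {"subreddit": "ptsd", "date": "2018-03-01", "sent_compound": "-0.5", "tfidf_suicide": "0.00"},
    {"subreddit": "ptsd", "date": "2018-03-20", "sent_compound": "0.1", "tfidf_suicide": "0.12"},
    {"subreddit": "ptsd", "date": "2018-04-02", "sent_compound": "0.3", "tfidf_suicide": "0.31"},
    {"subreddit": "autism", "date": "2018-03-05", "sent_compound": "", "tfidf_suicide": "0.05"},
]

for (sub, month), avg in monthly_means(rows, "sent_compound").items():
    print(sub, month, round(avg, 3))

for r in top_by_feature(rows, "tfidf_suicide", k=2):
    print(r["subreddit"], r["date"], r["tfidf_suicide"])
```

Ranking by a single TF-IDF score is only a crude screening step; a high score marks a post for closer reading, not a confirmed risk.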
Acknowledgements
> If you use this dataset in your research, please credit the original authors.
> Data Source
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: ptsd_2018_features_tfidf_256.csv
Column name | Description |
---|---|
subreddit | The subreddit the post was made in. (Categorical) |
author | The author of the post. (Categorical) |
date | The date the post was made. (Date) |
flesch_reading_ease | The Flesch Reading Ease score of the post. (Numerical) |
lix | The LIX score of the post. (Numerical) |
wiener_sachtextformel | The Wiener Sachtextformel score of the post. (Numerical) |
n_chars | The number of characters in the post. (Numerical) |
n_long_words | The number of long words in the post. (Numerical) |
n_monosyllable_words | The number of monosyllable words in the post. (Numerical) |
n_polysyllable_words | The number of polysyllable words in the post. (Numerical) |
n_sents | The number of sentences in the post. (Numerical) |
n_syllables | The number of syllables in the post. (Numerical) |
n_unique_words | The number of unique words in the post. (Numerical) |
n_words | The number of words in the post. (Numerical) |
sent_neg | The sentiment score of negative words in the post. (Numerical) |
sent_neu | The sentiment score of neutral words in the post. (Numerical) |
sent_pos | The sentiment score of positive words in the post. (Numerical) |
sent_compound | The overall (compound) sentiment score of the post. (Numerical) |
economic_stress_total | The economic stress score of the post. (Numerical) |
isolation_total | The isolation score of the post. (Numerical) |
substance_use_total | The substance use score of the post. (Numerical) |
guns_total | The gun-related score of the post. (Numerical) |
domestic_stress_total | The domestic stress score of the post. (Numerical) |
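The sent_neg, sent_neu, sent_pos and sent_compound columns resemble the output of the VADER sentiment analyzer, but the report does not say so; that is an assumption. Under that assumption, a sketch for bucketing posts by compound score (using the ±0.05 cut-off recommended for VADER) and for sanity-checking that the three proportions sum to about 1 might look like this. The function names and tuples are hypothetical.

```python
from collections import Counter


def sentiment_label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label.

    The +/-0.05 cut-off follows the convention recommended for VADER; that
    these columns were produced by VADER is an assumption.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def proportions_look_normalized(neg, neu, pos, tol=0.01):
    """Sanity check: VADER-style neg/neu/pos proportions should sum to about 1."""
    return abs(neg + neu + pos - 1.0) <= tol


# Made-up (sent_neg, sent_neu, sent_pos, sent_compound) tuples.
posts = [(0.30, 0.65, 0.05, -0.62), (0.00, 0.80, 0.20, 0.44), (0.05, 0.90, 0.05, 0.0)]
print(Counter(sentiment_label(c) for _, _, _, c in posts))
print(all(proportions_look_normalized(n, u, p) for n, u, p, _ in posts))
```

If many real rows fail the normalization check, the columns were probably produced by a different tool, and the thresholds should be revisited.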
File: autism_post_features_tfidf_256.csv
Column name | Description |
---|---|
subreddit | The subreddit the post was made in. (Categorical) |
author | The author of the post. (Categorical) |
date | The date the post was made. (Date) |
flesch_reading_ease | The Flesch Reading Ease score of the post. (Numerical) |
lix | The LIX score of the post. (Numerical) |
wiener_sachtextformel | The Wiener Sachtextformel score of the post. (Numerical) |
n_chars | The number of characters in the post. (Numerical) |
n_long_words | The number of long words in the post. (Numerical) |
n_monosyllable_words | The number of monosyllable words in the post. (Numerical) |
n_polysyllable_words | The number of polysyllable words in the post. (Numerical) |
n_sents | The number of sentences in the post. (Numerical) |
n_syllables | The number of syllables in the post. (Numerical) |
n_unique_words | The number of unique words in the post. (Numerical) |
n_words | The number of words in the post. (Numerical) |
sent_neg | The sentiment score of negative words in the post. (Numerical) |
sent_neu | The sentiment score of neutral words in the post. (Numerical) |
sent_pos | The sentiment score of positive words in the post. (Numerical) |
sent_compound | The overall (compound) sentiment score of the post. (Numerical) |
economic_stress_total | The economic stress score of the post. (Numerical) |
isolation_total | The isolation score of the post. (Numerical) |
substance_use_total | The substance use score of the post. (Numerical) |
guns_total | The gun-related score of the post. (Numerical) |
domestic_stress_total | The domestic stress score of the post. (Numerical) |
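Both files are documented with the same columns, so for cross-community comparisons they can be stacked into one table with a label recording the source file. A minimal sketch with the standard csv module; the combine helper and the source_file field are hypothetical, and the in-memory rows are made up.

```python
import csv
import io


def combine(named_files):
    """Stack rows from feature CSVs that share the same columns.

    `named_files` maps a label (here, the file name) to an open text file.
    Adds a `source_file` field to every row and raises ValueError if a
    file's columns differ from the first file's.
    """
    combined, expected = [], None
    for label, fileobj in named_files.items():
        reader = csv.DictReader(fileobj)
        columns = set(reader.fieldnames or [])
        if expected is None:
            expected = columns
        elif columns != expected:
            raise ValueError(f"{label}: columns differ from the first file")
        for row in reader:
            row["source_file"] = label  # remember which file the post came from
            combined.append(row)
    return combined


# In-memory stand-ins for the two files; the rows are made up.
ptsd = io.StringIO("subreddit,date,sent_compound\nptsd,2018-03-01,-0.2\n")
autism = io.StringIO("subreddit,date,sent_compound\nautism,2018-03-02,0.4\n")
rows = combine({
    "ptsd_2018_features_tfidf_256.csv": ptsd,
    "autism_post_features_tfidf_256.csv": autism,
})
for r in rows:
    print(r["source_file"], r["subreddit"], r["sent_compound"])
```

Checking the headers as sets rather than lists tolerates a different column order, which csv.DictReader handles anyway because it keys each row by column name.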
