
Pre-processed Spanish-lang suicide tendency texts

mental health, data cleaning, nlp, feature engineering, text

4

Sold: 0
33.08 MB

Data ID: D17220519028594615

Published: 2024/07/27

The following is the data verification report that the seller chose to provide:

Data description

The dataset is used to analyze suicidal tendencies in texts. The original dataset is in English and consists of messages extracted from social networks such as Twitter and Reddit. The dataset was cleaned by removing special characters, double spaces, and stopwords, and was normalized with lemmatization.
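The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the seller's actual pipeline: the function name `clean_text`, the tiny `STOPWORDS` set, and the toy `LEMMAS` lookup are all assumptions made for the example. A real pipeline would more likely take its stopword list and lemmatizer from an NLP library such as spaCy or NLTK.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
# A real pipeline would load a full Spanish stopword list and lemmatizer.
STOPWORDS = {"de", "la", "el", "que", "y", "en", "me", "no", "a", "los"}
LEMMAS = {"siento": "sentir", "días": "día"}


def clean_text(text: str) -> str:
    """Lowercase, strip special characters, collapse spaces,
    drop stopwords, and map tokens to lemmas via a toy lookup."""
    text = text.lower()
    # Replace everything except Spanish letters and whitespace with a space.
    text = re.sub(r"[^a-záéíóúüñ\s]", " ", text)
    # Collapse runs of whitespace (including double spaces) into one space.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove stopwords.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    # Normalize each remaining token to its lemma when one is known.
    tokens = [LEMMAS.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


print(clean_text("Me siento   muy cansado, todos los días!!"))
```

The order matters in this sketch: special characters are removed before whitespace is collapsed, so the spaces left behind by stripped punctuation do not survive into the token list.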

Content

The dataset is a collection of posts from the "SuicideWatch" and "depression" subreddits of the Reddit platform. The posts were collected using the Pushshift API. All posts made to "SuicideWatch" from Dec 16, 2008 (its creation) to Jan 2, 2021 were collected, while "depression" posts were collected from Jan 1, 2009 to Jan 2, 2021. All posts collected from SuicideWatch are labeled as suicide, while posts collected from the depression subreddit are labeled as depression. Non-suicide posts were collected from r/teenagers.
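The labeling rule in this description, where the label follows from the source subreddit, could be expressed along these lines. This is only a sketch: the names `LABEL_BY_SUBREDDIT` and `label_post` are hypothetical, and the exact label strings in the distributed files may differ.

```python
# Hypothetical mapping from source subreddit to class label,
# following the description above; actual label strings may differ.
LABEL_BY_SUBREDDIT = {
    "SuicideWatch": "suicide",
    "depression": "depression",
    "teenagers": "non-suicide",
}


def label_post(subreddit: str) -> str:
    """Return the class label implied by the subreddit a post came from."""
    # Accept both "teenagers" and "r/teenagers" style names.
    name = subreddit[2:] if subreddit.startswith("r/") else subreddit
    if name not in LABEL_BY_SUBREDDIT:
        raise ValueError(f"no label defined for subreddit: {subreddit!r}")
    return LABEL_BY_SUBREDDIT[name]


print(label_post("r/teenagers"))
```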

Original version of the dataset: https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch
