The following is the data validation report the seller has chosen to provide:
Data description
The dataset is used to analyze suicidal tendencies in text. The original dataset is in English and consists of messages extracted from social networks such as Twitter and Reddit. It was cleaned by removing special characters, double spaces, and stopwords, and then normalized with lemmatization.
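The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used to build the dataset: the stopword list, the toy lemma lookup, the lowercasing step, and the function name `clean_text` are all assumptions made for the example. A real pipeline would typically rely on an NLP library for stopwords and lemmatization.

```python
import re

# Assumption: a tiny illustrative stopword list; the dataset's actual list is not stated.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "i", "am"}

# Assumption: a toy lemma lookup standing in for a real lemmatizer.
LEMMAS = {"feeling": "feel", "thoughts": "thought", "days": "day"}


def clean_text(text: str) -> str:
    """Apply the four cleaning steps to one post (hypothetical helper)."""
    # 1. Replace every character that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # 2. Lowercase (an assumption) and split on whitespace, which also
    #    collapses double spacing once the tokens are re-joined.
    tokens = text.lower().split()
    # 3. Drop stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 4. Map each token to its lemma where the toy lookup knows one.
    tokens = [LEMMAS.get(t, t) for t in tokens]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("I am  feeling the worst, these days!!"))
```

Splitting on whitespace and re-joining with a single space handles the double-spacing step without a separate regular expression.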
Content
The dataset is a collection of posts from the "SuicideWatch" and "depression" subreddits on Reddit. The posts were collected using the Pushshift API. All posts made to "SuicideWatch" from Dec 16, 2008 (its creation) to Jan 2, 2021 were collected, while "depression" posts were collected from Jan 1, 2009 to Jan 2, 2021. All posts collected from SuicideWatch are labeled as suicide, while posts collected from the depression subreddit are labeled as depression. Non-suicide posts were collected from r/teenagers.
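The labeling rule described above, where each post takes its label from the subreddit it was collected from, might look like the sketch below. The mapping name, the function `label_post`, and the exact label string `"non-suicide"` are assumptions for illustration; the original dataset may encode its labels differently.

```python
# Assumption: label strings mirror the wording of the description above.
LABEL_BY_SUBREDDIT = {
    "SuicideWatch": "suicide",
    "depression": "depression",
    "teenagers": "non-suicide",
}


def label_post(subreddit: str) -> str:
    """Return the label for a post based on its source subreddit (hypothetical helper)."""
    # Accept names written with or without the "r/" prefix.
    if subreddit.startswith("r/"):
        subreddit = subreddit[2:]
    # An unknown subreddit raises KeyError instead of being labeled silently.
    return LABEL_BY_SUBREDDIT[subreddit]


if __name__ == "__main__":
    for name in ("SuicideWatch", "depression", "r/teenagers"):
        print(name, "->", label_post(name))
```

Because the label depends only on the source subreddit, it describes where a post was published rather than a per-post judgment of its content.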
Original version of the dataset: https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch
