张若凡

StackOverflow questions filtered 2009 - 2020

Tags: business, computer science, programming, data cleaning, nlp, text


Sold: 0
Size: 69.41 MB

Data ID: D17222529965911640

Published: 2024/07/29

The seller chose to provide the following data verification report:

Data Description

Context

Dataset from the well-known Stack Overflow site, exported courtesy of Stack Exchange. The data are intended for text processing, specifically for building a program that automatically generates tags for the questions asked.

Content

This set of 13 CSV files includes the following variables:

  • Id: Unique identifier of the post
  • CreationDate: Creation date of the post
  • Title: Post title
  • Body: Complete question in HTML format
  • Tags: The tags used by users for the question
  • ViewCount: Number of views
  • CommentCount: Number of comments
  • AnswerCount: Number of answers
  • Score: Score of the post (upvotes minus downvotes)
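The Tags column packs all of a question's tags into a single string. Below is a minimal Python sketch for loading one of the CSV files and splitting that column. It assumes the CSV header matches the variable names listed above and that tags use the Stack Exchange `<tag1><tag2>` encoding (an assumption; check a few rows before relying on it). `parse_tags` and `load_questions` are hypothetical helper names, and the sample row is made up for illustration.

```python
import csv
import io
import re

# Assumption: Tags are encoded as "<tag1><tag2>...", e.g. "<python><pandas>".
TAG_PATTERN = re.compile(r"<([^<>]+)>")

# Columns assumed to hold integers in the CSV files.
INT_COLUMNS = ("Id", "ViewCount", "CommentCount", "AnswerCount", "Score")


def parse_tags(raw):
    """Split a raw Tags value such as '<python><pandas>' into ['python', 'pandas']."""
    return TAG_PATTERN.findall(raw or "")


def load_questions(fileobj):
    """Read one CSV file, converting count columns to int and Tags to a list."""
    rows = []
    for row in csv.DictReader(fileobj):
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        row["Tags"] = parse_tags(row["Tags"])
        rows.append(row)
    return rows


# Made-up sample row; the quoted Body field contains a comma, which csv handles.
sample = io.StringIO(
    "Id,CreationDate,Title,Body,Tags,ViewCount,CommentCount,AnswerCount,Score\n"
    '1,2011-05-01 10:00:00,Example title,"<p>Hello, world</p>",<python><csv>,100,6,2,10\n'
)
rows = load_questions(sample)
print(rows[0]["Tags"])  # ['python', 'csv']
```

When reading the real files, open them with `newline=""` (for example `open(path, newline="", encoding="utf-8")`), since the HTML in `Body` may contain embedded line breaks; the UTF-8 encoding is itself an assumption.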

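The data was extracted using the following T-SQL query, shown here with @start_date set to 2011-01-01; @end_date is 12 months later, so the query selects a one-year window: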

    DECLARE @start_date DATE
    DECLARE @end_date DATE
    SET @start_date = '2011-01-01'
    SET @end_date = DATEADD(m, 12, @start_date)

    SELECT p.Id, p.CreationDate, p.Title, p.Body, p.Tags,
           p.ViewCount, p.CommentCount, p.AnswerCount, p.Score
    FROM Posts AS p
    LEFT JOIN PostTypes AS t ON p.PostTypeId = t.id
    WHERE p.CreationDate BETWEEN @start_date AND @end_date
      AND t.Name = 'Question'
      AND p.ViewCount > 20
      AND p.CommentCount > 5
      AND p.AnswerCount > 1
      AND p.Score > 5
      AND LEN(p.Tags) > 0
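The WHERE clause of the extraction query can be mirrored as a row-level check, for example to confirm that a loaded file respects the thresholds. This is a Python sketch, assuming rows with integer count columns, a raw (unsplit) `Tags` string, and `CreationDate` values that `datetime.fromisoformat` can parse. `add_months` and `matches_extraction_filter` are hypothetical names. The `t.Name = 'Question'` condition cannot be checked from the CSV columns, so it is left out.

```python
import calendar
from datetime import date, datetime, time


def add_months(d, months):
    """Approximate T-SQL DATEADD(m, months, d): shift the month, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def matches_extraction_filter(row, start, end):
    """Return True if a row satisfies the query's WHERE thresholds (Question type excluded)."""
    created = datetime.fromisoformat(row["CreationDate"])
    # Assumption: the DATE bounds compare as midnight datetimes, and BETWEEN is inclusive.
    start_dt = datetime.combine(start, time.min)
    end_dt = datetime.combine(end, time.min)
    return (
        start_dt <= created <= end_dt
        and row["ViewCount"] > 20
        and row["CommentCount"] > 5
        and row["AnswerCount"] > 1
        and row["Score"] > 5
        # T-SQL LEN ignores trailing spaces.
        and len(row["Tags"].rstrip(" ")) > 0
    )


start = date(2011, 1, 1)
end = add_months(start, 12)
row = {
    "CreationDate": "2011-05-01 10:00:00",
    "ViewCount": 100,
    "CommentCount": 6,
    "AnswerCount": 2,
    "Score": 10,
    "Tags": "<python>",
}
print(end)  # 2012-01-01
print(matches_extraction_filter(row, start, end))  # True
```

Under this reading of the query, a post created exactly at 2012-01-01 00:00:00 falls inside the 2011 window, while later posts on that day do not. All thresholds are strict, so a question with exactly 5 comments is excluded.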

Inspiration

Data cleaning of textual data, automatic tag generation, NLP, ...
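As a starting point for the text-cleaning step, here is a minimal Python sketch that turns the HTML in `Body` into plain text using the standard library's `html.parser` and then splits it into lowercase tokens. Dropping `<code>` contents by default and keeping `#` and `+` inside tokens (so `c#` and `c++` survive) are illustrative choices, not part of the dataset. `body_to_text` and `tokenize` are hypothetical names.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text from an HTML fragment, optionally skipping <code> contents."""

    def __init__(self, skip_code):
        super().__init__(convert_charrefs=True)  # decode entities such as &lt;
        self.skip_code = skip_code
        self._code_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._code_depth += 1
        self.parts.append(" ")  # every tag is treated as a word boundary

    def handle_endtag(self, tag):
        if tag == "code" and self._code_depth > 0:
            self._code_depth -= 1
        self.parts.append(" ")

    def handle_data(self, data):
        if not (self.skip_code and self._code_depth > 0):
            self.parts.append(data)


def body_to_text(html_body, skip_code=True):
    """Convert an HTML question body to whitespace-normalized plain text."""
    parser = _TextExtractor(skip_code)
    parser.feed(html_body)
    parser.close()  # flush any trailing text
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


TOKEN_PATTERN = re.compile(r"[a-z0-9#+]+")


def tokenize(text):
    """Lowercase and split into tokens, keeping '#' and '+' (e.g. 'c#', 'c++')."""
    return TOKEN_PATTERN.findall(text.lower())


html_body = "<p>How do I sort a <code>dict</code> in Python?</p><pre><code>d = {}</code></pre>"
print(body_to_text(html_body))  # How do I sort a in Python?
print(tokenize(body_to_text(html_body)))  # ['how', 'do', 'i', 'sort', 'a', 'in', 'python']
```

Whether to keep code snippets depends on the tagging model: code often carries strong tag signals (imports, keywords), so `skip_code=False` is worth comparing.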
