StackOverflow questions filtered 2009 - 2020

张若凡

Tags: business, computer science, programming, data cleaning, nlp, text

￥1

Sold: 0

69.41MB

Dataset ID: D17222529965911640

Published: 2024/07/29

Data description

Context

This dataset contains questions from Stack Overflow, exported through Stack Exchange. It is intended for text-processing work, specifically for building a program that automatically generates tags for the questions asked.

Content

Each of the 13 CSV files contains the following columns:

Id: Unique identifier of the post
CreationDate: Creation date of the post
Title: Post title
Body: Complete question in HTML format
Tags: The tags used by users for the question
ViewCount: Number of views
CommentCount: Number of comments
AnswerCount: Number of answers
Score: Net vote score of the post (upvotes minus downvotes)
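
As a minimal sketch of how these columns might be read, the Python below parses a CSV with the standard library, splits Tags into a list, and strips the HTML from Body. It assumes the Tags column uses the angle-bracket form `<python><pandas>` that Stack Exchange exports have commonly used, and that each file has a header row with the column names listed above. The helper names (`strip_html`, `parse_tags`, `read_questions`) and the sample row are illustrative and not part of the dataset.

```python
import csv
import io
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(body):
    """Return the plain text of an HTML question body."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def parse_tags(raw):
    """Split '<python><pandas>' into ['python', 'pandas'] (assumed format)."""
    return re.findall(r"<([^<>]+)>", raw)


def read_questions(handle):
    """Yield one dict per question with typed fields, tag list, and plain text."""
    for row in csv.DictReader(handle):
        yield {
            "Id": int(row["Id"]),
            "Title": row["Title"],
            "Text": strip_html(row["Body"]),
            "Tags": parse_tags(row["Tags"]),
            "Score": int(row["Score"]),
        }


# Illustrative sample row, not taken from the dataset.
sample = io.StringIO(
    "Id,CreationDate,Title,Body,Tags,ViewCount,CommentCount,AnswerCount,Score\n"
    '1,2011-03-01 10:00:00,How to sort a list?,"<p>I have a <code>list</code>.</p>",'
    "<python><sorting>,100,6,2,10\n"
)
questions = list(read_questions(sample))
```

For a real file, the `io.StringIO` sample would be replaced by `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption and may need adjusting.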

The data was extracted using the following SQL query:

DECLARE @start_date DATE
DECLARE @end_date DATE
SET @start_date = '2011-01-01'
SET @end_date = DATEADD(m, 12, @start_date)

SELECT p.Id, p.CreationDate, p.Title, p.Body, p.Tags, p.ViewCount, p.CommentCount, p.AnswerCount, p.Score
FROM Posts as p
LEFT JOIN PostTypes as t ON p.PostTypeId = t.id
WHERE p.CreationDate between @start_date and @end_date
AND t.Name = 'Question'
AND p.ViewCount > 20
AND p.CommentCount > 5
AND p.AnswerCount > 1
AND p.Score > 5
AND len(p.Tags) > 0
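
To sanity-check a downloaded file against the query's filters, the thresholds in the WHERE clause can be mirrored in Python. This is only a sketch. It assumes each row is a dict keyed by the column names above with values still as strings (as `csv.DictReader` yields them), and that CreationDate is stored in an ISO-like form such as `2011-03-01 10:00:00`. `year_window` and `passes_filters` are hypothetical helper names. Note that T-SQL `BETWEEN` is inclusive on both ends, so the window starting at 2011-01-01 also admits a post created at exactly 2012-01-01 00:00:00.

```python
from datetime import datetime


def year_window(start):
    """Return (start, end) with end 12 months later, like DATEADD(m, 12, start).

    Valid for any start date except February 29.
    """
    return start, start.replace(year=start.year + 1)


def passes_filters(row, start):
    """Mirror the WHERE clause of the query for one CSV row (string values).

    The PostTypes check is not mirrored: the export has no PostTypeId column.
    """
    lo, hi = year_window(start)
    created = datetime.fromisoformat(row["CreationDate"])  # assumed ISO-like format
    return (
        lo <= created <= hi  # BETWEEN is inclusive on both ends
        and int(row["ViewCount"]) > 20
        and int(row["CommentCount"]) > 5
        and int(row["AnswerCount"]) > 1
        and int(row["Score"]) > 5
        and len(row["Tags"]) > 0
    )
```

Rows for which `passes_filters(row, datetime(2011, 1, 1))` is false would point to a file that was produced with a different start date or different thresholds.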

Inspiration

Data cleaning on textual data, automatic tag generator, NLP ...

