丹丹¹💗(助力)

Twenty Newsgroups

earth and nature, linguistics, internet, software, nlp

Sold: 0
43.21MB

Data ID: D17220626761839387

Published: 2024/07/27

The following is the data verification report the seller chose to provide:

Data description

Context

This dataset contains the preprocessed data of the original 20 Newsgroups dataset. The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

Content

This is the 20 Newsgroups dataset, preprocessed for machine learning tasks. The original 20 Newsgroups dataset contains 18828 files, each belonging to one of 20 categories. The 20 categories (labels) of these files are as follows:

  • rec.sport.hockey 999
  • soc.religion.christian 997
  • rec.motorcycles 994
  • rec.sport.baseball 994
  • sci.crypt 991
  • sci.med 990
  • rec.autos 990
  • sci.space 987
  • comp.os.ms-windows.misc 985
  • comp.sys.ibm.pc.hardware 982
  • sci.electronics 981
  • comp.windows.x 980
  • comp.graphics 973
  • misc.forsale 972
  • comp.sys.mac.hardware 961
  • talk.politics.mideast 940
  • talk.politics.guns 910
  • alt.atheism 799
  • talk.politics.misc 775
  • talk.religion.misc 628

The number next to each category is the number of files (data points) with that label.
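
The per-category counts listed above can be checked against the stated total of 18828 files with a short script. This is only a sketch: the dictionary is copied by hand from the list, and the names `CATEGORY_COUNTS` and `summarize` are hypothetical helpers introduced for illustration, not part of the dataset.

```python
# Per-category file counts, copied by hand from the listing above.
CATEGORY_COUNTS = {
    "rec.sport.hockey": 999,
    "soc.religion.christian": 997,
    "rec.motorcycles": 994,
    "rec.sport.baseball": 994,
    "sci.crypt": 991,
    "sci.med": 990,
    "rec.autos": 990,
    "sci.space": 987,
    "comp.os.ms-windows.misc": 985,
    "comp.sys.ibm.pc.hardware": 982,
    "sci.electronics": 981,
    "comp.windows.x": 980,
    "comp.graphics": 973,
    "misc.forsale": 972,
    "comp.sys.mac.hardware": 961,
    "talk.politics.mideast": 940,
    "talk.politics.guns": 910,
    "alt.atheism": 799,
    "talk.politics.misc": 775,
    "talk.religion.misc": 628,
}


def summarize(counts):
    """Return basic class-balance statistics for a label -> count mapping."""
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return {
        "total": total,
        "num_categories": len(counts),
        "largest": largest,
        "smallest": smallest,
        # Ratio of the biggest class to the smallest class.
        "imbalance_ratio": counts[largest] / counts[smallest],
        # Accuracy of always predicting the biggest class.
        "majority_baseline": counts[largest] / total,
    }


summary = summarize(CATEGORY_COUNTS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The counts sum to 18828 across 20 categories, matching the description. The largest class is roughly 1.6 times the size of the smallest, and always predicting the largest class would score only about 5.3% accuracy, which is a useful floor when evaluating a classifier on this data.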

Acknowledgements

The source of original data files: http://qwone.com/~jason/20Newsgroups/
