Data Description
Context
This dataset contains a preprocessed version of the original 20 Newsgroups dataset. The 20 Newsgroups collection has become a popular dataset for experiments in text applications of machine learning techniques, such as text classification and text clustering.
Content
This is the 20 Newsgroups dataset, preprocessed for machine learning tasks. The original 20 Newsgroups dataset contains 18828 files, each belonging to one of 20 categories. The categories/labels of these files are as follows:
- rec.sport.hockey 999
- soc.religion.christian 997
- rec.motorcycles 994
- rec.sport.baseball 994
- sci.crypt 991
- sci.med 990
- rec.autos 990
- sci.space 987
- comp.os.ms-windows.misc 985
- comp.sys.ibm.pc.hardware 982
- sci.electronics 981
- comp.windows.x 980
- comp.graphics 973
- misc.forsale 972
- comp.sys.mac.hardware 961
- talk.politics.mideast 940
- talk.politics.guns 910
- alt.atheism 799
- talk.politics.misc 775
- talk.religion.misc 628

(The number next to each category is the number of files/datapoints belonging to that category/label.)
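
As a quick sanity check on the figures listed for each category, the counts can be tallied against the stated total of 18828 files. The following is a minimal sketch, not part of the dataset itself: the `counts` dictionary is typed in by hand from the list, and the variable names are illustrative only.

```python
# Per-category file counts, copied by hand from the category list.
counts = {
    "rec.sport.hockey": 999,
    "soc.religion.christian": 997,
    "rec.motorcycles": 994,
    "rec.sport.baseball": 994,
    "sci.crypt": 991,
    "sci.med": 990,
    "rec.autos": 990,
    "sci.space": 987,
    "comp.os.ms-windows.misc": 985,
    "comp.sys.ibm.pc.hardware": 982,
    "sci.electronics": 981,
    "comp.windows.x": 980,
    "comp.graphics": 973,
    "misc.forsale": 972,
    "comp.sys.mac.hardware": 961,
    "talk.politics.mideast": 940,
    "talk.politics.guns": 910,
    "alt.atheism": 799,
    "talk.politics.misc": 775,
    "talk.religion.misc": 628,
}

# Total number of files across all categories.
total = sum(counts.values())

# Largest and smallest categories, to gauge class imbalance.
largest = max(counts, key=counts.get)
smallest = min(counts, key=counts.get)
imbalance_ratio = counts[largest] / counts[smallest]

print(f"{len(counts)} categories, {total} files in total")
print(f"largest:  {largest} ({counts[largest]})")
print(f"smallest: {smallest} ({counts[smallest]})")
print(f"largest/smallest ratio: {imbalance_ratio:.2f}")
```

The per-category counts add up to the stated 18828 files. Most categories hold close to 1000 files, while the three smallest (alt.atheism, talk.politics.misc, talk.religion.misc) are noticeably under-represented, which may be worth accounting for when splitting or evaluating.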
Acknowledgements
Source of the original data files: http://qwone.com/~jason/20Newsgroups/

Twenty Newsgroups
43.21MB