The following is the data validation report that the seller has chosen to provide:
Data description
Context
AG is a collection of more than 1 million news articles. The articles were gathered from more than 2,000 news sources by ComeToMyHead over more than one year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. More information can be found at http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .
Content
These datasets consist of news article headlines. Each headline is labelled 0, 1, 2, or 3; these values correspond to four news topics: 'World', 'Sports', 'Business', and 'Sci/Tech', respectively.
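As a minimal sketch of how the integer labels can be turned back into topic names, the snippet below uses the mapping listed above. The names `LABEL_NAMES` and `label_to_topic`, and the two sample rows, are hypothetical and only for illustration; they are not part of the dataset or of any library.

```python
# Label ids and topic names as listed in the description above.
LABEL_NAMES = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}


def label_to_topic(label: int) -> str:
    """Return the topic name for an integer label; raise ValueError if unknown."""
    try:
        return LABEL_NAMES[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None


# Hypothetical rows for illustration only; not taken from the dataset.
rows = [
    ("Example headline about a match", 1),
    ("Example headline about markets", 2),
]

for headline, label in rows:
    print(f"{label_to_topic(label)}: {headline}")
```

Raising on an unknown label, rather than returning a default, makes a mislabelled or corrupted row visible early.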
Acknowledgements
I obtained the AG's news topic classification training dataset from the Hugging Face datasets library. The dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the AG's corpus of news articles. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
