The following is the data validation report the seller has chosen to provide:
Data description
Context This dataset contains around 38,000 rows of CNN news articles published from 2011 to 2022. The data were collected with a web crawler, which can be found on my GitHub account (https://github.com/hadasu/CNN_web_crawler). The crawler scans the CNN site and extracts various parameters from each article's HTML. You can edit the crawler to extract as much or as little data as you want.
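To illustrate the general idea of pulling parameters out of an article's HTML, here is a minimal sketch using only the Python standard library. It is not the crawler from the repository above. The class `ArticleMetaParser`, the helper `extract_fields`, the sample HTML, and the meta tag names (`author`, `description`, `article:section`) are all assumptions made for illustration, and the real CNN markup may differ.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collects <meta> name/property -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Meta tags are keyed by either a "name" or a "property" attribute.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_fields(html, wanted):
    """Return a dict with the requested meta fields (None if absent) plus the title."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    record = {field: parser.meta.get(field) for field in wanted}
    record["title"] = parser.title.strip()
    return record


# Hypothetical page used only to exercise the parser; not real CNN content.
SAMPLE_HTML = """
<html><head>
<title>Example headline</title>
<meta name="author" content="Jane Doe">
<meta name="description" content="A short summary.">
<meta property="article:section" content="world">
</head><body><p>Body text.</p></body></html>
"""

record = extract_fields(SAMPLE_HTML, ["author", "description", "article:section"])
print(record)
```

Changing the `wanted` list is one way to extract as much or as little as needed; fields missing from a page come back as `None` rather than raising an error.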
Content This dataset contains Author, Publication date, Category, Article Section, URL source, Headline, Description, Full text, and more. For raw data, you can refer to CNN Articles, Raw data.
Inspiration
- Categorization based on Category, Article Section, Headline, Description, and more
- Full-text classification
- Natural language processing (NLP)
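
As a possible first step toward the ideas above, the sketch below counts articles per category with the Python standard library. The column names (`Category`, `Section`, `Headline`), the helper `count_by`, and the inline sample rows are placeholders chosen for illustration; check the header of the actual file before adapting it.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for the real file; column names are assumed.
SAMPLE_CSV = """Category,Section,Headline
news,world,Placeholder headline one
sport,football,Placeholder headline two
news,us,Placeholder headline three
"""


def count_by(rows, column):
    """Count rows per distinct non-empty value of the given column."""
    return Counter(row[column] for row in rows if row.get(column))


# For the real dataset, replace io.StringIO(...) with
# open(path, newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
category_counts = count_by(rows, "Category")
print(category_counts.most_common())
```

A count like this shows how balanced the categories are, which is worth knowing before training any classifier on the full text.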

CNN News Articles from 2011 to 2022
87.42MB