困困

The White House Website

search engines, arts and entertainment, politics, real estate, data cleaning, text mining, text

2

Sold: 0
46.73MB

Dataset ID: D17222545506898406

Published: 2024/07/29

The following is the data verification report that the seller chose to provide:

Data description

Context

This dataset contains all pages of The White House website. Each page is one row, and the columns contain SEO elements: title tag, header tags, response headers, status code, meta description, etc.

In addition to the URLs, four special columns are extracted containing title, date, category, and text of briefings and presidential actions.

Content

Robots.txt file of The White House

Sitemap.xml: all sitemaps

Crawl files: each file contains the main SEO elements (title, headers, response, etc.)

Briefings: all briefings were extracted into their own file containing each briefing's date, category, title, and text

Acknowledgements

Python, Scrapy, advertools, pandas

Inspiration

The idea is to have a practice dataset for SEO crawls. How can you explore this dataset? What information can you extract about the content? Is the site in good shape for SEO? There is also a lot of textual information containing official statements and briefings from The White House.
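
As one possible starting point for the questions above, here is a minimal pandas sketch of a basic SEO health check. It assumes the crawl table has columns named `url`, `status`, `title`, and `meta_desc`; these names follow common advertools crawl output but are not confirmed by this listing, so adjust them to the actual files. The `seo_summary` helper and the sample rows are hypothetical and only illustrate the idea.

```python
import pandas as pd


def seo_summary(crawl: pd.DataFrame) -> dict:
    """Summarize basic SEO indicators for a crawl table with one row per page.

    Assumed (unconfirmed) column names: 'status', 'title', 'meta_desc'.
    """
    titles = crawl["title"]
    # A title counts as present only if it is non-null and not just whitespace.
    has_title = titles.fillna("").str.strip().ne("")
    return {
        "pages": len(crawl),
        # Number of pages per HTTP status code.
        "status_counts": crawl["status"].value_counts().to_dict(),
        "missing_title": int((~has_title).sum()),
        # Pages whose title is shared with at least one other page.
        "duplicate_titles": int(titles[has_title].duplicated(keep=False).sum()),
        "missing_meta_desc": int(crawl["meta_desc"].isna().sum()),
    }


# Made-up sample rows standing in for a real crawl file.
sample = pd.DataFrame(
    {
        "url": [
            "https://example.com/",
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/c",
        ],
        "status": [200, 200, 404, 200],
        "title": ["Home", "Briefing Room", None, "Briefing Room"],
        "meta_desc": ["Welcome", None, None, "Statements"],
    }
)

summary = seo_summary(sample)
print(summary)
```

With the real crawl files, the same function could be applied after loading them into a DataFrame. Pages with error status codes, missing titles, or duplicated titles are typical first signs of SEO problems worth investigating.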
