
The White House Website

Tags: search engines, arts and entertainment, politics, real estate, data cleaning, text mining, text

Price: ¥2

Size: 46.73 MB

Dataset ID: D17222545506898406

Published: 2024/07/29

Context

This dataset contains all pages of The White House website. Each page is one row, and the columns hold SEO elements: title tag, header tags, response headers, status code, meta description, etc.
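The sketch below shows one way to load crawl rows like these using only the Python standard library. It assumes the crawl file is in JSON-lines format (one JSON object per page), which is the format advertools' `crawl` function writes. It also assumes that repeated elements, such as several `<h2>` tags, are joined into a single string with `@@`, following the advertools convention. The field names `url`, `status` and `h2` and the two sample rows are illustrative assumptions and do not come from the dataset.

```python
import json
from collections import Counter

# Assumption: advertools-style crawl output, i.e. JSON lines with one object
# per page, where repeated elements (e.g. several <h2> tags) are joined by "@@".
SEPARATOR = "@@"


def load_crawl(lines):
    """Parse JSON-lines text into a list of page dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def split_multi(value):
    """Split an "@@"-joined field into a list; missing values become []."""
    if not value:
        return []
    return [part.strip() for part in str(value).split(SEPARATOR)]


def status_summary(pages):
    """Count pages per HTTP status code (the "status" field is assumed)."""
    return Counter(page.get("status") for page in pages)


if __name__ == "__main__":
    # Two hypothetical rows for illustration; real files hold one row per page.
    sample = [
        '{"url": "https://www.whitehouse.gov/", "status": 200, "h2": "News@@Briefings"}',
        '{"url": "https://www.whitehouse.gov/missing/", "status": 404, "h2": null}',
    ]
    pages = load_crawl(sample)
    print(status_summary(pages))
    print(split_multi(pages[0]["h2"]))
```

To read a real file, iterate over an open file object, for example `with open("crawl.jl", encoding="utf-8") as f: pages = load_crawl(f)`. The file name `crawl.jl` is hypothetical. With pandas, `pandas.read_json(path, lines=True)` reads the same format into a DataFrame.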

In addition to the URLs, four special columns were extracted, containing the title, date, category, and text of briefings and presidential actions.
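The four extracted columns support simple breakdowns by time and topic. The sketch below counts briefings per (category, year) pair and computes the average word count of one category's texts. It assumes each briefing is a dict with `date`, `category`, `title` and `text` keys. The date formats listed in `DATE_FORMATS` are guesses and should be adjusted to match the actual file.

```python
from collections import Counter
from datetime import datetime

# Assumptions: each briefing is a dict with "date", "category", "title" and
# "text" keys, and dates use one of these formats (adjust to the real file).
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y")


def parse_date(raw):
    """Try each known format; return None if the date cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def briefings_per_category_year(briefings):
    """Count briefings per (category, year); unparseable dates are skipped."""
    counts = Counter()
    for item in briefings:
        parsed = parse_date(item.get("date") or "")
        if parsed is not None:
            counts[(item.get("category"), parsed.year)] += 1
    return counts


def mean_word_count(briefings, category):
    """Average number of whitespace-separated words in one category's texts."""
    lengths = [len((b.get("text") or "").split())
               for b in briefings if b.get("category") == category]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Skipping unparseable dates keeps the counts honest. Compare `sum(counts.values())` with the number of rows to see how many dates failed to parse.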

Content

- Robots.txt: the robots.txt file of The White House website
- Sitemap.xml: all sitemaps
- Crawl files: each file contains the main SEO elements (title, headers, response, etc.)
- Briefings: all briefings were extracted into their own file, containing each briefing's date, category, title, and text
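The robots.txt and sitemap files can be checked against each other. A URL that appears in a sitemap but is disallowed by robots.txt sends search engines conflicting signals. The sketch below uses only the standard library (`urllib.robotparser` and `xml.etree.ElementTree`) and assumes both files have already been loaded as strings. The `/private/` rule in the demo is hypothetical.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Standard sitemap namespace (sitemaps.org protocol); <loc> appears in both
# <urlset> files and <sitemapindex> files, so one query covers both.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def blocked_sitemap_urls(robots_text, xml_text, user_agent="*"):
    """List sitemap URLs that robots.txt disallows for the given user agent."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    # Record a fetch time; can_fetch() refuses every URL until one is set.
    parser.modified()
    return [url for url in sitemap_urls(xml_text)
            if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical inputs; the real files ship with the dataset.
    robots = "User-agent: *\nDisallow: /private/\n"
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.whitehouse.gov/briefing-room/</loc></url>"
        "<url><loc>https://www.whitehouse.gov/private/page/</loc></url>"
        "</urlset>"
    )
    print(blocked_sitemap_urls(robots, sitemap))
```

This is a standard-library approximation. Search engines may interpret robots.txt rules (for example, wildcards) differently from `urllib.robotparser`.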

Acknowledgements

Python, Scrapy, advertools, pandas

Inspiration

The idea is to provide a practice dataset for SEO crawls. How can you explore this dataset? What information can you extract about the content? Is the site in good shape for SEO? The dataset also contains a large amount of text: official statements and briefings from The White House.
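A rule-based audit of the crawl rows is one way to start answering "Is the site in good shape for SEO?". The sketch below flags missing or duplicate titles, titles longer than 60 characters, missing meta descriptions, and pages whose status code is not 200. The field names (`url`, `title`, `meta_desc`, `status`) follow advertools' naming but are assumptions here. The 60-character limit is a common rule of thumb, not a search-engine rule.

```python
from collections import defaultdict

# Assumptions: each page is a dict with "url", "title", "meta_desc" and
# "status" keys; 60 characters is a rule-of-thumb title length, not a standard.
MAX_TITLE_LEN = 60


def seo_report(pages):
    """Group URLs by simple on-page SEO issues."""
    report = {"missing_title": [], "long_title": [], "missing_meta_desc": [],
              "non_200": [], "duplicate_titles": {}}
    urls_by_title = defaultdict(list)
    for page in pages:
        url = page.get("url")
        title = (page.get("title") or "").strip()
        if not title:
            report["missing_title"].append(url)
        else:
            urls_by_title[title].append(url)
            if len(title) > MAX_TITLE_LEN:
                report["long_title"].append(url)
        if not (page.get("meta_desc") or "").strip():
            report["missing_meta_desc"].append(url)
        if page.get("status") != 200:
            report["non_200"].append(url)
    # Keep only titles shared by more than one URL.
    report["duplicate_titles"] = {t: urls for t, urls in urls_by_title.items()
                                  if len(urls) > 1}
    return report
```

Each list in the report points to concrete URLs to inspect, which is usually more useful than a single site-wide score.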
