The following is the data validation report that the seller chose to provide:
Data description
Context
I was looking for an interesting dataset for a personal data science project, and I'm a fan of TED. I looked for a TED dataset and found Rounka's, but it is incomplete and outdated. So I scraped the site myself and made the scraper very fast using parallel programming. It now downloads the metadata and transcripts of all 4609 talks on the website* in 300 seconds. This is the most comprehensive TED Talk dataset, and it also includes media files (images, audio, and video)! *Scraped on 24-JUN-20. Using the code, one can scrape all of TED.com in 5 minutes to get the latest dataset.
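The scraper itself is not reproduced here, so the following is only a minimal sketch of how parallel downloading of this kind might look, assuming a thread pool from Python's standard library. The names `fetch_page` and `scrape_parallel`, the worker count, and the timeout are hypothetical choices for illustration, not the author's code.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_parallel(urls, fetch=fetch_page, max_workers=16):
    """Fetch every URL concurrently.

    Returns a dict mapping each URL to its page text, or to the exception
    raised while fetching it, so one failed page does not stop the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads up front; the pool runs them in parallel.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep the error for later inspection
                results[url] = exc
    return results
```

Threads suit this workload because each download spends most of its time waiting on the network. Passing the fetch function as a parameter keeps the sketch easy to try offline with a stand-in function before pointing it at real talk URLs.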
Content
Each row corresponds to a talk on TED.com, and each column holds either a metadata field (generic, speaker, or talk-related information) or the transcript.
Acknowledgements
I thank Google for Colab.
Inspiration
We've got the entire TED.com in an Excel sheet. Let's find some INSIGHTS WORTH SHARING!
