老下头

YouTube Transcripts [Hindi+English]

text classification, text-to-text generation, summarization, translation, hindi

30

Sold: 0
855.69MB

Data ID: D17175278527395086

Published: 2024/06/05

Data description

Context

This dataset contains Hindi and English subtitles from popular YouTube channels. It was created mainly with Hindi-language channels in mind, since the main goal is to use it to build LLMs for the Hindi language.

It includes data from channels in categories such as Information, Entertainment, Politics, Comedy, and News.

Dataset Stats:

  • 85 channels
  • 168,039 total videos

Content

  • Video subtitles in Hindi and English
  • Video metadata such as duration, number of comments, likes, counts, and published date
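
Because the subtitles come in both Hindi and English, a first processing step is often to tell the two apart. The following is a minimal sketch only, assuming each subtitle is available as a plain Python string; the function names and the 0.5 threshold are hypothetical and not part of the dataset. It labels text by the share of letters in the Unicode Devanagari block (U+0900 to U+097F).

```python
def devanagari_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in the Devanagari block."""
    # str.isalpha() skips digits, spaces, punctuation, and combining vowel signs,
    # so only base letters are counted on both sides of the ratio.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    deva = sum(1 for ch in letters if "\u0900" <= ch <= "\u097F")
    return deva / len(letters)


def guess_language(text: str, threshold: float = 0.5) -> str:
    """Label text 'hi' when at least `threshold` of its letters are Devanagari.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return "hi" if devanagari_ratio(text) >= threshold else "en"
```

Note that this heuristic looks only at script: Hindi written in Latin letters would be labelled "en", so a real pipeline would likely need a proper language-identification model.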

Acknowledgements

The source of this dataset is YouTube. The following packages were used to generate this dataset:

Inspiration

  • Build LLMs for the Hindi language
  • Fine-tune models in Hindi for tasks such as classification, summarization, and translation
