Data Description
Context
Seeing that the world is not what it used to be, that my last years' effort had completely vanished, and that dreaming had become impossible dropped me into an emotional hole. Going through this process forced me to look for new ways of knowing and finding myself in order to close this period of my life. This dataset comes from my need to know myself in the most analytical way possible. It is composed of a mix of different YouTube sources. Every single video I clicked on the platform I use the most is there. This dataset is a mirror of my soul: my attention, my interest, my time.
Now, if you like, you can know me, even more deeply than I could.
Content
The base data was gathered from the played videos list (Google Takeout) and parsed with a self-made Python script. The columns gathered there were:
- 'TITLE', 'VIDEO_URL', 'CHANNEL', 'CHANNEL_URL', 'SEEN_YEAR', 'SEEN_MONTH', 'SEEN_DAY', 'SEEN_TIME'
- The code I wrote: https://github.com/yamil-emanuel/youtube-takeout-2-csv
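To illustrate how one played-video record could map onto these columns, here is a minimal sketch. It is not the script linked above. It assumes the JSON flavour of the Takeout export, where each entry carries `title`, `titleUrl`, `subtitles` and an ISO 8601 `time` field, and that English exports prefix the title with "Watched "; the function name `entry_to_row` and the sample entry are hypothetical.

```python
from datetime import datetime

COLUMNS = ("TITLE", "VIDEO_URL", "CHANNEL", "CHANNEL_URL",
           "SEEN_YEAR", "SEEN_MONTH", "SEEN_DAY", "SEEN_TIME")


def entry_to_row(entry):
    """Flatten one watch-history entry (assumed JSON layout) into the columns above."""
    title = entry.get("title", "")
    # Assumption: English exports prefix each title with "Watched ".
    if title.startswith("Watched "):
        title = title[len("Watched "):]

    # Assumption: the first "subtitles" item describes the channel.
    subtitles = entry.get("subtitles") or [{}]
    channel = subtitles[0]

    # Replace a trailing "Z" so older Python versions can parse the timestamp.
    seen = datetime.fromisoformat(entry["time"].replace("Z", "+00:00"))

    return {
        "TITLE": title,
        "VIDEO_URL": entry.get("titleUrl", ""),
        "CHANNEL": channel.get("name", ""),
        "CHANNEL_URL": channel.get("url", ""),
        "SEEN_YEAR": seen.year,
        "SEEN_MONTH": seen.month,
        "SEEN_DAY": seen.day,
        "SEEN_TIME": seen.strftime("%H:%M:%S"),
    }


# Made-up entry, for illustration only.
sample_entry = {
    "title": "Watched Some video",
    "titleUrl": "https://www.youtube.com/watch?v=VIDEO_ID",
    "subtitles": [{"name": "Some channel",
                   "url": "https://www.youtube.com/channel/CHANNEL_ID"}],
    "time": "2021-03-14T18:05:09.123Z",
}
row = entry_to_row(sample_entry)
```

Each resulting dictionary can be written out with `csv.DictWriter` using `COLUMNS` as the field names.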
The second part of the dataset was gathered using the YouTube Data API v3. The following fields were retrieved for every clicked video in the previous list.
- 'VIDEO_CATEGORY', 'VIDEO_TAGS', 'DEFAULT_LANGUAGE', 'DEFAULT_AUDIO_LANG', 'VIDEO_PUBLISHED_DATE', 'VIDEO_PUBLISHED_TIME', 'VIDEO_DESCRIPTION', 'VIDEO_DURATION', 'VIDEO_DEFINITION', 'VIDEO_RATING'
- The code I wrote: https://github.com/yamil-emanuel/youtube-takeout-2-csv
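As a rough sketch of how one video resource returned by the API's `videos.list` endpoint (with the `snippet` and `contentDetails` parts) might be flattened into these columns, consider the following. It is not the author's code. The helper names and the sample item are hypothetical, the duration is converted to seconds purely for illustration (the dataset may store it differently), the category is left as the numeric `categoryId`, and 'VIDEO_RATING' is omitted because the source does not say which API field it comes from.

```python
import re

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT4M13S".
_DURATION = re.compile(r"^P(?:(\d+)D)?T?(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_seconds(value):
    """Convert an ISO 8601 duration such as 'PT1H2M3S' to whole seconds."""
    match = _DURATION.match(value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    days, hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def item_to_row(item):
    """Flatten one video resource (snippet + contentDetails) into flat columns."""
    snippet = item.get("snippet", {})
    details = item.get("contentDetails", {})
    # publishedAt looks like "2020-01-02T03:04:05Z"; split it into date and time.
    date_part, _, time_part = snippet.get("publishedAt", "").partition("T")
    return {
        "VIDEO_CATEGORY": snippet.get("categoryId"),
        "VIDEO_TAGS": snippet.get("tags", []),
        "DEFAULT_LANGUAGE": snippet.get("defaultLanguage"),
        "DEFAULT_AUDIO_LANG": snippet.get("defaultAudioLanguage"),
        "VIDEO_PUBLISHED_DATE": date_part,
        "VIDEO_PUBLISHED_TIME": time_part.rstrip("Z"),
        "VIDEO_DESCRIPTION": snippet.get("description", ""),
        "VIDEO_DURATION": duration_to_seconds(details.get("duration", "PT0S")),
        "VIDEO_DEFINITION": details.get("definition"),
    }


# Made-up resource, for illustration only.
sample_item = {
    "snippet": {
        "categoryId": "10",
        "tags": ["music", "live"],
        "defaultAudioLanguage": "es",
        "publishedAt": "2020-01-02T03:04:05Z",
        "description": "A description.",
    },
    "contentDetails": {"duration": "PT4M13S", "definition": "hd"},
}
video_row = item_to_row(sample_item)
```

Optional fields such as `defaultLanguage` are often absent from the response, which is why every lookup here falls back to a default instead of raising.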
The third and last part of the dataset was created in order to better understand and analyze the data. The title and description languages were detected using an AI. Every time there was an inconsistency between both values, I manually cleaned them (in version 0.04, the accuracy of these values is above 85%).
- 'TITLE_DETECTED_LANG', 'DESCRIPTION_DETECTED_LANG'
- The code I wrote: https://github.com/yamil-emanuel/youtube-takeout-2-csv
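The manual cleaning step starts from rows where the two detected languages disagree. A minimal sketch of how such rows could be picked out for review is shown below. It is an illustration, not the procedure actually used. It assumes the detected languages are stored as short codes such as "es" or "en", and the function name and sample rows are hypothetical.

```python
def flag_language_mismatches(rows):
    """Return the indexes of rows whose two detected languages differ."""
    flagged = []
    for index, row in enumerate(rows):
        title_lang = (row.get("TITLE_DETECTED_LANG") or "").strip().lower()
        desc_lang = (row.get("DESCRIPTION_DETECTED_LANG") or "").strip().lower()
        # A missing value gives nothing to compare, so it is not flagged here.
        if title_lang and desc_lang and title_lang != desc_lang:
            flagged.append(index)
    return flagged


# Made-up rows, for illustration only.
rows = [
    {"TITLE_DETECTED_LANG": "es", "DESCRIPTION_DETECTED_LANG": "es"},
    {"TITLE_DETECTED_LANG": "en", "DESCRIPTION_DETECTED_LANG": "es"},
    {"TITLE_DETECTED_LANG": "EN", "DESCRIPTION_DETECTED_LANG": "en"},
    {"TITLE_DETECTED_LANG": "pt", "DESCRIPTION_DETECTED_LANG": ""},
]
to_review = flag_language_mismatches(rows)
```

Only the flagged indexes then need a human look, which keeps the manual work proportional to the number of disagreements.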
Inspiration
After knowing me as much as (or even more than) I do, wanna be friends?
