The following is the data validation report that the seller chose to provide:
Data description
Data
Song Info
This repository contains the code to collect the following information for each song in the Billboard Top 50 Singles rankings from 1973 to 2022:
- Year
- Rank
- Song name
- Singer name
- YouTube URL
The collected data is also shared here.
Run the create_top_50.py file to create the data. This generates a CSV file, which is also available as top_50s_chart.csv. When loading the CSV file, make sure to specify the index_col argument: df = pd.read_csv("top_50s_chart.csv", index_col=[0,1])
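As a minimal sketch of what index_col=[0,1] does, the snippet below loads a small in-memory stand-in for top_50s_chart.csv. The column names and the placeholder rows are assumptions made for illustration (the real file may label its columns differently); the point is only that the first two columns become a two-level index, which makes selecting by year and rank straightforward.

```python
import io

import pandas as pd

# Hypothetical stand-in for top_50s_chart.csv. The column names and rows
# are placeholders, not real chart data.
sample_csv = io.StringIO(
    "Year,Rank,Song Name,Singer Name,YouTube URL\n"
    "1973,1,Song A,Artist A,https://example.com/a\n"
    "1973,2,Song B,Artist B,https://example.com/b\n"
    "1974,1,Song C,Artist C,https://example.com/c\n"
)

# index_col=[0, 1] turns the first two columns into a two-level MultiIndex.
df = pd.read_csv(sample_csv, index_col=[0, 1])

# All songs of one year, indexed by the second level (rank).
songs_1973 = df.loc[1973]

# A single song, addressed by the (first level, second level) pair.
top_1974 = df.loc[(1974, 1)]

# The rank-1 song of every year, via a cross-section on the second level.
number_ones = df.xs(1, level=1)
```

With the real file, replacing sample_csv with the path "top_50s_chart.csv" should give the same kind of two-level index, assuming the first two columns are the year and the rank.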
Features Info
The features file contains the following features for 2493 of the top songs of the last 50 years (7 songs were excluded):
- chroma_stft
- chroma_cens
- mfcc
- rmse
- zcr
- spectral_centroid
- spectral_bandwidth
- spectral_contrast
- spectral_rolloff
These features were computed for the songs collected above. The data is also shared here.
Running the entire file also saves all the audio files to your local device. The raw audio has not been shared because it amounts to a massive volume of data. However, the audio files have been analysed using librosa, and the extracted features are shared in top_50_song_features.csv. The code to extract these features is in get_features.py.
Used together, these two files offer many opportunities for exploring trend analysis and genre classification.
