The following is the data validation report that the seller has chosen to provide:
Data description
We present the Spotify Music dataset with the goal of enabling researchers and practitioners to mitigate the noisy feedback and self-selection biases inherent in the data collected by existing music platforms. These biases are likely to have a significant impact on the fairness, transparency and quality of recommendation systems. Much has already been written about recommendation algorithms and evaluation metrics, and we hope this dataset helps the community focus on the impact of the data collection mechanisms.
Noisy feedback biases arise in implicit data sets collected by streaming apps. Such apps record user actions without recording the user's context, and the users do not know that they are being surveyed. Consequently, a song streamed from a recommended playlist may be falsely interpreted as an indication that the song was enjoyed, when in fact it was played in the background. A skipped song may be falsely interpreted as an indication that the song was disliked, when in fact the user may simply not have been in the mood for it in their present context. Self-selection biases arise in explicit data sets collected by apps that ask users to give ratings. Since rating is optional, the users most motivated to rate are those who are very happy or very unhappy about their experience with the rated item.
The Spotify Music dataset currently consists of 6,696 anonymized users, 181,663 anonymized songs and 1,017,947 binary ratings, and data collection is still ongoing.
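To give a sense of scale, the counts above can be turned into a few rough sparsity figures. The following is a minimal Python sketch that uses only the three numbers quoted in this description; the function name `sparsity_summary` and the dictionary keys are illustrative, not part of the dataset, and the per-user and per-song values are plain averages that say nothing about how ratings are actually distributed.

```python
# Counts quoted in the dataset description (collection is ongoing,
# so these values are a snapshot and may change).
NUM_USERS = 6_696
NUM_SONGS = 181_663
NUM_RATINGS = 1_017_947


def sparsity_summary(num_users: int, num_songs: int, num_ratings: int) -> dict:
    """Return simple sparsity statistics for a user-by-song rating matrix."""
    # Every (user, song) pair is a cell that could hold a rating.
    possible_pairs = num_users * num_songs
    return {
        "possible_pairs": possible_pairs,
        # Fraction of cells that actually hold a binary rating.
        "density": num_ratings / possible_pairs,
        # Plain averages; the real distribution is likely skewed.
        "ratings_per_user": num_ratings / num_users,
        "ratings_per_song": num_ratings / num_songs,
    }


summary = sparsity_summary(NUM_USERS, NUM_SONGS, NUM_RATINGS)
print(f"possible user-song pairs: {summary['possible_pairs']:,}")
print(f"matrix density:           {summary['density']:.4%}")
print(f"avg ratings per user:     {summary['ratings_per_user']:.1f}")
print(f"avg ratings per song:     {summary['ratings_per_song']:.1f}")
```

Under these counts, fewer than 0.1% of the possible user-song pairs carry a rating, with on the order of 150 ratings per user and fewer than 6 per song on average.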
