Data description
About the data
As many people (myself included) were running into memory issues when using the data from the M5 Forecasting - Accuracy competition, I decided to preprocess it a bit to try to solve some of these issues.
In the end, I built three datasets:
- processed_df: I took the merged data from Ryuhei's kernel and added a date_id column. That's all. This is not meant to be loaded in Kaggle kernels, as it needs more memory than the kernels can handle. I've uploaded it here for anyone who wants to use it for their own datasets;
- lstm_df: I reshaped processed_df to keep only the date_id, id and value columns. My intention is to use this dataset for an LSTM, which is why I chose this name;
- dimred_df: After applying PCA to processed_df, I found that the first two principal components alone retained 99.9% of the explained variance. This dataset therefore contains only the date_id, id and value columns plus the two principal component columns.
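For anyone who wants to reproduce something similar, here is a rough sketch of the three steps on a toy frame. It is not the exact code behind these datasets. The toy columns (date, sell_price, snap, wday), the definition of date_id as a 1-based day index, the choice of which columns go into PCA, and the pc1/pc2 names are all assumptions made for illustration. PCA is done with a plain NumPy SVD on mean-centered, unscaled features.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged data; the real column set is not listed here,
# so these feature columns are hypothetical.
rng = np.random.default_rng(0)
ids = ["item_a", "item_b"]
dates = pd.date_range("2016-01-01", periods=5)
n_rows = len(ids) * len(dates)

processed_df = pd.DataFrame({
    "id": np.repeat(ids, len(dates)),
    "date": list(dates) * len(ids),
    "value": rng.integers(0, 10, n_rows),
    "sell_price": rng.uniform(1.0, 5.0, n_rows),
    "snap": rng.integers(0, 2, n_rows),
    "wday": list(dates.dayofweek) * len(ids),
})

# Step 1 (assumed meaning of date_id): a 1-based integer index over sorted dates.
codes, _ = pd.factorize(processed_df["date"], sort=True)
processed_df["date_id"] = codes + 1

# Step 2: keep only the three columns described for lstm_df.
lstm_df = processed_df[["date_id", "id", "value"]].copy()

# Step 3: PCA via SVD on the (assumed) numeric feature columns.
feature_cols = ["sell_price", "snap", "wday"]
X = processed_df[feature_cols].to_numpy(dtype=float)
X_centered = X - X.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of total variance carried by each principal component.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

# Project onto the first two principal directions.
components = X_centered @ Vt[:2].T
dimred_df = lstm_df.assign(pc1=components[:, 0], pc2=components[:, 1])

print(explained_ratio[:2].sum())
print(dimred_df.head())
```

On the real data, it is worth checking the sum of the first two entries of explained_ratio before dropping the remaining components.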
Acknowledgements
Thanks to Ryuhei F. for the amazing kernel from which I developed these datasets.
M5 Accuracy - Preprocessed Datasets
1.06GB