老下头

M5 Accuracy - Preprocessed Datasets

Tags: pca, lstm

1.06GB

Data ID: D17173861802545145

Published: 2024/06/03

Data description

About the data

Many people (me included) were running into memory issues when using the data from the M5 Forecasting - Accuracy competition, so I decided to preprocess it a bit to try to solve some of these issues.
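The description does not say how the memory issues were addressed. One common way to shrink a pandas frame like this is to downcast numeric columns to the smallest dtype that holds their values. The sketch below shows that idea on a toy frame; the helper name `downcast_numeric` and the toy columns (apart from `date_id` and `value`) are assumptions for illustration, not something taken from the dataset or the original kernel.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose numeric columns use the smallest dtype that fits.

    Hypothetical helper: a generic memory-saving step, not the author's code.
    """
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy frame standing in for the competition data (made-up values).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "date_id": np.arange(1000, dtype="int64"),
    "value": rng.integers(0, 50, 1000).astype("int64"),
    "sell_price": np.linspace(0.5, 20.0, 1000),  # assumed column name
})

small = downcast_numeric(toy)
before = int(toy.memory_usage(deep=True).sum())
after = int(small.memory_usage(deep=True).sum())
print(f"memory: {before} bytes -> {after} bytes")
print(small.dtypes)
```

Downcasting floats trades some precision for space, so it is worth checking that the affected columns tolerate it before applying this to real data.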

So, in the end, I've developed three datasets:

  • processed_df: I took the merged data from Ryuhei's kernel and added a date_id column. That's all. This is not meant to be loaded in Kaggle kernels, as it needs more memory than the kernels can handle. I've uploaded it here for anyone who wants to use it in their own work;
  • lstm_df: I reshaped processed_df to keep only the date_id, id and value columns. I intend to use this dataset for an LSTM, which is why I chose this name;
  • dimred_df: After applying PCA to processed_df, I found that the first two principal components alone explain 99.9% of the variance. This dataset therefore contains only the date_id, id and value columns plus the two principal component columns.
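The derivation of lstm_df and dimred_df from processed_df can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the author's code: the feature columns (`sell_price`, `wday`, `snap`), the output column names `pc1` and `pc2`, and the use of a plain NumPy SVD for PCA are all assumptions, since the description does not say which columns were fed to PCA or which library was used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for processed_df (made-up values, assumed feature columns).
rng = np.random.default_rng(0)
n = 200
processed_df = pd.DataFrame({
    "id": np.repeat(["ITEM_A", "ITEM_B"], n // 2),
    "date_id": np.tile(np.arange(n // 2), 2),
    "value": rng.integers(0, 10, n),
    "sell_price": rng.normal(5.0, 1.0, n),
    "wday": rng.integers(1, 8, n).astype(float),
    "snap": rng.integers(0, 2, n).astype(float),
})

# lstm_df: keep only the three columns named in the description.
lstm_df = processed_df[["date_id", "id", "value"]]

# PCA via SVD on the centered feature matrix.
feature_cols = ["sell_price", "wday", "snap"]  # assumption
X = processed_df[feature_cols].to_numpy(dtype=float)
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of total variance explained by each component, largest first.
explained_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components.
scores = X_centered @ Vt[:2].T

# dimred_df: date_id, id, value plus the two component columns.
dimred_df = lstm_df.assign(pc1=scores[:, 0], pc2=scores[:, 1])

print("variance explained by first two components:", explained_ratio[:2].sum())
print(dimred_df.head())
```

On this random toy data the first two components will not reach 99.9%; that figure comes from the real processed_df. Note that PCA on unscaled columns tends to be dominated by whichever columns have the largest variance, so the explained-variance share depends on whether the features were standardized first.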

Acknowledgements

Thanks to Ryuhei F. for the amazing kernel from which I developed these datasets.
