
writingqualitymemoryreduction

Linguistics, Regression, Data Cleaning, Tabular

20

Sold: 0
74MB

Data ID: D17168924748077832

Published: 2024/05/28

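The seller has not yet authorized the 典枢 platform to run data validation on this file. You can request a validation report from the seller.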

Data Description

About Dataset

This is a memory-reduced dataset for the Writing Process-Writing Quality competition. I encoded the text columns as np.int8 and grouped categories with extremely low occurrence counts into a common bin. I also downcast certain columns based on their min-max values to save memory. I saved the train-logs data in a binary format, along with the encoded text strings and their categories, since one may need them when running inference on the test data.
This is also available in my baseline data prep kernel.
We will use this data as the input for all our future steps, including EDA, model development, and inference development. We hope this approach keeps us from running into memory errors.
All the best for the competition!
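
The memory-reduction steps described above can be illustrated with a minimal sketch. It is not the author's kernel: the column names (`activity`, `cursor_position`, `pause`), the helper names, the `min_count` threshold, the file name, and the choice of pickle as the binary format are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def encode_with_rare_bin(series: pd.Series, min_count: int = 2):
    """Map strings to int8 codes; rare categories share the common code 0.

    Hypothetical helper: the threshold and the code layout are assumptions.
    """
    counts = series.value_counts()
    frequent = sorted(counts[counts >= min_count].index)
    if len(frequent) > np.iinfo(np.int8).max:
        raise ValueError("too many categories to fit in int8")
    # Codes start at 1 so that 0 stays reserved for the rare-category bin.
    mapping = {cat: code for code, cat in enumerate(frequent, start=1)}
    codes = series.map(mapping).fillna(0).astype(np.int8)
    return codes, mapping


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink each numeric column to the smallest dtype that holds its values."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy stand-in for the train logs (made-up values).
logs = pd.DataFrame({
    "activity": ["Input", "Input", "Remove", "Input", "Paste", "Remove"],
    "cursor_position": [0, 1, 2, 1, 300, 40000],
    "pause": [0.5, 1.25, 0.0, 2.0, 3.5, 0.75],
})

codes, mapping = encode_with_rare_bin(logs["activity"], min_count=2)
logs["activity"] = codes
reduced = downcast_numeric(logs)

# Save in a binary format; the mapping would be stored alongside it so the
# same encoding can be applied to the test data at inference time.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_logs.pkl")  # hypothetical file name
    reduced.to_pickle(path)
    restored = pd.read_pickle(path)

print(reduced.dtypes)
```

Reserving one code for the rare bin means any category that never appeared in training can also be mapped to it at inference time, which is one reason to keep the saved mapping next to the data.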
