Data Description
Context
This dataset was created for the competition "Predict Student Performance from Game Play", which aims to predict student performance during game-based learning in real time from game logs. The raw source data is available on the developers' site and can be used as supplemental data. The idea for this dataset came from this discussion.
Generating Script
To extract the data, I used my notebook.
Content
The dataset consists of two file types:
- Files with train data (_train suffix)
- Files with labels (_labels suffix) for each non-empty monthly dataset and its ID. There are 20 monthly datasets available on the mentioned site.
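As a rough illustration of working with the two file types, the sketch below pairs each train file with its labels file by the shared dataset ID. The `.csv` extension, the `<dataset_id>_train` / `<dataset_id>_labels` naming pattern, the example IDs, and the helper name `pair_monthly_files` are all assumptions made for this example; check them against the actual file names.

```python
from collections import defaultdict
from pathlib import Path


def pair_monthly_files(paths):
    """Group file paths into {dataset_id: {"train": Path, "labels": Path}}.

    Assumes hypothetical names such as "<dataset_id>_train.csv" and
    "<dataset_id>_labels.csv"; adjust the suffixes to the real files.
    """
    pairs = defaultdict(dict)
    for path in map(Path, paths):
        stem = path.stem  # file name without its extension
        for kind in ("train", "labels"):
            suffix = f"_{kind}"
            if stem.endswith(suffix):
                pairs[stem[: -len(suffix)]][kind] = path
    # Keep only datasets that have both a train file and a labels file.
    return {key: value for key, value in pairs.items() if len(value) == 2}


# Made-up file names: the second month has no labels file, so it is skipped.
example = ["2021_01_train.csv", "2021_01_labels.csv", "2021_02_train.csv"]
paired = pair_monthly_files(example)
```

In practice the list of paths could come from something like `Path(data_dir).glob("*.csv")`, where `data_dir` is wherever the dataset was downloaded.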
I tried to replicate the competition's data format as closely as possible, which involved:
- Creating only necessary columns
- Removing irrelevant data
For example, navigate_hover events and quiz logs, which are not present in the competition data, were removed. However, if you find any inconsistencies in the dataset or in the generating script, please do share!
I also added save codes, so you can find out whether a player started from one of the saves. As far as I know, all players in the competition's dataset started from the beginning, so you may want to ignore players who used save codes.
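One possible way to drop sessions that started from a save is sketched below with pandas. The column name `save_code`, the convention that a missing value means no save was used, and the toy rows are assumptions for illustration only; the real column may be named or encoded differently.

```python
import pandas as pd

# Toy log rows; "save_code" is a hypothetical column name.
logs = pd.DataFrame({
    "session_id": [1, 1, 2, 3],
    "event_name": ["checkpoint", "navigate_click", "checkpoint", "checkpoint"],
    "save_code": [None, None, "ABC123", None],
})

# Sessions in which at least one row carries a save code.
used_save = logs.loc[logs["save_code"].notna(), "session_id"].unique()

# Keep only sessions that never used a save code.
fresh = logs[~logs["session_id"].isin(used_save)]
```

Filtering at the session level, rather than row by row, removes the whole play-through of anyone who loaded a save, which matches the idea of ignoring those players entirely.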
Game Quits
One interesting aspect of the raw data is that it includes users who quit the game before it ended, possibly before completing a quiz. I only included users who passed at least the first quiz, which makes it possible to supplement the data for the first level group, the one with the fewest features.
Implementing all the new logic for this dataset in existing pipelines may be difficult, and increasing the training set size may lead to memory errors. Additionally, some sessions are already present in the competition data and must be ignored.
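A minimal sketch of both precautions follows: removing supplemental sessions that already appear in the competition data, and downcasting a numeric column to save memory. It assumes both tables share a `session_id` key and uses `elapsed_time` as an example numeric column; the toy values are made up.

```python
import pandas as pd

# Toy stand-ins for the competition data and the supplemental data.
competition = pd.DataFrame({"session_id": [10, 11, 12]})
extra = pd.DataFrame({
    "session_id": [11, 13, 13, 14],
    "elapsed_time": [0, 0, 1500, 0],
})

# Drop supplemental sessions that already appear in the competition data.
overlap = extra["session_id"].isin(competition["session_id"])
extra_new = extra[~overlap].copy()

# Downcast integers to the smallest dtype that fits, to reduce memory use.
extra_new["elapsed_time"] = pd.to_numeric(
    extra_new["elapsed_time"], downcast="integer"
)
```

Deduplicating before concatenation keeps the same session from being counted twice, and downcasting each monthly file as it is loaded keeps the combined frame smaller.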
Motivation
I am sharing this dataset with the Kaggle community because I have university exams and do not have enough time to make the implementation myself. However, I believe that supplemental data with proper data cleaning techniques will greatly boost performance. Good luck!