姜饼果子

Steam Games Dataset

Tags: games, video games, data cleaning, data visualization, data analytics, recommender systems

5

Sold: 0
64.76MB

Data ID: D17220596318134305

Published: 2024/07/27

The following is the data verification report the seller chose to provide:

Data description

I gathered this dataset by scraping the infinite-scroll page of the Steam search site (https://store.steampowered.com/search/?category1=998&ndl=1&ignore_preferences=1). The data was scraped in early September. It is unorganized and needs cleaning.

If you want to see how I built the recommendation system, read the description and the order of the notebooks below (see the notebooks published by the owner).

Names of notebooks:

  • Preprocessing
  • EDA of preprocessed data
  • ML_Analysis_Main
  • ML_Analysis_improving
  • Recommendation System - Hybrid (the main target)
  • Recommendations system Item-based (just for example)

A short description

We started by scraping the video game platform Steam. We used an infinite-scroll page on which games kept loading as you scrolled until there were none left; the order of the games does not matter much. In the first scraping phase, we collected basic game data: name, price, discounted price, release date, and link. Then, using each game's link, we extracted more information about that game. We merged the two datasets into a single file (the one uploaded here) and moved on to the next step.
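The merge step described above can be pictured as a left join keyed on the game's link. The sketch below is an illustration only: it assumes both scraping phases produce lists of dicts, and the field names (`link`, `developer`, `tags`) and example URLs are hypothetical, not taken from the uploaded file.

```python
# Minimal sketch, assuming both scraping phases yield lists of dicts keyed
# by the game's store link. Field names and URLs are hypothetical.


def merge_by_link(basic_rows, detail_rows):
    """Left-join detail records onto the first-phase rows using 'link'.

    Games without a detail record keep only their first-phase fields.
    """
    details = {row["link"]: row for row in detail_rows}
    merged = []
    for row in basic_rows:
        combined = dict(row)
        for key, value in details.get(row["link"], {}).items():
            # If both phases report the same field, keep the first-phase value.
            combined.setdefault(key, value)
        merged.append(combined)
    return merged


basic = [
    {"name": "Game A", "price": "$9.99", "discounted_price": "$4.99",
     "release_date": "1 Sep, 2023", "link": "https://example.com/app/1"},
    {"name": "Game B", "price": "Free", "discounted_price": "Free",
     "release_date": "5 Jan, 2020", "link": "https://example.com/app/2"},
]
details = [
    {"link": "https://example.com/app/1", "developer": "Studio X",
     "tags": "Action;Indie"},
]

for game in merge_by_link(basic, details):
    print(game["name"], game.get("developer", "<no details>"))
```

A left join keeps every game from the first phase even when the detail page failed to scrape, which leaves the decision about dropping incomplete rows to the preprocessing stage.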

In the second stage, we preprocessed the data before starting EDA. This could have been done during EDA, but some manipulations made sense from the start, so we fixed whatever was easy to spot and then moved on to EDA. This phase was very important because several features were removed and added, new ones were created, and so on.
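As an illustration of the kind of fix that is easy to make before EDA, here is a sketch that turns scraped price and release-date strings into typed values and derives a discount percentage. The input formats are assumptions (USD-style prices such as "$1,299.00" or "Free to Play", and English Steam dates such as "1 Sep, 2023"); the actual Preprocessing notebook may handle different formats, and `clean_row` and `discount_pct` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed formats: USD-style prices ("$1,299.00", "Free to Play") and
# English Steam release dates ("1 Sep, 2023", "Sep 1, 2023", "Sep 2023").
PRICE_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")
DATE_FORMATS = ("%d %b, %Y", "%b %d, %Y", "%b %Y")


def parse_price(raw):
    """Return the price as a float, 0.0 for free games, None if unparseable."""
    text = (raw or "").strip()
    if not text:
        return None
    if text.lower().startswith("free"):
        return 0.0
    match = PRICE_PATTERN.search(text)
    return float(match.group().replace(",", "")) if match else None


def parse_release_date(raw):
    """Try each known format; entries such as 'Coming soon' become None."""
    text = (raw or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Return a copy of a scraped row with typed price and date fields."""
    price = parse_price(row.get("price"))
    discounted = parse_price(row.get("discounted_price"))
    cleaned = dict(row)
    cleaned["price"] = price
    cleaned["discounted_price"] = discounted
    cleaned["release_date"] = parse_release_date(row.get("release_date"))
    # A discount is only meaningful for paid games with a known sale price.
    if price and discounted is not None:
        cleaned["discount_pct"] = round(100 * (1 - discounted / price))
    else:
        cleaned["discount_pct"] = None
    return cleaned


print(clean_row({"price": "$9.99", "discounted_price": "$4.99",
                 "release_date": "1 Sep, 2023"}))
```

Returning `None` instead of raising keeps unparseable entries visible as missing values, so they can be counted and handled during EDA.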

The EDA notebook documents much of what happened to the data: it was cleaned, sorted, and visualized, and finally prepared for analysis.

In the analysis phase, the first Jupyter notebook I worked in is Analysis_Main, where I did the main machine-learning work and produced the game_recommendations CSV file. One cell in this notebook returns an error; I intentionally left it as is and explained why.
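The description does not say which machine-learning method Analysis_Main uses. As a stand-in only, the sketch below produces a per-game recommendation list with a simple content-based technique: cosine similarity between games' tag sets (binary tag vectors). The input shape (`name -> set of tags`) and the function names are hypothetical; the default `k=10` mirrors the 10 recommendations per game mentioned later in this description.

```python
import math


def tag_cosine(tags_a, tags_b):
    """Cosine similarity of two binary tag vectors represented as sets."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))


def top_k_similar(games, k=10):
    """Map each game name to its k most similar other games by tag cosine.

    games: dict of name -> set of tags (hypothetical input shape).
    Ties are broken alphabetically so the output is deterministic.
    """
    recommendations = {}
    for name, tags in games.items():
        scored = [
            (tag_cosine(tags, other_tags), other)
            for other, other_tags in games.items()
            if other != name
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        recommendations[name] = [other for _, other in scored[:k]]
    return recommendations


games = {
    "Game A": {"Action", "Indie", "Shooter"},
    "Game B": {"Action", "Shooter"},
    "Game C": {"Puzzle", "Indie"},
    "Game D": {"Puzzle", "Casual"},
}
print(top_k_similar(games, k=2))
```

The pairwise loop is quadratic in the number of games, which is fine for a sketch; at the scale of the full Steam catalogue a vectorized or approximate nearest-neighbour approach would be the usual choice.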

The next notebook is Analysis – improving, which takes the game_recommendations file created in the previous notebook, improves the model, and produces the file game_recommendations2.

At this point the games were already grouped, and we had data showing 10 recommendations per game. The system could be considered more or less complete, but I decided to test it with user data. I took that data from Kaggle purely for testing, because scraping Steam user data was heavily restricted. The file is called steam-200k and contains data on users. Recommendation System - Hybrid is the notebook in which the main test is run. It is a hybrid recommender, meaning it combines collaborative and content-based recommendation.
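One common way to build such a hybrid is a weighted blend of a collaborative score and a content-based score. The sketch below shows that idea under several assumptions: steam-200k rows are taken as `(user_id, game_title, behavior, value)` tuples, ownership is treated as binary (the behavior and hours are ignored), game tags come from a hypothetical `name -> set of tags` mapping, and `alpha` is a hypothetical blending weight. It is not the notebook's actual implementation.

```python
import math
from collections import defaultdict


def set_cosine(a, b):
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def hybrid_recommend(interactions, game_tags, user_id, k=10, alpha=0.5):
    """Blend item-based collaborative and content-based scores.

    interactions: iterable of (user_id, game_title, behavior, value) rows,
        a shape assumed here for steam-200k.
    game_tags: dict of game_title -> set of tags (hypothetical).
    alpha: weight of the collaborative score; 1 - alpha goes to content.
    """
    owners = defaultdict(set)   # game -> users who purchased or played it
    library = defaultdict(set)  # user -> games they purchased or played
    for uid, game, _behavior, _value in interactions:
        owners[game].add(uid)
        library[uid].add(game)

    owned = library.get(user_id, set())
    if not owned:
        return []  # cold-start user: nothing to base a score on

    candidates = (set(owners) | set(game_tags)) - owned
    scored = []
    for game in candidates:
        # Average similarity of the candidate to each game the user owns.
        collab = sum(set_cosine(owners[game], owners[h]) for h in owned) / len(owned)
        content = sum(
            set_cosine(game_tags.get(game, set()), game_tags.get(h, set()))
            for h in owned
        ) / len(owned)
        scored.append((alpha * collab + (1 - alpha) * content, game))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [game for _, game in scored[:k]]


rows = [
    ("u1", "Game A", "purchase", 1.0),
    ("u1", "Game A", "play", 12.5),
    ("u2", "Game A", "purchase", 1.0),
    ("u2", "Game B", "purchase", 1.0),
    ("u3", "Game C", "purchase", 1.0),
]
tags = {
    "Game A": {"Action", "Shooter"},
    "Game B": {"Action", "Shooter", "Indie"},
    "Game C": {"Puzzle"},
    "Game D": {"Action"},
}
print(hybrid_recommend(rows, tags, "u1", k=2))
```

The content term lets games that no test user owns (such as "Game D" here) still be recommended, which is the usual motivation for mixing the two approaches.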

In this notebook, one cell saves only the content-based results to a separate file, user_recommendations_ContentBased.

After that, I also made a Test_Collaborative_ItemBased notebook that implements only the collaborative approach, purely for testing.
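For reference, item-based collaborative filtering compares games by the users who interacted with them. The sketch below is one possible version, not the notebook's code: it assumes steam-200k rows of the form `(user_id, game_title, behavior, value)` where `value` is hours played for `"play"` rows, builds a sparse hours vector per game, and ranks unseen games by the sum of their cosine similarity to each game the user played, weighted by the user's hours on that game. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_item_vectors(interactions):
    """game -> {user: hours played}, using only 'play' rows (assumed format)."""
    vectors = defaultdict(dict)
    for uid, game, behavior, value in interactions:
        if behavior == "play":
            vectors[game][uid] = vectors[game].get(uid, 0.0) + float(value)
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def item_based_recommend(interactions, user_id, k=10):
    """Rank unseen games by hours-weighted similarity to the user's games."""
    vectors = build_item_vectors(interactions)
    played = {g: vec[user_id] for g, vec in vectors.items() if user_id in vec}
    scores = {}
    for game, vec in vectors.items():
        if game in played:
            continue
        score = sum(cosine(vec, vectors[h]) * hours for h, hours in played.items())
        if score > 0:  # skip games with no overlap with the user's history
            scores[game] = score
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [game for game, _ in ranked[:k]]


rows = [
    ("u1", "Game A", "play", 10.0),
    ("u1", "Game A", "purchase", 1.0),
    ("u2", "Game A", "play", 5.0),
    ("u2", "Game B", "play", 5.0),
    ("u3", "Game C", "play", 8.0),
]
print(item_based_recommend(rows, "u1"))
```

Using hours rather than a plain purchase flag gives heavily played games more influence, at the cost of being sensitive to a few very long play sessions; a log transform of the hours is a common adjustment.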

If you use the data, please give credit to the owner.
