
Amazon-M2

Business


Size: 398.21MB

Data ID: D17168838760173100

Published: 2024/05/28

Data Description

About Dataset

🗃️ Dataset

The released dataset is anonymized and is not representative of production characteristics.

The Multilingual Shopping Session Dataset is a collection of anonymized customer sessions containing products from six locales: English, German, Japanese, French, Italian, and Spanish. It has two main components: user sessions and product attributes. Each user session is the list of products a user engaged with, in chronological order; the product attributes include details such as the product title, price in local currency, brand, colour, and description.
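The two components fit together naturally: a session references products by ASIN, and the attribute table resolves each ASIN to its details. A minimal sketch, assuming sessions are lists of ASINs and that attributes are looked up by (locale, ASIN), since titles are written in the locale's language; the names `Product`, `Session`, and `session_titles` and the example ASINs are hypothetical, not part of the release.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Product:
    """A subset of the product attribute columns."""
    locale: str
    id: str                        # ASIN
    title: str
    price: Optional[float] = None  # in local currency
    brand: Optional[str] = None


@dataclass
class Session:
    locale: str
    items: List[str]               # ASINs, in the order the user engaged with them


def session_titles(session: Session,
                   catalog: Dict[Tuple[str, str], Product]) -> List[str]:
    """Resolve each ASIN in a session to its product title in the same locale."""
    return [catalog[(session.locale, asin)].title for asin in session.items]


# Hypothetical catalog and session, for illustration only.
catalog = {
    ("DE", "ASIN000001"): Product("DE", "ASIN000001", "T-Shirt", 24.99, "Brand A"),
    ("DE", "ASIN000002"): Product("DE", "ASIN000002", "Phone Case", 9.99, "Brand B"),
}
session = Session("DE", ["ASIN000002", "ASIN000001"])
print(session_titles(session, catalog))  # ['Phone Case', 'T-Shirt']
```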

The dataset is divided into three splits: train, phase-1 test, and phase-2 test. For Task 1 and Task 2, the train : phase-1 test : phase-2 test ratio for each language is roughly 10:1:1. For Task 3, the phase-1 and phase-2 test sets are each fixed at 10,000 samples. All three tasks share the same train set, while each task's test sets are constructed for that task's specific objective. Task 1 uses English, German, and Japanese data, while Task 2 uses French, Italian, and Spanish data; participants in Task 2 are encouraged to use transfer learning to improve their system's performance on the test set. For Task 3, the test set includes products that do not appear in the training set, and participants must generate the title of the next product from the user session.
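Because each session is chronological, one common way to derive training examples is to treat every prefix as a history and the product that follows it as the target; for Task 3, the target becomes that product's title. The sketch below assumes this framing; the official test-file format is not described on this page, and `TASK_LOCALES`, `next_item_examples`, and `title_example` are illustrative names.

```python
from typing import Dict, List, Tuple

# Locale groups per task, as described above.
TASK_LOCALES = {
    "task1": {"UK", "DE", "JP"},
    "task2": {"FR", "IT", "ES"},
}


def next_item_examples(items: List[str]) -> List[Tuple[List[str], str]]:
    """Split one chronological session into (history, next item) pairs.

    Every non-empty proper prefix becomes a history whose target is the item
    that immediately follows it.
    """
    return [(items[:i], items[i]) for i in range(1, len(items))]


def title_example(items: List[str], titles: Dict[str, str]) -> Tuple[List[str], str]:
    """Task 3 framing (assumed): all but the last item form the history,
    and the target is the title of the last item."""
    return items[:-1], titles[items[-1]]


print(next_item_examples(["A", "B", "C"]))
# [(['A'], 'B'), (['A', 'B'], 'C')]
print(title_example(["A", "B"], {"B": "Phone Case"}))
# (['A'], 'Phone Case')
```

A session of length n yields n - 1 next-item examples, so longer sessions contribute proportionally more training signal.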

Table 1 summarizes the dataset statistics: the number of sessions and the number of products per locale. The dataset will be made publicly available as part of the KDD Cup competition. Each product is identified by a unique Amazon Standard Identification Number (ASIN), which makes it easy to retrieve additional information from the web. Beyond the provided dataset, participants are free to use external sources of information to train their systems, such as public datasets and pre-trained language models, but must declare them when describing their systems.
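Since every product carries an ASIN, a product page can usually be located by combining the ASIN with the storefront domain for its locale. This sketch assumes the widely used `https://www.<domain>/dp/<ASIN>` URL pattern, the locale-to-domain mapping shown, and a 10-character uppercase alphanumeric ASIN format; none of these is specified by the dataset itself.

```python
import re

# Assumed mapping from the dataset's locale codes to Amazon storefront domains.
MARKETPLACE_DOMAINS = {
    "DE": "amazon.de",
    "JP": "amazon.co.jp",
    "UK": "amazon.co.uk",
    "ES": "amazon.es",
    "FR": "amazon.fr",
    "IT": "amazon.it",
}

# Assumption: ASINs are 10 uppercase alphanumeric characters (e.g. B07WSY3MG8).
ASIN_RE = re.compile(r"[A-Z0-9]{10}")


def product_url(locale: str, asin: str) -> str:
    """Build a product-page URL from a locale code and an ASIN."""
    if not ASIN_RE.fullmatch(asin):
        raise ValueError(f"not a valid ASIN: {asin!r}")
    return f"https://www.{MARKETPLACE_DOMAINS[locale]}/dp/{asin}"


print(product_url("DE", "B07WSY3MG8"))  # https://www.amazon.de/dp/B07WSY3MG8
```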

Language (Locale) | # Sessions | # Products (ASINs)
German (DE) | 1,111,416 | 513,811
Japanese (JP) | 979,119 | 389,888
English (UK) | 1,182,181 | 494,409
Spanish (ES) | 89,047 | 41,341
French (FR) | 117,561 | 43,033
Italian (IT) | 126,925 | 48,788

Table 1: Dataset statistics
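Table 1 also shows how unevenly the sessions are spread across tasks. This sketch transcribes the table and totals the sessions for the Task 1 and Task 2 locales; product counts are left unsummed because the same ASIN may appear in more than one locale, so a sum could overcount distinct products.

```python
# Table 1, transcribed: locale -> (sessions, products)
TABLE_1 = {
    "DE": (1_111_416, 513_811),
    "JP": (979_119, 389_888),
    "UK": (1_182_181, 494_409),
    "ES": (89_047, 41_341),
    "FR": (117_561, 43_033),
    "IT": (126_925, 48_788),
}

TASK1 = ("DE", "JP", "UK")
TASK2 = ("ES", "FR", "IT")


def total_sessions(locales):
    """Sum the session counts over a group of locales."""
    return sum(TABLE_1[loc][0] for loc in locales)


task1 = total_sessions(TASK1)  # 3,272,716
task2 = total_sessions(TASK2)  # 333,533
print(f"Task 1 locales: {task1:,} sessions")
print(f"Task 2 locales: {task2:,} sessions")
print(f"ratio: {task1 / task2:.1f}x")  # ratio: 9.8x
```

The Task 2 locales have roughly a tenth of the sessions of the Task 1 locales, which is consistent with the encouragement to use transfer learning in Task 2.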

The product attribute data contains the following columns:

  • locale: the locale code of the product (e.g., DE)
  • id: a unique identifier for the product, also known as the Amazon Standard Identification Number (ASIN) (e.g., B07WSY3MG8)
  • title: title of the item (e.g., “Japanese Aesthetic Sakura Flowers Vaporwave Soft Grunge Gift T-Shirt”)
  • price: price of the item in local currency (e.g., 24.99)
  • brand: item brand name (e.g., “Japanese Aesthetic Flowers & Vaporwave Clothing”)
  • color: color of the item (e.g., “Black”)
  • size: size of the item (e.g., “xxl”)
  • model: model of the item (e.g., “iphone 13”)
  • material: material of the item (e.g., “cotton”)
  • author: author of the item (e.g., “J. K. Rowling”)
  • desc: a description of the item’s key features and benefits, presented as bullet points (e.g., “Solid colors: 100% Cotton; Heather Grey: 90% Cotton, 10% Polyester; All Other Heathers …”)
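A minimal loader for the product attribute table, assuming it is distributed as a CSV file whose header row uses the column names above (the file format is not stated on this page). The sample row is assembled from the example values in the list and is illustrative only; `load_products` and `parse_price` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Tuple


def parse_price(raw: str) -> Optional[float]:
    """Prices are in local currency; empty cells become None."""
    raw = raw.strip()
    return float(raw) if raw else None


def load_products(lines: Iterable[str]) -> Dict[Tuple[str, str], dict]:
    """Index product rows by (locale, id); the csv module handles quoted commas."""
    index = {}
    for row in csv.DictReader(lines):
        row["price"] = parse_price(row.get("price") or "")
        index[(row["locale"], row["id"])] = row
    return index


# Illustrative row built from the example values listed above.
sample = io.StringIO(
    "locale,id,title,price,brand,color,size,model,material,author,desc\n"
    'DE,B07WSY3MG8,"Japanese Aesthetic Sakura Flowers Vaporwave Soft Grunge Gift T-Shirt",'
    '24.99,"Japanese Aesthetic Flowers & Vaporwave Clothing",Black,xxl,,cotton,,\n'
)
products = load_products(sample)
print(products[("DE", "B07WSY3MG8")]["price"])  # 24.99
```

Keying by (locale, id) rather than id alone keeps locale-specific titles and prices separate if the same ASIN is listed in several locales.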
