MBAL: 10 million crypto address label dataset

老下头


Tags: data cleaning, classification, ensembling, bigquery, currencies and foreign exchange, english

Price: ￥15

Sold: 0

Size: 947.53 MB

Dataset ID: D17174993458318894

Published: 2024/06/04

This dataset accompanies the article "MBAL: A Dataset of 10 Million Annotated Crypto Addresses with Categories and Entities on Leading Blockchain Networks" and includes both the dataset itself and the data used in the experiments conducted in the article.

The dataset comprises six files, covering three sections, described as follows:

Section 1: The publicly released dataset

dataset_10m_ads.csv This file contains labeled data for 10 million addresses, with six columns explained below:

column_name	description
chain	The blockchain network of the address, with five possible values: bitcoin_mainnet, ethereum_mainnet, bnb_chain_mainnet, polygon_mainnet, avalanche_c_chain
address	The cryptocurrency address
categories	The category of the address, as enumerated in the article, with 62 possible values. An address may belong to multiple categories
entity	The entity associated with the address, which may be unique or empty
source	The source of the data, with three possible values: ground_truth, heuristic, external
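
A minimal sketch of reading dataset_10m_ads.csv with Python's standard csv module, checking the enumerated chain and source values from the table above and splitting the multi-valued categories field. It assumes the file has a header row using the column names above; the category separator (";" here) is a hypothetical default, since the description does not state how multiple categories are encoded. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Allowed values, taken from the column table above.
CHAINS = {
    "bitcoin_mainnet", "ethereum_mainnet", "bnb_chain_mainnet",
    "polygon_mainnet", "avalanche_c_chain",
}
SOURCES = {"ground_truth", "heuristic", "external"}


def read_labels(fh, category_sep=";"):
    """Yield one parsed record per row of dataset_10m_ads.csv.

    ASSUMPTION: the header uses the column names above, and multiple
    categories are joined by `category_sep` (hypothetical default).
    """
    for row in csv.DictReader(fh):
        if row["chain"] not in CHAINS:
            raise ValueError(f"unexpected chain: {row['chain']!r}")
        if row["source"] not in SOURCES:
            raise ValueError(f"unexpected source: {row['source']!r}")
        yield {
            "chain": row["chain"],
            "address": row["address"],
            # An address may belong to several categories.
            "categories": [c.strip() for c in row["categories"].split(category_sep) if c.strip()],
            # The entity field may be empty; map that to None.
            "entity": row["entity"] or None,
            "source": row["source"],
        }


# Made-up rows for illustration only; real use would be
# `with open("dataset_10m_ads.csv", newline="") as fh: ...`
sample = io.StringIO(
    "chain,address,categories,entity,source\n"
    "bitcoin_mainnet,1ExampleAddr,exchange;mining,ExampleEntity,ground_truth\n"
    "ethereum_mainnet,0xexample,defi,,heuristic\n"
)
rows = list(read_labels(sample))
per_chain = Counter(r["chain"] for r in rows)
```

Because `read_labels` is a generator, the full 10-million-row file can be streamed without loading it into memory at once; any extra column in the file is simply ignored by `csv.DictReader`.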

Section 2: Sample data for Experiment 1 (COMPARATIVE EXPERIMENT BETWEEN MBAL AND BABD-13)

Experiment 1 focuses on addresses on the Bitcoin mainnet. The three files below share the same columns: mainly address, category, and 144 other feature fields. Using these sample data, Experiment 1 described in the article can be fully replicated.

The method of constructing the training/test sets from the sample data is shown in this figure. We merged and deduplicated the positive samples of the two datasets and randomly selected 50,000 of them as the positive samples of the test set. Negative samples were constructed in the same way, yielding a final test set of 100,000 records. The remaining sample data, with the test-set records removed, form the training set. In this figure, white indicates duplicate data, yellow indicates test data, and light yellow indicates training data.
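The construction above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' code: it assumes each dataset is given as a hypothetical dict mapping "pos"/"neg" to address lists, draws the test set from the merged, deduplicated pool of both datasets, and builds one training set per dataset (matching the separate MBAL and BABD training files) by removing the held-out addresses. The figure's handling of duplicates may differ in detail.

```python
import random


def build_exp1_split(mbal, babd, n_test_per_class=50_000, seed=0):
    """Sketch of the Experiment 1 train/test construction.

    ASSUMPTION: `mbal` and `babd` are dicts mapping "pos"/"neg" to lists of
    addresses; this data shape is hypothetical.
    """
    rng = random.Random(seed)
    test = {}
    for label in ("pos", "neg"):
        # Merge both datasets and drop duplicate addresses (first-seen order).
        pool = list(dict.fromkeys(mbal[label] + babd[label]))
        test[label] = set(rng.sample(pool, n_test_per_class))
    held_out = test["pos"] | test["neg"]
    # Each dataset's training set is its own samples minus the test samples.
    train = {
        name: {label: [a for a in ds[label] if a not in held_out] for label in ("pos", "neg")}
        for name, ds in (("mbal", mbal), ("babd", babd))
    }
    return test, train


# Toy example with 2 test samples per class instead of 50,000.
mbal = {"pos": ["a1", "a2", "a3"], "neg": ["n1", "n2"]}
babd = {"pos": ["a3", "a4"], "neg": ["n2", "n3"]}
test, train = build_exp1_split(mbal, babd, n_test_per_class=2)
```

With the default of 50,000 per class, the test set has 100,000 records, as stated above. Sampling from a shared pool means both training sets are evaluated against the same held-out addresses, and no test address leaks into either training set.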

exp1_bitcoin_sample_test_dd.csv Public test samples for Experiment 1.
exp1_bitcoin_sample_train_mbal_dd.csv Training samples from the MBAL dataset for Experiment 1.
exp1_bitcoin_sample_train_babd_dd.csv Training samples from the BABD dataset for Experiment 1.
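
To feed these files to a classifier, each row has to be split into the address, the label, and the feature vector. A minimal sketch, assuming the header names are literally "address" and "category" and that every other column is a numeric feature (the description says the files "mainly" contain these, so extra non-feature columns may need excluding). The sample rows are made up and use two features instead of 144.

```python
import csv
import io


def load_features(fh, id_col="address", label_col="category"):
    """Split each row into an address, a label, and a numeric feature vector.

    ASSUMPTION: the header uses the names "address" and "category", and every
    other column is a numeric feature.
    """
    reader = csv.DictReader(fh)
    feature_cols = [c for c in reader.fieldnames if c not in (id_col, label_col)]
    ids, labels, X = [], [], []
    for row in reader:
        ids.append(row[id_col])
        labels.append(row[label_col])
        X.append([float(row[c]) for c in feature_cols])
    return feature_cols, ids, labels, X


# Made-up two-feature rows; the Experiment 1 files have 144 feature fields.
sample = io.StringIO("address,category,f1,f2\n1ExampleA,1,1,2\n1ExampleB,0,0.5,3\n")
feature_cols, ids, labels, X = load_features(sample)
```

On the real Experiment 1 files, `len(feature_cols)` would be expected to equal 144 if the files contain only the address, category, and feature columns.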

Section 3: Sample data for Experiment 2 (EXPERIMENT ON SPECIFIC CATEGORIES)

Experiment 2 focuses on addresses on the Ethereum mainnet. The files below share the same columns: mainly address, category, and 207 other feature fields. Using these sample data, Experiment 2 described in the article can be fully replicated.

Sample Dataset Construction: For the Ethereum category analysis, we selected transaction data from 2022 to construct the sample dataset, following the same procedure as in Experiment 1. In total, we obtained 55,571,103 addresses and their corresponding 591,892,912 transactions.

Training/Test Set Construction: We constructed the training/test sets for the category-specific experiments using the same methodology as in Experiment 1, but at a larger scale: 4,749,952 training records and 1,000,000 test records (500,000 positive and 500,000 negative samples).
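
The 2022 selection step in the sample-dataset construction can be sketched as a timestamp filter followed by collecting every address the kept transactions touch. The transaction shape, a (from_address, to_address, unix_timestamp) tuple, is hypothetical, and interpreting "2022" as the UTC calendar year is an assumption.

```python
from datetime import datetime, timezone

# Year boundaries as Unix timestamps (UTC is an assumption).
START_2022 = int(datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp())
START_2023 = int(datetime(2023, 1, 1, tzinfo=timezone.utc).timestamp())


def addresses_active_in_2022(transactions):
    """Keep transactions from 2022 and collect every address they involve.

    ASSUMPTION: each transaction is a hypothetical (from_address, to_address,
    unix_timestamp) tuple; to_address is None for contract creations.
    """
    txs_2022 = [tx for tx in transactions if START_2022 <= tx[2] < START_2023]
    addresses = set()
    for sender, receiver, _ in txs_2022:
        addresses.add(sender)
        if receiver is not None:
            addresses.add(receiver)
    return addresses, txs_2022


txs = [
    ("0xa", "0xb", 1640995200),  # 2022-01-01 00:00:00 UTC: kept
    ("0xb", "0xc", 1672531200),  # 2023-01-01 00:00:00 UTC: dropped
    ("0xd", None, 1650000000),   # April 2022, contract creation: kept
]
addresses, kept = addresses_active_in_2022(txs)
```

The half-open interval `[START_2022, START_2023)` keeps the first second of 2022 and excludes the first second of 2023, so no transaction is counted in two years.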

exp2_ethereum_sample_test_mbal_dd.csv Test samples from the MBAL dataset for Experiment 2.
exp2_ethereum_sample_train_mbal_dd.csv Training samples from the MBAL dataset for Experiment 2.
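
Since the test set is removed from the sample data before the training set is formed, a quick sanity check is that no test address also appears in the training file, and that the test labels are balanced. A minimal sketch, assuming both files have "address" and "category" header columns; the sample rows are made up, with "0xa" deliberately placed in both files to show a detected overlap.

```python
import csv
import io
from collections import Counter


def summarize_split(train_fh, test_fh, id_col="address", label_col="category"):
    """Count test addresses also present in training, plus the test label balance.

    ASSUMPTION: both files have "address" and "category" header columns.
    """
    train_ids = {row[id_col] for row in csv.DictReader(train_fh)}
    overlap = 0
    test_labels = Counter()
    for row in csv.DictReader(test_fh):
        test_labels[row[label_col]] += 1
        if row[id_col] in train_ids:
            overlap += 1
    return overlap, test_labels


# Made-up rows; "0xa" deliberately appears in both files.
train = io.StringIO("address,category\n0xa,1\n0xb,0\n")
test = io.StringIO("address,category\n0xc,1\n0xd,0\n0xa,1\n")
overlap, test_labels = summarize_split(train, test)
```

On exp2_ethereum_sample_train_mbal_dd.csv and exp2_ethereum_sample_test_mbal_dd.csv one would expect an overlap of 0 and, if the category column encodes the positive/negative split, two label values with 500,000 rows each.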

About categories


