老下头

MKPHOTO2023

Tags: russia, tabular, image, social networks

7

Sold: 0
Size: 736.16 MB

Data ID: D17175015371982725

Published: 2024/06/04

The following is the data verification report that the seller chose to provide:

Data description

The MKPHOTO2023 dataset is designed for studying the kinds of profile photos used by different types of malicious social bots. It includes photos used by VKontakte bots, a classification of photo types, and the bots' metrics. For this classification, we used the following detectors:

  1. YOLO (GitHub) - to identify a person.
  2. CelebDetector (GitHub) - to identify faces and celebrities.
  3. GAN-image-detection (GitHub) - to identify GAN usage.
  4. DTM-image-detection (GitHub) - to identify usage of diffusion and transformer models (DTM).

GAN and DTM images were manually reviewed to clean up any misclassifications.

To collect bots and measure their metrics, we created 'honeypots' (fake victims) in VK and bought bot activity. During each bot purchase, we measured bot properties such as speed of action and price.

More details related to the photo analysis process: preprint

More details related to bot purchase and bots' metrics measurements process: paper, preprint


The dataset consists of the following files:

  1. photos.zip - an archive of .JPG photos of bots' faces. Each file is named after the photo's id.
  2. dataset.csv - the result of the photo analysis, in which we aggregated the outputs of the various detectors (see the files below) and added bots' metrics from the MKMETRIC2022 dataset.
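
Because each photo's file name is its id, the ids can be recovered from photos.zip without unpacking it. The following is a minimal sketch under stated assumptions: the function name `photo_ids_in_archive` is hypothetical, and the archive's internal folder layout is not documented in this listing, so the code simply ignores any folder prefix.

```python
import zipfile
from pathlib import PurePosixPath


def photo_ids_in_archive(archive):
    """Return the set of photo ids found in a zip archive.

    `archive` may be a file path or a binary file-like object.
    The id is taken to be the file name without its extension,
    as described for photos.zip.
    """
    ids = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            # Keep only JPEG files; directory entries and other files are skipped.
            if path.suffix.lower() in {".jpg", ".jpeg"}:
                ids.add(path.stem)
    return ids
```

Calling `photo_ids_in_archive("photos.zip")` would then give a set of ids that can be compared with the ids listed in dataset.csv to check that every row has a matching photo.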

The additional files below are the raw outputs of the detectors.

  1. celebs_and_faces.csv - the raw output of CelebDetector.
  2. face_labels.csv - the raw output of the YOLO detector.
  3. gan.csv - the raw output of the GAN detector.
  4. dtm.csv - the raw output of the DTM detector.
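
To compare the raw detector outputs side by side, the four files can be joined on the photo id. The sketch below is illustrative only and is not the procedure used to build dataset.csv: this listing does not document the column names, so the key column `id` is an assumption, and `merge_by_id` is a hypothetical helper.

```python
import csv


def merge_by_id(sources, key="id"):
    """Join several CSV inputs on a shared key column.

    `sources` maps a short prefix (e.g. "gan") to an open text file or
    any iterable of CSV lines. The key column name "id" is an assumption.
    Returns a dict: key value -> merged row, with non-key columns
    renamed to "<prefix>.<column>" to avoid name clashes.
    """
    merged = {}
    for prefix, lines in sources.items():
        for row in csv.DictReader(lines):
            record = merged.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                if column != key:
                    record[f"{prefix}.{column}"] = value
    return merged
```

In practice the inputs would be opened with `open("gan.csv", newline="")` and so on. Photos missing from one detector's output simply lack that detector's columns in the merged row.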