Size: 879.43MB

Data ID: D17175026847795303

Published: 2024/06/04

Data description

MQTTset, a new dataset for MQTT

This work aims to create a dataset for the IoT context, focused on the MQTT communication protocol, to give the research and industrial communities an initial dataset to use in their applications. The dataset is generated by a network of MQTT-based IoT sensors in which each aspect of a real network is defined. In particular, the MQTT broker is instantiated using Eclipse Mosquitto and the network is composed of 8 sensors. The scenario is a smart home environment where sensors retrieve information about temperature, light, humidity, CO-Gas, motion, smoke, door and fan at different time intervals, since each sensor behaves differently from the others.

Getting Started

To use the dataset, simply download the GitHub repository and start working with the CSV or PCAP files. More information is available in the following paper: Vaccari, I.; Chiola, G.; Aiello, M.; Mongelli, M.; Cambiaso, E. MQTTset, a New Dataset for Machine Learning Techniques on MQTT. Sensors 2020, 20, 6578.

If you use this dataset in a research work, please cite this article.

MQTT network structure

As mentioned, the dataset is composed of 8 MQTT sensors with different features. The table below reports the MQTT sensors. Each sensor is associated with a data profile and a topic linked to the MQTT broker. The data profile is the type of data the sensor communicates, while the topic is defined by the sensor when sending data to the broker. Finally, the sensors were conceptually divided into two rooms, as if they were distributed in a smart house, and the MQTT broker has IP address 10.16.100.73, with 1883 as the clear-text communication port. In the table, the time can be periodic or random. This distinction is important: a temperature sensor has a periodic behavior over time, i.e. it cyclically sends information retrieved from the environment (defined as P), whereas a motion sensor has a more random behavior, since it sends information only when a user passes in front of the sensor (defined as R). Taking this aspect into account makes the dataset more realistic, since the real behavior of a home automation system is simulated and implemented.

| Sensor               | IP address    | Room | Time (P: periodic, R: random) | Topic                | Data Profile         |
|----------------------|---------------|------|-------------------------------|----------------------|----------------------|
| Temperature          | 192.168.0.151 | 1    | P, 60 s                       | Temperature          | Temperature          |
| Light intensity      | 192.168.0.150 | 1    | P, 1800 s                     | Light intensity      | Light intensity      |
| Humidity             | 192.168.0.152 | 1    | P, 60 s                       | Humidity             | Humidity             |
| Motion sensor        | 192.168.0.154 | 1    | R, 1 h                        | Motion sensor        | Motion sensor        |
| CO-Gas               | 192.168.0.155 | 1    | R, 1 h                        | CO-Gas               | CO-Gas               |
| Smoke                | 192.168.0.180 | 2    | R, 1 h                        | Smoke                | Smoke                |
| Fan speed controller | 192.168.0.173 | 2    | P, 120 s                      | Fan speed controller | Fan speed controller |
| Door lock            | 192.168.0.176 | 2    | R, 1 h                        | Door lock            | Door lock            |
| Fan sensor           | 192.168.0.178 | 2    | P, 60 s                       | Fan sensor           | Fan sensor           |
| Motion sensor        | 192.168.0.174 | 2    | R, 1 h                        | Motion sensor        | Motion sensor        |
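
The periodic versus random timing in the table can be sketched in a few lines of Python. This is only an illustration, not the code used to generate the dataset: the `Sensor` class and the `publish_times` function are hypothetical names, and reading "R, 1 h" as one message at a uniformly random moment inside each one-hour window is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str      # in the table, the topic and data profile share this name
    ip: str
    room: int
    mode: str      # "P" = periodic, "R" = random
    period_s: int  # interval (P) or window length (R), in seconds


# Values copied from the sensor table.
SENSORS = [
    Sensor("Temperature", "192.168.0.151", 1, "P", 60),
    Sensor("Light intensity", "192.168.0.150", 1, "P", 1800),
    Sensor("Humidity", "192.168.0.152", 1, "P", 60),
    Sensor("Motion sensor", "192.168.0.154", 1, "R", 3600),
    Sensor("CO-Gas", "192.168.0.155", 1, "R", 3600),
    Sensor("Smoke", "192.168.0.180", 2, "R", 3600),
    Sensor("Fan speed controller", "192.168.0.173", 2, "P", 120),
    Sensor("Door lock", "192.168.0.176", 2, "R", 3600),
    Sensor("Fan sensor", "192.168.0.178", 2, "P", 60),
    Sensor("Motion sensor", "192.168.0.174", 2, "R", 3600),
]


def publish_times(sensor, duration_s, rng):
    """Return the offsets (in seconds) at which the sensor would publish."""
    times = []
    if sensor.mode == "P":
        # Periodic: one message at every multiple of the period.
        t = sensor.period_s
        while t <= duration_s:
            times.append(float(t))
            t += sensor.period_s
    else:
        # Assumption: one message at a random offset inside each window.
        start = 0
        while start < duration_s:
            end = min(start + sensor.period_s, duration_s)
            times.append(rng.uniform(start, end))
            start += sensor.period_s
    return times


rng = random.Random(42)  # fixed seed so runs are repeatable
for sensor in SENSORS:
    count = len(publish_times(sensor, 3600, rng))
    print(f"{sensor.name} (room {sensor.room}): {count} messages in 1 h")
```

Under these assumptions, a 60 s periodic sensor produces 60 messages per hour, while a random sensor produces a single message at an unpredictable offset.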

GitHub repository organization

The repository is composed of 3 folders:

  • PCAP raw data
    • Legitimate
    • SlowITe
    • Bruteforce
    • Malformed data
    • Flooding
    • DoS attack
  • CSV file
    • Legitimate
    • SlowITe
    • Bruteforce
    • Malformed data
    • Flooding
    • DoS attack
  • Final dataset
    • train70.csv, test30.csv
    • train70_reduced.csv, test30_reduced.csv
    • train70_augmented.csv, test30_augmented.csv

The PCAP folder contains the raw network data captured directly from the sensors of the MQTT network, as well as the traffic related to the attacks. The CSV folder contains the data and features extracted from the PCAP files using the tshark tool. Finally, the FINAL_CSV folder contains the CSV files combined with each other and subsequently used for machine learning algorithms. These CSV files are provided in 3 different formats:

  • train70.csv, test30.csv: in these files, the legitimate traffic is randomly combined with the different kinds of malicious traffic.
  • train70_reduced.csv, test30_reduced.csv: the reduced form combines malicious and legitimate traffic in a 50:50 ratio, so it contains less legitimate traffic than was actually captured. The amount of legitimate traffic equals the sum of the malicious traffic.
  • train70_augmented.csv, test30_augmented.csv: in the augmented form, the malicious traffic is instead increased so that the sum of the attack traffic equals the legitimate traffic.
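
One way to check the 50:50 balance described for the reduced and augmented files is to count rows per class label. The sketch below uses only the Python standard library. The label column name `target`, the label value `legitimate`, and the toy rows are all assumptions made for illustration, so check the header of the actual CSV files before relying on them.

```python
import csv
import io
from collections import Counter


def label_counts(handle, label_column="target"):
    """Count rows per class label in an open CSV file.

    The default column name "target" is an assumption.
    """
    return Counter(row[label_column] for row in csv.DictReader(handle))


def legitimate_share(counts, legitimate_label="legitimate"):
    """Fraction of rows carrying the (assumed) legitimate label."""
    total = sum(counts.values())
    return counts[legitimate_label] / total if total else 0.0


# Toy rows for illustration only, not taken from the dataset.
SAMPLE = io.StringIO(
    "tcp.len,target\n"
    "10,legitimate\n"
    "12,legitimate\n"
    "40,dos\n"
    "8,slowite\n"
)

counts = label_counts(SAMPLE)
print(counts)
print(legitimate_share(counts))  # 0.5
```

For a real file, open it with `open("train70_reduced.csv", newline="")` and pass the handle to `label_counts`. If the description above holds, the legitimate share of the reduced and augmented files should be close to 0.5.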

Installation

To use the dataset together with the proposed code, run the following command to install the required Python 3 libraries:

pip3 install -r requirements.txt 

Built With

  • IoT-Flock - Framework to generate IoT networks
  • MQTTSA - A security assessment tool for MQTT networks

Authors

  • Ivan Vaccari - Concept, implementation, elaboration, paper writer
  • Giovanni Chiola - Dataset approach and definition
  • Maurizio Aiello - Supervisor, paper reviewer
  • Maurizio Mongelli - Machine learning support and contribution
  • Enrico Cambiaso - Supervisor, elaboration, paper collaboration
