淡然若水

CVE and CWE mapping Dataset (2021)

software, data analytics, classification, text, public safety

31.08MB

Data ID: D17220837129270090

Published: 2024/07/27

Data description


Context

> A software vulnerability is a defect in software that could allow an attacker to gain control of a system. These defects can arise from the way the software is designed or from a flaw in the way it is coded.

By exploiting a software flaw, an attacker can steal or alter sensitive data, enlist the host in a botnet, install a backdoor, or plant other types of malware. In addition, once an attacker has gained access to one network host, they can use that host to gain access to other hosts on the same network.

The National Vulnerability Database (NVD) is the world's largest and most comprehensive database of publicly disclosed vulnerabilities in commercial and open source software. When an attacker discovers that your software is vulnerable to a known flaw, he or she will have a better understanding of what types of attacks to execute against it. If the attack is successful, the attacker will be able to execute malicious commands on the target system. Although the names National Vulnerability Database and Common Vulnerabilities and Exposures (CVE) are often used interchangeably, the two are closely associated but distinct databases. The CVE dictionary was created by the non-profit MITRE Corporation in 1999, five years before the NVD.

CVE stands for Common Vulnerabilities and Exposures, and it's a standard reporting format for publicly known security flaws. CVE's primary goal is to standardize how a security vulnerability or risk is identified - with a number, a description, and at least one public reference. CVE is open to the public and is free to use. CVE-2020-16891 is an example of a CVE ID, which contains the CVE prefix, the year the CVE ID was assigned or the year the vulnerability was made public, and the sequence number digits. The CVE description comprises information such as the affected product's and vendor's names, a list of impacted versions, the vulnerability type, the impact, the access required for an attacker to exploit the issue, and the critical code components. The CVE reference includes the vulnerability reports, advisories, or sources that detail the vulnerability and the exploitation that could occur.
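The ID structure described above (prefix, year, sequence number) can be checked mechanically. The following is a minimal sketch, assuming the standard `CVE-<year>-<sequence>` syntax with a sequence of at least four digits; the function name `parse_cve_id` is illustrative and not part of the dataset.

```python
import re

# CVE IDs have the form CVE-<year>-<sequence>; the sequence has at least four digits.
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def parse_cve_id(cve_id: str) -> tuple[int, int]:
    """Split a CVE ID into (year, sequence number); raise ValueError if malformed."""
    match = CVE_ID_PATTERN.match(cve_id.strip().upper())
    if match is None:
        raise ValueError(f"not a valid CVE ID: {cve_id!r}")
    return int(match.group(1)), int(match.group(2))


year, sequence = parse_cve_id("CVE-2020-16891")
print(year, sequence)  # 2020 16891
```

A check like this may be useful for filtering malformed rows before joining CVE records with other sources.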

The distinction between CVE and CWE (Common Weakness Enumeration) is that the former addresses symptoms while the latter addresses the root of the problem. CVE is a list of currently known flaws in specific systems and products, whereas CWE classifies types of software weaknesses. CWE is well suited to identifying the most dangerous security flaws. Its categories can help you figure out what is causing your systems to fail and how to fix it. The conditions that make you most vulnerable are usually the most common ones.

CWE offers several hierarchical representations of its list. The Software Development representation groups flaws around concepts used or encountered frequently in software development, whereas the Hardware Design representation groups flaws around concepts used or encountered frequently in hardware design. Using several layers of abstraction, the Research Concepts representation supports research into different sorts of weaknesses and organizes items by behavior. Each hierarchical representation lets you navigate the complete list from a particular point of view.

Content

The dataset was collected from the **MITRE** and **NVD** websites.

  • Global_Dataset.xlsx : This dataset comprises all of the vulnerabilities reported in the NVD database between 2002 and 2021. The CVE ID, vulnerability description, CVSS scores, severity of the vulnerability, and corresponding CWE IDs are all included in the dataset.

  • synonym_mapping.json : It contains synonyms, taken from the CWE Glossary, for some of the cyber security terms used in the vulnerability descriptions reported by NVD. The words in the JSON file are stored in stemmed form. (It can be used as an additional preprocessing step to reduce the vocabulary size.)
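As a rough illustration of the synonym preprocessing step, the sketch below replaces stemmed tokens with a canonical synonym. It assumes the JSON is a flat object mapping a stemmed word to a canonical stemmed term; the real `synonym_mapping.json` may be structured differently, and the sample entries here are made up.

```python
import json

# Hypothetical excerpt: the entries and the flat {word: canonical} layout are assumptions.
SAMPLE_MAPPING_JSON = '{"attack": "adversari", "hacker": "adversari", "flaw": "weak"}'


def normalize_tokens(stemmed_tokens, mapping):
    """Replace each stemmed token with its canonical synonym when one is listed."""
    return [mapping.get(token, token) for token in stemmed_tokens]


mapping = json.loads(SAMPLE_MAPPING_JSON)
tokens = ["hacker", "exploit", "flaw"]  # assumed to be stemmed already
print(normalize_tokens(tokens, mapping))  # ['adversari', 'exploit', 'weak']
```

Tokens without an entry pass through unchanged, so the step only shrinks the vocabulary and never drops words.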

Each CWE List View has corresponding CWE data, such as ID, Name, Description, Extended Description, and so on. It also has a hierarchical structure, with cwe_paths containing all of the different paths from the root to the nodes of the hierarchy. The Vulnerability Dataset contains all vulnerabilities that correspond to a CWE in the view.
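To make the idea of root-to-node paths concrete, here is a minimal sketch that enumerates them for a small hierarchy. The node names are placeholders rather than real CWE IDs, the `children` adjacency format is an assumption, and this is not necessarily how the dataset's `cwe_paths` field was produced.

```python
def all_paths(children, root):
    """Return every path from root to every reachable node, as lists of node IDs."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        paths.append(path)
        # Extend the current path by each child; terminates because the graph is acyclic.
        for child in children.get(path[-1], []):
            stack.append(path + [child])
    return paths


# Toy hierarchy: node "C" has two parents, so this is a DAG rather than a tree.
children = {"root": ["A", "B"], "A": ["C"], "B": ["C"]}
cwe_paths = sorted(all_paths(children, "root"))
for path in cwe_paths:
    print(" > ".join(path))
```

Because a node with several parents is reachable by several routes, it appears at the end of more than one path, which is why a single node can have multiple paths.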

What can be done with this?

CVE-to-CWE classification is an active research area in which various research papers have been published. The CVE-to-CWE mapping is a multi-label node classification and non-mandatory leaf node prediction problem, where the CWEs in each view are arranged in a hierarchical directed acyclic graph. The Global_Dataset can also be used for applications such as data analysis, data visualisation, EDA, NLP projects, and clustering.
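Non-mandatory leaf node prediction means a classifier may stop at an internal node of the hierarchy instead of always descending to a leaf. The sketch below shows one simple way to do that, a greedy top-down descent with a confidence threshold. The node names, scores, and threshold are made up, and this is not the method of any of the papers listed below.

```python
def predict_top_down(children, scores, root, threshold=0.5):
    """Greedy descent: follow the best-scoring child while its score meets the threshold."""
    path = [root]
    node = root
    while children.get(node):
        best = max(children[node], key=lambda child: scores.get(child, 0.0))
        if scores.get(best, 0.0) < threshold:
            break  # stop at an internal node: reaching a leaf is not mandatory
        path.append(best)
        node = best
    return path


children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": []}
scores = {"A": 0.9, "B": 0.2, "A1": 0.4, "A2": 0.3}  # made-up classifier confidences
print(predict_top_down(children, scores, "root"))  # ['root', 'A']
```

With a lower threshold the same scores would carry the prediction one level deeper, so the threshold controls how specific the predicted weakness is allowed to be.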

Some Research Works on CVE Classification

  1. ThreatZoom: Hierarchical Neural Network for CVEs to CWEs Classification
  2. V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities
  3. A Study on the Classification of Common Vulnerabilities and Exposures using Naïve Bayes
