V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities

Document Type

Conference Proceeding

Publication Date



We consider the problem of automating the mapping of observed vulnerabilities in software listed in Common Vulnerabilities and Exposures (CVE) reports to weaknesses listed in Common Weakness Enumerations (CWE) reports, a hierarchically designed dictionary of software weaknesses. Mapping of CVEs to CWEs provides a means to understand how they might be exploited for malicious purposes, and to mitigate their impact. Since manual mapping of CVEs to CWEs is not a viable approach due to their ever-increasing sizes, automated approaches need to be devised but obtaining highly accurate mapping is a challenging problem. We present a novel Transformer-based learning framework (V2W-BERT) in this paper to solve this problem by bringing together ideas from natural language processing, link prediction and transfer learning. Our method outperforms previous approaches not only for CWE instances with abundant data to train, but also for rare CWE classes with little or no data. Using vulnerability and weakness reports from MITRE and the National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy in temporally partitioned data. We demonstrate significant improvements in using historical data to predict weaknesses for future instances of CVEs. We believe that our work will would influence the design of better automated mapping approaches, and also that this technology could be deployed for more effective cybersecurity.