Faculty Mentor Information

Dr. Jyh-Haw Yeh, Boise State University
Md Mashrur Arifin, Boise State University

Presentation Date

7-2025

Abstract

The application of machine learning (ML) to malware detection has significantly advanced threat identification against zero-day malware. With the advent of ML malware detectors came adversarial malware designed specifically to evade them. As a result, new research has focused on quickly and efficiently creating adversarial malware that evades existing models, for use in adversarial training: an approach for building robust detection models. Previous work has largely focused on programmatically modifying portable executable (PE) headers and sections to generate adversarial samples. While these modifications have proven effective for evasion, they often leave samples non-functional, and there is a growing need to generate adversarial malware samples that use assembly-code obfuscations to mirror real-world samples. In this work, we propose a novel framework that employs large language models (LLMs) to obfuscate the assembly code of malware samples from the SOREL-20M dataset. We aim to verify the functionality of such obfuscations using symbolic execution, and then to test the effectiveness of the obfuscated samples against an existing ML detection model. Within the feature space, our framework offers an improved method for generating advanced, functional adversarial malware.
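To make the verification step concrete: an obfuscating rewrite is acceptable only if it computes the same result as the original code for every input. A standard example of such a rewrite is the mixed boolean-arithmetic identity `x + y == (x ^ y) + 2*(x & y)`; the sketch below checks it by brute force over 8-bit operands. This identity and the brute-force check are illustrative assumptions, not the paper's method: the framework itself uses symbolic execution, which proves equivalence over all inputs rather than enumerating them.

```python
MASK = 0xFF  # model 8-bit registers so exhaustive checking stays cheap

def add_orig(x, y):
    """Original computation: plain addition."""
    return (x + y) & MASK

def add_obf(x, y):
    """Obfuscated variant via a mixed boolean-arithmetic rewrite."""
    return ((x ^ y) + 2 * (x & y)) & MASK

def equivalent(f, g):
    """Brute-force stand-in for symbolic equivalence checking:
    compare f and g on every pair of 8-bit inputs."""
    return all(f(x, y) == g(x, y) for x in range(256) for y in range(256))
```

Here `equivalent(add_orig, add_obf)` returns `True`, so the rewrite preserves functionality, whereas a broken rewrite such as replacing `+` with `|` would be rejected. A symbolic executor reaches the same verdict without enumeration, which is what makes it practical for real assembly.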


Generating Feature-Space Functional Adversarial Malware Using LLMs
