Faculty Mentor Information
Dr. Jyh-Haw Yeh, Boise State University
Md Mashrur Arifin, Boise State University
Presentation Date
7-2025
Abstract
The application of machine learning (ML) in malware detection has significantly advanced threat identification capabilities against zero-day malware. With the advent of ML malware detectors came adversarial malware specifically designed to evade them. As a result, new research has focused on quickly and efficiently creating adversarial malware that evades existing models so it can be used for adversarial training, an approach for building robust detection models. Previous work has largely focused on programmatically modifying portable executable (PE) headers and sections to generate adversarial samples. While effective for evasion, these modifications often leave samples non-functional, and there is a growing need for adversarial malware samples that use assembly code obfuscations to mirror real-world samples. In this work, we propose a novel framework that employs large language models (LLMs) to obfuscate the assembly code of malware samples from the SOREL-20M dataset. We aim to verify the functionality of these obfuscations using symbolic execution and then to test the effectiveness of the obfuscated samples against an existing ML detection model. Within the feature space, our framework offers an improved method for generating advanced, functional adversarial malware.
Generating Feature-Space Functional Adversarial Malware Using LLMs
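To illustrate the kind of semantics-preserving rewrite the framework asks an LLM to produce, the sketch below applies one classic assembly obfuscation (replacing `mov reg, 0` with `xor reg, reg`) and checks that the rewritten instruction has the same architectural effect. This is a minimal, hypothetical illustration: the rewrite rule and the toy `effect` model are stand-ins, since the actual framework prompts an LLM for rewrites and verifies equivalence with symbolic execution rather than a hand-written effect table.

```python
import re

def effect(instr: str, reg_state: dict) -> dict:
    """Toy model of an instruction's effect on register state.
    Stands in for the symbolic-execution check used in the framework."""
    state = dict(reg_state)
    m = re.match(r"mov (\w+), (\d+)$", instr)
    if m:
        state[m.group(1)] = int(m.group(2))
        return state
    m = re.match(r"xor (\w+), (\w+)$", instr)
    if m and m.group(1) == m.group(2):
        state[m.group(1)] = 0  # xor reg, reg always zeroes the register
        return state
    raise ValueError(f"unmodeled instruction: {instr}")

def obfuscate(instr: str) -> str:
    """One rewrite rule an LLM might emit: mov reg, 0 -> xor reg, reg."""
    m = re.match(r"mov (\w+), 0$", instr)
    if m:
        return f"xor {m.group(1)}, {m.group(1)}"
    return instr

original = "mov eax, 0"
rewritten = obfuscate(original)
state = {"eax": 7}
# The rewrite changes the instruction bytes (altering detector features)
# but preserves the semantics, which is what the verification step confirms.
assert rewritten != original
assert effect(original, state) == effect(rewritten, state)
print(rewritten)  # xor eax, eax
```

In the full pipeline, each LLM-proposed rewrite would be accepted only after the equivalence check passes, so evasive samples remain functional.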