Faculty Mentor Information

Dr. Min Xian, University of Idaho

Presentation Date

July 2025

Abstract

As deep learning reshapes the future of image-based applications, I find myself repeatedly drawn to a single, unsettling question: just how trustworthy are these celebrated AI models when faced with real-world adversaries? Motivated by a concern for the reliability of AI-based cancer detection systems, my project dives directly into the mechanisms and consequences of hard-label black-box adversarial attacks, methods that exploit the very blind spots most AI systems leave unguarded. I implement and benchmark three leading attacks (RayS, OPT, and SignOPT) across several state-of-the-art deep learning architectures, including ResNet50, DenseNet121, and VGG16, using image datasets. Just as important, I challenge these attacks with advanced defenses: adversarial training, robust self-training, and multi-instance robust self-training. By measuring attack success rates, query efficiency, and the subtlety of perturbations, I aim to uncover not only which models withstand attack but also why some defenses fail. My ongoing results offer a critical assessment of model robustness, spotlighting the urgent gaps in current medical AI security and, ultimately, guiding the design of safer, more trustworthy diagnostic tools. Through this work, I hope to push the conversation around medical AI from theoretical risk to practical resilience.

Comments

This work was supported by a University of Idaho Office of Undergraduate Research (OUR) Semester/SURF Award 2025.


Lance and Shield: Benchmarking Adversarial Attacks and Defense of AI
