Faculty Mentor Information
Dr. Min Xian, University of Idaho
Presentation Date
7-2025
Abstract
As deep learning reshapes the future of image-based applications, I find myself repeatedly drawn to a single, unsettling question: just how trustworthy are these celebrated AI models when faced with real-world adversaries? Motivated by a concern for the reliability of AI-based cancer detection systems, my project dives directly into the mechanisms and consequences of hard-label black-box adversarial attacks, methods that exploit the very blind spots most AI systems leave unguarded. I implement and benchmark three leading attacks (RayS, OPT, and Sign-OPT) against several state-of-the-art deep learning architectures, including ResNet50, DenseNet121, and VGG16, using image datasets. Just as important, I challenge these attacks with advanced defenses: adversarial training, robust self-training, and multi-instance robust self-training. By measuring attack success rates, query efficiency, and the subtlety of perturbations, I aim to uncover not only which models withstand attack but also why some defenses fail. My ongoing results support a critical assessment of model robustness, with the goal of spotlighting the urgent gaps in current medical AI security and, ultimately, guiding the design of safer, more trustworthy diagnostic tools. Through this work, I hope to push the conversation around medical AI from theoretical risk to practical resilience.
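For concreteness, below is a minimal, hypothetical sketch (in PyTorch) of the kind of evaluation loop the abstract describes: it measures attack success rate, mean query count, and mean L-infinity perturbation size for a hard-label black-box attack. The names attack, model, loader, and eps are illustrative placeholders, not code from the project; the attack callable is assumed to return adversarial examples plus the number of model queries it spent.

import torch

def benchmark_attack(attack, model, loader, eps=8 / 255, device="cpu"):
    """Report attack success rate, mean query count, and mean L-inf perturbation."""
    model.eval()
    successes, total, queries, deltas = 0, 0, [], []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Hard-label setting: the attack only ever sees argmax labels from the
        # model, and reports how many queries it spent finding x_adv.
        x_adv, n_queries = attack(model, x, y)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        # Per-example L-infinity perturbation size.
        linf = (x_adv - x).flatten(1).abs().amax(dim=1)
        # A sample counts as successfully attacked if it is misclassified
        # within the perturbation budget eps.
        successes += ((pred != y) & (linf <= eps)).sum().item()
        total += y.numel()
        queries.append(n_queries)
        deltas.extend(linf.tolist())
    return {
        "success_rate": successes / total,
        "mean_queries": sum(queries) / len(queries),
        "mean_linf": sum(deltas) / len(deltas),
    }

Comparing these three numbers across attacked models and defenses is what lets the benchmark separate models that merely look robust from those that resist attack at a realistic query budget.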
Lance and Shield: Benchmarking Adversarial Attacks and Defense of AI
Comments
This work was supported by a University of Idaho Office of Undergraduate Research (OUR) Semester/SURF Award 2025.