Abstract:
Backdoor attacks and adversarial attacks are two major security threats to deep neural networks (DNNs): the former is a training-time data poisoning attack that implants backdoor triggers into a model by injecting trigger patterns into training samples, while the latter is a test-time attack that generates adversarial examples (AEs) from benign images to mislead a well-trained model. Previous works generally treat these two attacks separately, and the inherent connection between them is rarely explored. In this paper, we focus on bridging backdoor and adversarial attacks and observe two intriguing phenomena when applying adversarial attacks to an infected model implanted with backdoors: 1) a sample is harder to turn into an AE when the trigger is present; 2) AEs generated from backdoor samples are highly likely to be predicted as their true labels. Inspired by these observations, we propose a novel backdoor defense method, dubbed Adversarial-Inspired Backdoor Defense (AIBD), which isolates backdoor samples by leveraging a progressive top-q scheme and breaks the correlation between backdoor samples and their target labels using adversarial labels. Through extensive experiments on various datasets against six state-of-the-art backdoor attacks, models trained with AIBD on poisoned data demonstrate superior performance over existing defense methods. Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
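The sketch below illustrates, in PyTorch, how the abstract's first observation could be turned into a poisoned-sample isolation score: samples that resist an untargeted PGD attack (i.e., keep a low loss on their dataset label after the attack) are flagged as suspected backdoor samples. This is a minimal illustration under stated assumptions, not the authors' released code: the function names (pgd_attack, adversarial_scores, isolate_top_q), the PGD hyperparameters, and the fixed fraction q are hypothetical, and the paper's progressive top-q schedule and adversarial-label relabeling step are not reproduced here.

```python
# Minimal sketch (assumed names and hyperparameters, not the AIBD reference code)
# of scoring training samples by how hard they are to turn into adversarial
# examples, then isolating the top-q fraction with the lowest post-attack loss.

import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard untargeted PGD within an L-infinity ball of radius eps."""
    x_adv = (x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the original image.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


@torch.no_grad()
def _loss_per_sample(model, x, y):
    """Per-sample cross-entropy loss, no gradient tracking needed."""
    return F.cross_entropy(model(x), y, reduction="none")


def adversarial_scores(model, loader, device="cpu"):
    """Score each training sample by the loss of its PGD counterpart.

    Intuition from the paper's observations: trigger-carrying samples resist
    adversarial perturbation, so their post-attack loss on the (poisoned)
    dataset label stays low, while clean samples are easily flipped and end
    up with a high loss.
    """
    model.eval()
    scores = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        scores.append(_loss_per_sample(model, x_adv, y).cpu())
    return torch.cat(scores)


def isolate_top_q(scores, q=0.05):
    """Return indices of the q fraction with the lowest adversarial loss."""
    k = max(1, int(q * len(scores)))
    return torch.argsort(scores)[:k]
```

In the full method described in the abstract, the isolated samples would then have their correlation with the target label broken via adversarial labels rather than simply being discarded; that relabeling step is omitted from this sketch.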
Source: Proceedings of the AAAI Conference on Artificial Intelligence
ISSN: 2159-5399
Year: 2025
Issue: 9
Volume: 39
Page: 9508-9516
Language: English