智能科学与技术学报 (Sep 2024)
Research on method and architecture for defense assessment of artificial intelligence backdoors
Abstract
In response to the potential risk of backdoor attacks faced by artificial intelligence systems, a range of backdoor defense strategies have been developed. However, the diversity of evaluation criteria across existing defense methods makes cross-method comparison a significant challenge. Hence, a unified evaluation framework for artificial intelligence backdoor defenses was proposed. This framework aimed to provide a common standard for evaluating defense strategies at different levels, including dataset-level and model-level defenses. For dataset-level defense strategies, the effectiveness of backdoor detection was primarily assessed through accuracy. For model-level defense strategies, the focus was mainly placed on metrics such as attack success rate. By applying the unified evaluation framework, the performance of various backdoor defense methods was compared and analyzed under the same standards. This not only helps identify the strengths and weaknesses of each method, but also yields targeted suggestions for improvement. The results indicate that the unified evaluation framework can effectively measure the performance of different defense strategies, providing an important reference for further enhancing the security of artificial intelligence systems.
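The two headline metrics named in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation; all function and variable names are assumptions introduced here for clarity.

```python
def detection_accuracy(predicted_poisoned, truly_poisoned):
    """Dataset-level metric: fraction of samples whose poisoned/clean
    status is correctly identified by the backdoor detector.
    (Illustrative definition; the paper may use a refined variant.)"""
    assert len(predicted_poisoned) == len(truly_poisoned)
    correct = sum(p == t for p, t in zip(predicted_poisoned, truly_poisoned))
    return correct / len(truly_poisoned)


def attack_success_rate(predicted_labels, target_label):
    """Model-level metric: fraction of trigger-stamped inputs that the
    (possibly defended) model classifies as the attacker's target label.
    A lower value after defense indicates a more effective defense."""
    hits = sum(y == target_label for y in predicted_labels)
    return hits / len(predicted_labels)


# Hypothetical usage: 1 = poisoned sample, 0 = clean sample
preds = [1, 0, 1, 0, 0, 1]
truth = [1, 0, 0, 0, 0, 1]
acc = detection_accuracy(preds, truth)  # 5 of 6 predictions correct

# Hypothetical labels predicted on trigger-stamped inputs, target label 7
outputs = [7, 7, 3, 7]
asr = attack_success_rate(outputs, target_label=7)  # 3 of 4 hit the target
```

Under this common footing, a dataset-level defense is judged by how well it flags poisoned samples, while a model-level defense is judged by how far it drives the attack success rate down without degrading clean accuracy.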