Targeted Training Data Extraction—Neighborhood Comparison-Based Membership Inference Attacks in Large Language Models

Huan Xu; Zhanhao Zhang; Xiaodong Yu; Yingbo Wu; Zhiyong Zha; Bo Xu; Wenfeng Xu; Menglan Hu; Kai Peng

doi:10.3390/app14167118

Applied Sciences (Aug 2024)

Affiliations

Huan Xu: State Grid Hubei Information & Telecommunication Company, Wuhan 430048, China
Zhanhao Zhang: Hubei Key Laboratory of Smart Internet Technology, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Xiaodong Yu: Hubei Huazhong Electric Power Technology Development Co., Ltd., Wuhan 430074, China
Yingbo Wu: Hubei Huazhong Electric Power Technology Development Co., Ltd., Wuhan 430074, China
Zhiyong Zha: State Grid Hubei Information & Telecommunication Company, Wuhan 430048, China
Bo Xu: Hubei Key Laboratory of Smart Internet Technology, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Wenfeng Xu: Hubei Huazhong Electric Power Technology Development Co., Ltd., Wuhan 430074, China
Menglan Hu: Hubei Key Laboratory of Smart Internet Technology, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Kai Peng: Hubei Key Laboratory of Smart Internet Technology, School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China

Journal volume & issue: Vol. 14, no. 16, p. 7118

Abstract

A large language model is a deep learning model with a very large number of parameters, pretrained on a large-scale corpus and used to process natural language text and generate high-quality text output. The growing deployment of large language models has drawn significant attention to their privacy and security issues. Recent experiments have demonstrated that training data can be extracted from these models because they memorize parts of it. Early research on training data extraction from large language models focused primarily on non-targeted methods. Since Carlini et al. introduced targeted training data extraction, prefix-based methods that generate the corresponding suffix have attracted considerable interest, although current extraction precision remains low. This paper focuses on the targeted extraction of training data and employs several methods to improve the precision and speed of the extraction process. Building on the work of Yu et al., we comprehensively analyze how different suffix generation methods affect the precision of suffix generation, and we examine the quality and diversity of the text that each suffix generation strategy produces. We also apply membership inference attacks based on neighborhood comparison to the extraction of training data from large language models, assess their effectiveness, and compare the performance of different membership inference attacks. Finally, we tune multiple hyperparameters to further improve the extraction of training data. Experimental results indicate that the proposed method significantly improves extraction precision compared to previous approaches.
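
The abstract names neighborhood comparison without describing it. As generally described in the literature, such an attack compares the model's loss on a candidate text with its average loss on slightly perturbed "neighbor" texts; a candidate that scores much lower than its neighbors is more likely to have been seen in training. The sketch below illustrates only that scoring idea and is not the authors' implementation. A toy unigram model stands in for the large language model, and the `substitutions` table is a hypothetical stand-in for a real neighbor generator.

```python
import math
from collections import Counter


def make_toy_loss(corpus):
    """Build a per-token negative log-likelihood function from a toy
    unigram model with add-one smoothing (a stand-in for an LLM's loss)."""
    counts = Counter(w for text in corpus for w in text.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def loss(text):
        words = text.split()
        nll = -sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
        return nll / len(words)

    return loss


def neighbors(text, substitutions):
    """Create neighbors by replacing one word at a time, using a
    hypothetical substitution table (an assumption for illustration)."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        for alt in substitutions.get(w, []):
            out.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return out


def neighborhood_score(text, loss, substitutions):
    """Candidate loss minus mean neighbor loss; lower suggests membership."""
    nbrs = neighbors(text, substitutions)
    if not nbrs:
        raise ValueError("no neighbors could be generated for this text")
    return loss(text) - sum(loss(n) for n in nbrs) / len(nbrs)


# Toy "training set" and substitution table, both invented for this sketch.
train = ["the password is hunter2", "the server is online"]
loss = make_toy_loss(train)
subs = {
    "password": ["passcode"],
    "passcode": ["password"],
    "hunter2": ["secret1"],
    "secret1": ["hunter2"],
}

member_score = neighborhood_score("the password is hunter2", loss, subs)
nonmember_score = neighborhood_score("the passcode is secret1", loss, subs)
print(f"member: {member_score:.3f}, non-member: {nonmember_score:.3f}")
```

In this toy setting the training sentence receives a negative score and the unseen sentence a positive one, so thresholding the score separates them. In a targeted extraction pipeline, a score of this kind could be used to rank candidate suffixes generated for a given prefix; the threshold and the neighbor generator would both need tuning on real data.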

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
