Entropy (Jul 2025)
Cross-Domain Feature Enhancement-Based Password Guessing Method for Small Samples
Abstract
As a crucial component of account protection system evaluation and intrusion detection, password guessing technology faces a key obstacle: its advancement depends on access to password data. Traditional models require large training samples, whereas privacy protection regulations restrict access to password data. Consequently, security researchers often have only a very small password set from which to build a guessing model. This paper introduces a small-sample password guessing method based on cross-domain feature enhancement. It parses a password set with probabilistic context-free grammar (PCFG) to build a list of password structure probabilities and a dictionary of password fragment probabilities, from which a structure vector for the password set is derived. Using these structure vectors, the method computes the cosine similarity between the small-sample password set B from the target domain and each publicly leaked password set Ai, and identifies the set Amax with the highest similarity. Amax is then used as the training set, and the features of the small-sample password set are enhanced by modifying the structure vector of this training set. The enhanced training set is subsequently used for PCFG password generation. The paper uses hit rate as the evaluation metric. Experiment I shows that the similarity between B and Ai can be measured reliably once B contains more than 150 passwords. Experiment II confirms the hypothesis that the more similar Ai is to B, the higher the hit rate of Ai on the test set of B, with improvements of up to 32% compared with training on B alone. Experiment III shows that after the features of Amax are enhanced, the hit rate on the small-sample password set increases by as much as 10.52% compared with the results before enhancement.
This method offers a viable solution for small-sample password guessing without requiring prior knowledge.
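The selection step described in the abstract (PCFG structure extraction, structure probability vector, cosine similarity, choice of Amax) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical; the structure vector here contains only base-structure probabilities (the fragment probability dictionary is omitted); the `enhance` step uses a linear interpolation with a hypothetical weight `alpha`, since the abstract does not specify how the structure vector is modified; and `hit_rate` assumes hits are counted over unique test passwords.

```python
import math
import re
from collections import Counter

# Weir-style PCFG segmentation: maximal runs of letters (L), digits (D), other symbols (S).
TOKEN = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def pcfg_structure(password: str) -> str:
    """Map a password to its base structure, e.g. 'pass!2024' -> 'L4S1D4'."""
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in TOKEN.finditer(password))


def structure_probs(passwords):
    """Structure vector (assumption): relative frequency of each base structure."""
    counts = Counter(pcfg_structure(p) for p in passwords)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(b_passwords, leaked_sets):
    """Return the name of the leaked set Ai most similar to B, plus all scores."""
    vb = structure_probs(b_passwords)
    scores = {name: cosine(vb, structure_probs(pws)) for name, pws in leaked_sets.items()}
    return max(scores, key=scores.get), scores


def enhance(v_a, v_b, alpha=0.5):
    """Hypothetical enhancement: shift Amax's structure probabilities toward B's."""
    keys = set(v_a) | set(v_b)
    return {k: (1 - alpha) * v_a.get(k, 0.0) + alpha * v_b.get(k, 0.0) for k in keys}


def hit_rate(guesses, test_set):
    """Fraction of unique test passwords that appear among the guesses (assumption)."""
    test = set(test_set)
    return len(test & set(guesses)) / len(test) if test else 0.0


if __name__ == "__main__":
    B = ["abc123", "qwerty1", "pass!2024", "hello99"]
    leaked = {
        "A1": ["monkey12", "dragon7", "sun123", "love!2020"],
        "A2": ["123456", "000000", "111111", "987654"],
    }
    best, scores = most_similar(B, leaked)
    print(best, scores)
```

In this toy example, A1 shares three of B's four structures (L3D3, L6D1, L4S1D4), giving a cosine similarity of 0.75, while A2 consists only of D6 passwords and scores 0, so A1 would be chosen as Amax. With realistic data, B would need more than the 150 passwords that Experiment I identifies as the threshold for a reliable similarity estimate.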
Keywords