Безопасность информационных технологий (Jun 2017)

One Approach to Solving Tokenization Problem for Analysis of Large-Scale Collections of User-Defined Passwords

  • Andrey N. Kuznetsov,
  • Dmitry A. Vyshemirsky

DOI
https://doi.org/10.26583/bit.2017.2.06
Journal volume & issue
Vol. 24, no. 2
pp. 50–60

Abstract

This paper analyzes the password tokenization algorithm introduced by R. Veras et al. We show the main limitations of that approach and propose a new tokenization algorithm, RGramToken, based on frequency dictionaries of English words, bigrams, and trigrams. Our approach makes better use of information about the probability distribution of words and word combinations in natural language. A comparative analysis of the two algorithms on specially prepared tests with warped phrases demonstrates the higher efficiency of RGramToken and its robustness on low-quality dictionaries.
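
To make the idea of frequency-based tokenization concrete, here is a minimal sketch of how a password string could be split into words using unigram and bigram counts. It is not the RGramToken algorithm, whose details the abstract does not give. The tables `UNIGRAMS` and `BIGRAMS`, the helper `word_logp`, the function `tokenize`, the unknown-word penalty, and all counts are hypothetical and chosen only for illustration. Trigrams are left out for brevity.

```python
import math
from functools import lru_cache

# Hypothetical toy frequency dictionaries (assumed counts, not real corpus data).
UNIGRAMS = {"i": 50, "love": 30, "you": 40, "lo": 1, "ve": 1, "yo": 1, "u": 2}
BIGRAMS = {("i", "love"): 20, ("love", "you"): 25}
TOTAL = sum(UNIGRAMS.values())


def word_logp(word, prev):
    """Log-score of `word` given the previous token, with naive backoff."""
    if prev is not None and (prev, word) in BIGRAMS:
        # Conditional estimate count(prev, word) / count(prev).
        return math.log(BIGRAMS[(prev, word)] / UNIGRAMS[prev])
    if word in UNIGRAMS:
        # Back off to the unigram estimate (no discounting; a simplification).
        return math.log(UNIGRAMS[word] / TOTAL)
    # Unknown strings get an assumed fixed penalty per character.
    return -10.0 * len(word)


def tokenize(text, max_len=10):
    """Return the highest-scoring split of `text` into tokens."""

    @lru_cache(maxsize=None)
    def best(start, prev):
        # Best (score, tokens) for text[start:], given the previous token.
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            score, rest = best(end, word)
            candidates.append((word_logp(word, prev) + score, (word,) + rest))
        return max(candidates)

    return list(best(0, None)[1])


print(tokenize("iloveyou"))  # ['i', 'love', 'you']
```

With these toy counts, the bigram evidence for "i love" and "love you" outweighs any split into shorter fragments, which shows why word-combination statistics can help when single-word dictionaries alone are ambiguous.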

Keywords