IEEE Access (Jan 2024)
Dynamics of Digital Pen-Tablet: Handwriting Analysis for Person Identification Using Machine and Deep Learning Techniques
Abstract
Handwriting is governed by the brain and nervous system and reflects an individual’s personality and psychology. This unique characteristic can be used for various applications, including user authentication, assessment of neurodegenerative disorders, and classification of handedness, gender, and age group. Traditional authentication systems rely on memorized credentials or fingerprints and are prone to information leakage, making them vulnerable to security breaches. Most prior studies have examined how image quality, camera frames, and lighting effects limit text- and image-dependent performance. This paper therefore focuses on real-time, text-independent fine-motor handwriting data and proposes an efficient, low-cost authentication system built on efficient feature extraction and optimal feature selection. The research uses two benchmark databases comprising handwriting data from 48 (24+24) participants, collected via a sensor-based pen tablet. Each participant wrote 10 words five times each, giving 2400 samples in total (48 × 10 × 5). Classification of individuals from handwriting proceeds in three phases: feature extraction, feature selection, and classification. A total of 91 features (statistical, kinematic, spatial, and composite) were extracted from the accurate, real-time numerical handwriting data. Optimal features were selected using four feature selection approaches, namely Pearson’s r correlation, ANOVA-F, Mutual Information Gain, and PCA, of which the ANOVA-F test and PCA performed well on the extracted handwriting features. Then, 14 machine learning (ML) models and 7 deep learning (DL) models were applied to the individual classification problem, both with and without feature selection.
The experimental analysis was conducted from several perspectives: K-fold cross-validation, system efficiency with 5, 10, 15, 24, and 48 individuals, and evaluation on individual tasks. Among the ML models, CatBoost with ANOVA-F-selected features (99.07%), and among the DL models, BiLSTM with PCA-selected features (98.31%), achieved the highest accuracy on dataset 2, supporting the practicality and reliability of this system for user identification.
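
The ANOVA-F feature selection step named above can be illustrated with a minimal sketch. It is not the authors' implementation: it computes the standard one-way ANOVA F-statistic per feature, treating each writer as a group, and keeps the highest-scoring features. The function names `anova_f_scores` and `select_top_k` and the toy data are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def anova_f_scores(samples, labels):
    """Return the one-way ANOVA F-statistic of each feature column.

    samples: list of equal-length feature vectors (one per handwriting sample)
    labels:  list of writer identifiers, one per sample
    """
    n_samples = len(samples)
    n_features = len(samples[0])

    # Group sample indices by writer.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    n_groups = len(groups)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        grand_mean = sum(column) / n_samples

        ss_between = 0.0  # spread of writer means around the grand mean
        ss_within = 0.0   # spread of samples around their own writer mean
        for indices in groups.values():
            values = [column[i] for i in indices]
            group_mean = sum(values) / len(values)
            ss_between += len(values) * (group_mean - grand_mean) ** 2
            ss_within += sum((v - group_mean) ** 2 for v in values)

        ms_between = ss_between / (n_groups - 1)
        ms_within = ss_within / (n_samples - n_groups)
        if ms_within == 0:
            # No within-writer variance: a constant feature scores 0,
            # a feature that differs only between writers scores infinity.
            scores.append(float("inf") if ms_between > 0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (invented): 3 writers, 2 samples each, 3 features.
# Feature 0 separates the writers, feature 1 is noisy, feature 2 is constant.
toy_samples = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.0],
    [4.0, 4.0, 2.0],
    [4.2, 5.0, 2.0],
    [7.0, 3.0, 2.0],
    [7.2, 4.0, 2.0],
]
toy_labels = ["w1", "w1", "w2", "w2", "w3", "w3"]

toy_scores = anova_f_scores(toy_samples, toy_labels)
print(select_top_k(toy_scores, 2))
```

A feature whose values differ far more between writers than within a writer's own repetitions receives a large F-statistic, which is why this test suits writer identification. In practice a library routine would typically replace the hand-written loop.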
Keywords