Auxiliary Cross-Modal Representation Learning With Triplet Loss Functions for Online Handwriting Recognition

Felix Ott; David Rügamer; Lucas Heublein; Bernd Bischl; Christopher Mutschler

doi:10.1109/ACCESS.2023.3310819

IEEE Access (Jan 2023)

Affiliations

Felix Ott: Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits, Nuremberg, Germany
David Rügamer: LMU Munich, Munich, Germany
Lucas Heublein: Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits, Nuremberg, Germany
Bernd Bischl: LMU Munich, Munich, Germany
Christopher Mutschler: Fraunhofer IIS, Fraunhofer Institute for Integrated Circuits, Nuremberg, Germany

Journal volume & issue: Vol. 11
pp. 94148 – 94172

Abstract

Cross-modal representation learning learns a shared embedding between two or more modalities to improve performance in a given task compared to using only one of the modalities. Cross-modal representation learning from different data types, such as images and time-series data (e.g., audio or text data), requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the contrastive or triplet loss, which uses positive and negative identities to create sample pairs with different labels, for cross-modal representation learning between image and time-series modalities (CMR-IS). By adapting the triplet loss for cross-modal representation learning, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information from the auxiliary (image classification) task. We present a triplet loss with a dynamic margin for single-label and sequence-to-sequence classification tasks. We perform extensive evaluations on synthetic image and time-series data, on data for offline handwriting recognition (HWR), and on online HWR data from sensor-enhanced pens for classifying written words. Our experiments show improved classification accuracy, faster convergence, and better generalizability due to an improved cross-modal representation. Furthermore, the improved generalizability leads to better adaptability between writers for online HWR.
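The abstract describes the cross-modal triplet loss only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: the anchor is an embedding of the time-series (main) modality, and the positive and negative are image (auxiliary) embeddings with the same and a different label, respectively. The dynamic margin used here, a base margin scaled by the normalized Levenshtein distance between the two word labels, is a hypothetical choice for illustration only; the function names and the `base_margin` parameter are likewise assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embeddings of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def levenshtein(s: str, t: str) -> int:
    """Edit distance between two label strings (dynamic programming, two rows)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def dynamic_margin(anchor_label: str, negative_label: str, base_margin: float = 1.0) -> float:
    """HYPOTHETICAL dynamic margin: the base margin scaled by the normalized edit
    distance between labels, so more dissimilar words are pushed further apart."""
    longest = max(len(anchor_label), len(negative_label), 1)
    return base_margin * levenshtein(anchor_label, negative_label) / longest


def cross_modal_triplet_loss(
    ts_anchors: List[Vector],      # time-series (main modality) embeddings
    img_positives: List[Vector],   # image embeddings sharing the anchor's label
    img_negatives: List[Vector],   # image embeddings with a different label
    anchor_labels: List[str],
    negative_labels: List[str],
    base_margin: float = 1.0,
) -> float:
    """Mean triplet loss over a non-empty batch of cross-modal triplets."""
    losses = [
        triplet_loss(a, p, n, dynamic_margin(la, ln, base_margin))
        for a, p, n, la, ln in zip(ts_anchors, img_positives, img_negatives,
                                   anchor_labels, negative_labels)
    ]
    return sum(losses) / len(losses)
```

Because the edit distance never exceeds the length of the longer label, the hypothetical margin stays in the range [0, `base_margin`]. In a training loop, a term like this would typically be added to the main classification loss as an auxiliary objective.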

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
