IEEE Access (Jan 2024)
Needleman-Wunsch Attention: A Framework for Enhancing DNA Sequence Embedding
Abstract
Calculating the edit distance between two sequences is a vital component of many biological studies that rely on DNA sequence data. However, computing the edit distance requires dynamic programming, whose cost grows with the product of the two sequence lengths. To address this challenge, numerous works have focused on embedding sequences into a vector space while preserving the distance metric, so that the distance between two embedding vectors approximates the edit distance between the corresponding sequences. In this study, we propose a novel Needleman-Wunsch Attention (NWA) framework for sequence embedding that leverages the relationship between the Needleman-Wunsch (NW) matrix and attention maps. Our approach applies to any deep learning-based sequence embedding network and provides a general way to improve the accuracy and efficiency of edit distance approximation methods. We validate the effectiveness of the proposed method by applying it to various existing embedding networks, demonstrating improved edit distance-preserving embedding on a real dataset. The code is publicly available at https://github.com/thisislim/nw-attention/.
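For reference, the dynamic program that the embedding is meant to approximate can be sketched as follows. This is a minimal illustration of the standard unit-cost (Levenshtein) edit distance, not code from the linked repository; the function name `edit_distance` and the unit costs for insertion, deletion, and substitution are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the standard dynamic program.

    Illustrative sketch only: runs in O(len(a) * len(b)) time and keeps
    just two rows of the DP table in memory.
    """
    # prev[j] is the distance between the current prefix of a (initially empty) and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] is the distance between a[:i] and the empty string.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,                      # delete ca from a
                    curr[j - 1] + 1,                  # insert cb into a
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Deleting the single "C" turns "ACGT" into "AGT".
    print(edit_distance("ACGT", "AGT"))
```

The quadratic cost of this table is what makes pairwise comparison expensive on large sequence collections, and it is the cost that a distance-preserving embedding replaces with a single vector distance per pair.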
Keywords