Applied Sciences (Sep 2021)
Revisiting Text Guide, a Truncation Method for Long Text Classification
Abstract
The quality of text classification has greatly improved with the introduction of deep learning and, more recently, of models using the attention mechanism. However, when text instances exceed the input length limit of most of the best-performing transformer models, the most common solution is to naively truncate the text to fit that limit. Researchers have proposed other approaches, but these do not appear to be popular because of their high computational cost and implementation complexity. Recently, another method called Text Guide was proposed; it truncates text in a way that outperforms the naive approach while being less complex and less costly than earlier solutions. Our study revisits Text Guide by testing how certain modifications influence the method’s performance. We found that some aspects of the method can be altered to further improve performance, and we confirmed several assumptions regarding the dependence of the method’s quality on certain factors.
Keywords