Machine Learning with Applications (Dec 2022)
Improving text classification with transformers and layer normalization
Abstract
More than 25,000 injuries and 25 fatalities occur each year in tip-over incidents involving unstable furniture. Classifying these incidents is essential for understanding incident patterns and building safer products. For example, classification can help standards development organizations (SDOs) and policy makers uncover insights that inform standards and regulations for improving furniture and making homes safer. Since 2000, the U.S. Consumer Product Safety Commission (CPSC) has published data related to consumer product injuries. The amount of data has grown rapidly, and manually reviewing and classifying individual incidents has correspondingly become very resource intensive. This paper proposes an improved method that combines natural language processing (NLP) techniques and machine learning (ML) algorithms to classify textual data. Machine learning models can reduce time and effort by streamlining the classification of incident narratives, that is, determining whether each incident is related to a furniture tip-over. Real-world data sets, such as the CPSC data used in our experiment, often present two challenges: imbalanced target classes, and narratives that require domain knowledge because they contain abbreviations and jargon. Out-of-the-box classification models with default settings, such as bidirectional encoder representations from transformers (BERT), might not yield adequate results on such data. Our proposed method adds layer normalization and dropout layers to a transformer-based language model, and it achieves better classification results on imbalanced classes than a transformer-based language model alone. We carefully measure the impact of the hidden layers in order to fine-tune the model.
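To make the two added components concrete, the following is a minimal, dependency-free sketch of layer normalization and dropout applied to a pooled sentence vector before a two-class output layer. It is an illustration under stated assumptions, not the paper's implementation: the layer order (normalization, then dropout, then a linear layer), the dropout rate, the vector size, and all names (`layer_norm`, `dropout`, `classify`, `pooled`) are hypothetical, and the transformer encoder that would produce `pooled` is omitted.

```python
import math
import random


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    if gamma is None:
        gamma = [1.0] * n  # identity scale
    if beta is None:
        beta = [0.0] * n   # zero shift
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


def dropout(x, p=0.1, training=True, rng=random):
    """Inverted dropout: zero each element with probability p, rescale the rest by 1/(1-p)."""
    if not training or p == 0.0:
        return list(x)  # dropout is a no-op at inference time
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled, weights, bias, p=0.1, training=False):
    """Assumed head: LayerNorm -> Dropout -> Linear -> softmax over two classes."""
    h = layer_norm(pooled)
    h = dropout(h, p=p, training=training)
    return softmax(linear(h, weights, bias))


# Toy stand-in for an encoder output; real pooled vectors are much larger.
pooled = [0.5, -1.2, 3.0, 0.7]
weights = [[0.2, -0.1, 0.4, 0.0],   # hypothetical weights for "not tip-over"
           [-0.3, 0.5, 0.1, 0.2]]   # hypothetical weights for "tip-over"
bias = [0.0, 0.0]
probs = classify(pooled, weights, bias)
print(probs)
```

In a real fine-tuning setup these operations would normally come from a deep learning framework and be trained jointly with the encoder; the sketch only shows what each layer computes.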