Big Data and Cognitive Computing (Jan 2022)

An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade

  • Roberta Rodrigues de Lima,
  • Anita M. R. Fernandes,
  • James Roberto Bombasar,
  • Bruno Alves da Silva,
  • Paul Crocker,
  • Valderi Reis Quietinho Leithardt

DOI
https://doi.org/10.3390/bdcc6010008
Journal volume & issue
Vol. 6, no. 1
p. 8

Abstract

Classification problems are common activities in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for the tax classification of goods with NCM codes, the official classification system for import and export products in Brazil. In particular, this article presents results from using a BERT model pretrained specifically on Portuguese, as well as results from using a multilingual pretrained BERT model. Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving an MCC of 0.8491, and confirm that the classifiers could be used to improve specialists’ performance in the classification of goods.
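The comparison described in the abstract can be reproduced by fine-tuning two different pretrained checkpoints on the same labeled product descriptions and scoring each with the Matthews correlation coefficient (MCC). The sketch below is a minimal, assumed setup using the Hugging Face Transformers library; the checkpoint names, the toy product descriptions, and the hyperparameters are illustrative placeholders, not the authors' exact configuration.

```python
# Minimal sketch: fine-tune a pretrained BERT for NCM-style text classification
# and evaluate with MCC. Swap MODEL_NAME to compare Portuguese vs. multilingual.
import torch
from torch.utils.data import Dataset
from sklearn.metrics import matthews_corrcoef
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed checkpoints: a Portuguese BERT and multilingual BERT (illustrative).
MODEL_NAME = "neuralmind/bert-base-portuguese-cased"  # or "bert-base-multilingual-cased"

# Toy product descriptions and NCM class indices (placeholders, not real data).
texts = ["parafuso de aço inoxidável", "camiseta de algodão"]
labels = [0, 1]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

class NcmDataset(Dataset):
    """Tokenized product descriptions paired with class labels."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def compute_metrics(eval_pred):
    # MCC is the metric reported in the abstract for comparing the two models.
    logits, gold = eval_pred
    preds = logits.argmax(axis=-1)
    return {"mcc": matthews_corrcoef(gold, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ncm-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=NcmDataset(texts, labels),
    eval_dataset=NcmDataset(texts, labels),
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())  # reports "eval_mcc" for the chosen checkpoint
```

Running the script once per checkpoint and comparing the reported MCC values mirrors the kind of head-to-head evaluation the abstract describes.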

Keywords