NER in Archival Finding Aids: Extended

Luís Filipe da Costa Cunha; José Carlos Ramalho

doi:10.3390/make4010003

Machine Learning and Knowledge Extraction (Jan 2022)

Affiliations

Luís Filipe da Costa Cunha: Department of Informatics, University of Minho, 4710-057 Braga, Portugal
José Carlos Ramalho: Department of Informatics, University of Minho, 4710-057 Braga, Portugal

Journal volume & issue: Vol. 4, no. 1, pp. 42–65

Abstract

The amount of information preserved in Portuguese archives has increased over the years. These documents represent a national heritage of high importance, as they portray the country’s history. Most Portuguese archives have now made their finding aids available to the public in digital format; however, these data carry no annotation, so their content is not always easy to analyze. In this work, we developed Named Entity Recognition (NER) solutions that identify and classify several types of named entities in archival finding aids. These named entities convey crucial information about their context and, when recognized with high confidence, can be used for several purposes, for example, the creation of smart browsing tools based on entity linking and record linking techniques. To achieve high scores, we annotated several corpora and trained our own Machine Learning (ML) models for this domain. We also used different architectures, such as CNNs, LSTMs, and Maximum Entropy models. Finally, all the created datasets and ML models were made available to the public through a web platform developed for this purpose, NER@DI.

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
