Publications (Nov 2023)

Migrating 120,000 Legacy Publications from Several Systems into a Current Research Information System Using Advanced Data Wrangling Techniques

  • Yrjö Lappalainen,
  • Matti Lassila,
  • Tanja Heikkilä,
  • Jani Nieminen,
  • Tapani Lehtilä

DOI: https://doi.org/10.3390/publications11040049
Journal volume & issue: Vol. 11, no. 4, p. 49

Abstract

This article describes a complex CRIS (current research information system) implementation project involving the migration of around 120,000 legacy publication records from three different systems. The project, undertaken by Tampere University, encountered several challenges in data diversity, data quality, and resource allocation. To handle the extensive and heterogeneous dataset, innovative approaches such as machine learning techniques and various data wrangling tools were used to process data, correct errors, and merge information from different sources. Despite significant delays and unforeseen obstacles, the project was ultimately successful in achieving its goals. The project served as a valuable learning experience, highlighting the importance of data quality and standardized practices, and the need for dedicated resources in handling complex data migration projects in research organizations. This study stands out for its comprehensive documentation of the data wrangling and migration process, which has been less explored in the CRIS literature.

Keywords