An infrastructure for precision medicine through analysis of big data

Marco Moscatelli; Andrea Manconi; Mauro Pessina; Giovanni Fellegara; Stefano Rampoldi; Luciano Milanesi; Andrea Casasco; Matteo Gnocchi

doi:10.1186/s12859-018-2300-5

BMC Bioinformatics (Oct 2018)

Affiliations

Marco Moscatelli: Institute for Biomedical Technologies – National Research Council (CNR-ITB)
Andrea Manconi: Institute for Biomedical Technologies – National Research Council (CNR-ITB)
Mauro Pessina: Centro Diagnostico Italiano
Giovanni Fellegara: Centro Diagnostico Italiano
Stefano Rampoldi: Centro Diagnostico Italiano
Luciano Milanesi: Institute for Biomedical Technologies – National Research Council (CNR-ITB)
Andrea Casasco: Centro Diagnostico Italiano
Matteo Gnocchi: Institute for Biomedical Technologies – National Research Council (CNR-ITB)

Journal volume & issue: Vol. 19, no. S10
pp. 51 – 61

Abstract

Background: The increasing availability of omics data, driven by advances both in the acquisition of molecular biology results and in systems biology simulation technologies, provides the basis for precision medicine. Success in precision medicine depends on access to healthcare and biomedical data. To this end, the digitization of all clinical exams and medical records is becoming standard in hospitals. Digitization is essential to collect, share, and aggregate large volumes of heterogeneous data, supporting the discovery of hidden patterns with the aim of defining predictive models for biomedical purposes. Sharing patients' data is a critical process: it raises ethical, social, legal, and technological issues that must be properly addressed.

Results: In this work, we present an infrastructure devised to integrate large volumes of heterogeneous biological data. The infrastructure was applied to the data collected between 2010 and 2016 in one of the major diagnostic analysis laboratories in Italy. Data were collected from three different platforms (laboratory exams, pathological anatomy exams, and biopsy exams). The infrastructure has been designed to allow the extraction and aggregation of both unstructured and semi-structured data. Data are treated to ensure security and privacy. Specialized algorithms have also been implemented to process the aggregated information, with the aim of obtaining a precise historical analysis of the clinical activities of one or more patients. Moreover, three Bayesian classifiers have been developed to analyze examinations reported as free text. Experimental results show that the classifiers exhibit good accuracy when used to analyze sentences related to the sample location, the presence of diseases, and the status of the illnesses.

Conclusions: The infrastructure allows the integration of multiple heterogeneous sources of anonymized data from the different clinical platforms. Both unstructured and semi-structured data are processed to obtain a precise historical analysis of the clinical activities of one or more patients. Data aggregation makes it possible to perform the statistical assessments required to answer complex questions, with applications in a variety of fields such as predictive and precision medicine. In particular, studying the clinical history of patients who have developed similar pathologies can help to predict or identify markers that allow an early diagnosis of possible illnesses.
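
The abstract mentions Bayesian classifiers for examinations reported as free text but does not say which Bayesian variant was used. As a rough illustration only, the sketch below assumes a multinomial naive Bayes model with Laplace smoothing and whitespace tokenization, applied to the disease-presence task. The class name `NaiveBayesText`, the labels `present` and `absent`, and the four training sentences are hypothetical and are not taken from the paper or its data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes for short sentences (illustrative assumption,
    not the classifier described in the paper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts = Counter()            # sentences seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Naive lowercase whitespace split; real reports need more care.
        return text.lower().split()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for token in self.tokenize(sentence):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for token in self.tokenize(sentence):
                if token in self.vocab:  # ignore tokens never seen in training
                    count = self.word_counts[label][token]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy sentences, invented for this sketch.
train_sentences = [
    "no evidence of malignancy",
    "negative for carcinoma",
    "invasive carcinoma present",
    "findings consistent with carcinoma",
]
train_labels = ["absent", "absent", "present", "present"]

clf = NaiveBayesText().fit(train_sentences, train_labels)
print(clf.predict("no evidence of carcinoma"))
print(clf.predict("invasive carcinoma"))
```

Under the same assumptions, the other two tasks named in the abstract (sample location and illness status) would be separate instances of the class trained on their own labeled sentences.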

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
