BMC Bioinformatics (Oct 2018)

Integrating omics datasets with the OmicsPLS package

  • Said el Bouhaddani,
  • Hae-Won Uh,
  • Geurt Jongbloed,
  • Caroline Hayward,
  • Lucija Klarić,
  • Szymon M. Kiełbasa,
  • Jeanine Houwing-Duistermaat

DOI
https://doi.org/10.1186/s12859-018-2371-3
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 9

Abstract

Background: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, no such software has been available for O2PLS.

Results: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard cross-validation method and a faster alternative are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data.

Conclusions: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").
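As a rough illustration of the workflow the abstract describes (choosing the number of components by cross-validation, fitting O2PLS, inspecting the result), a minimal sketch in R might look as follows. The simulated matrices, the sample size, the candidate component grid and the variable names are assumptions made for this example and are not taken from the paper.

```r
# install.packages("OmicsPLS")  # run once if the package is not yet installed
library(OmicsPLS)

set.seed(1)
n_samples <- 50

# Simulated data (an assumption for illustration): X and Y share one latent variable.
latent <- rnorm(n_samples)
X <- outer(latent, rnorm(20)) + matrix(rnorm(n_samples * 20), n_samples, 20)
Y <- outer(latent, rnorm(10)) + matrix(rnorm(n_samples * 10), n_samples, 10)

# Centre and scale the columns of both data sets before fitting.
X <- scale(X)
Y <- scale(Y)

# Standard cross-validation over a small, arbitrary grid of component numbers:
# a = joint components, ax / ay = X- and Y-specific components.
cv <- crossval_o2m(X, Y, a = 1:2, ax = 0:1, ay = 0:1, nr_folds = 5)
print(cv)

# Fit O2PLS with one joint component and one specific component per data set.
fit <- o2m(X, Y, n = 1, nx = 1, ny = 1)

# Inspect the fit and the joint loadings of X.
summary(fit)
head(loadings(fit, "Xjoint"))
```

In practice the component numbers passed to `o2m` would be read off the cross-validation output rather than fixed in advance as they are here.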

Keywords