Patterns (Jan 2021)

SIMON: Open-Source Knowledge Discovery Platform

  • Adriana Tomic
  • Ivan Tomic
  • Levi Waldron
  • Ludwig Geistlinger
  • Max Kuhn
  • Rachel L. Spreng
  • Lindsay C. Dahora
  • Kelly E. Seaton
  • Georgia Tomaras
  • Jennifer Hill
  • Niharika A. Duggal
  • Ross D. Pollock
  • Norman R. Lazarus
  • Stephen D.R. Harridge
  • Janet M. Lord
  • Purvesh Khatri
  • Andrew J. Pollard
  • Mark M. Davis

Journal volume & issue
Vol. 2, no. 1
p. 100178

Abstract

Summary: Data analysis and knowledge discovery have become increasingly important in biology and medicine as biological datasets grow more complex, but the sophisticated programming skills and in-depth understanding of algorithms required are a barrier for most biologists and clinicians who want to perform such research. We have developed modular open-source software, SIMON, to facilitate the application of 180+ state-of-the-art machine-learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach to machine learning and other statistical analysis methods, SIMON helps to identify optimal algorithms and provides a resource that empowers non-technical and technical researchers to identify crucial patterns in biomedical data.

The Bigger Picture: Over the past years, technological advances have enabled the generation of large amounts of data at multiple scales. The integration of high-dimensional data is particularly important in the biomedical sciences, as such data can be used to identify biological mechanisms and predict clinical outcomes well in advance of their occurrence. Because powerful analytical tools that the average biomedical researcher can use are lacking, translation of such knowledge has been extremely slow. We have developed open-source software, SIMON, to facilitate the application of machine learning to high-dimensional biomedical data. In SIMON, analysis is performed through an intuitive graphical user interface and a standardized, automated machine-learning approach, allowing non-technical researchers to identify patterns, extract knowledge from high-dimensional data, and build high-quality predictive models.

Keywords