IEEE Access (Jan 2018)

TUMK-ELM: A Fast Unsupervised Heterogeneous Data Learning Approach

  • Lingyun Xiang,
  • Guohan Zhao,
  • Qian Li,
  • Wei Hao,
  • Feng Li

DOI
https://doi.org/10.1109/ACCESS.2018.2847037
Journal volume & issue
Vol. 6
pp. 35305 – 35315

Abstract

Developing advanced unsupervised learning techniques is an emerging challenge in the big data era, owing to the growing need to extract knowledge from large amounts of unlabeled heterogeneous data. Recently, much effort has been devoted to unsupervised learning methods that effectively capture information from heterogeneous data. However, most of these methods are highly time-consuming, which hinders their application in big data analytics scenarios, where an enormous amount of heterogeneous data is provided but real-time learning is strongly demanded. In this paper, we address this problem by proposing a fast unsupervised heterogeneous data learning algorithm, namely the two-stage unsupervised multiple kernel extreme learning machine (TUMK-ELM). TUMK-ELM alternately extracts information from multiple sources and learns the heterogeneous data representation with closed-form solutions, which makes it extremely fast. As justified by theoretical evidence, TUMK-ELM has low computational complexity at each stage, and the iteration of its two stages converges within a finite number of steps. As experimentally demonstrated on 13 real-life data sets, TUMK-ELM achieves a large efficiency improvement (up to 140 000 times) over three state-of-the-art unsupervised heterogeneous data learning methods while achieving comparable effectiveness.

Keywords