Forgetful Forests: Data Structures for Machine Learning on Streaming Data under Concept Drift

Zhehu Yuan; Yinqi Sun; Dennis Shasha

doi:10.3390/a16060278

Algorithms (May 2023)

Affiliations

Zhehu Yuan: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
Yinqi Sun: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
Dennis Shasha: Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA

DOI: https://doi.org/10.3390/a16060278
Journal volume & issue: Vol. 16, no. 6, p. 278

Abstract

Database and data structure research can improve machine learning performance in many ways. One way is to design better algorithms on data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable “forgetful” tree-based learning algorithms to cope with streaming data that suffers from concept drift. (Concept drift occurs when the functional mapping from input to classification changes over time.) The forgetful algorithms described in this paper achieve high performance while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.
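To make the idea of concept drift and "forgetting" concrete, the following is a minimal toy sketch in Python. It is not the paper's forgetful tree or forest algorithm, and it involves no incremental computation or probabilistic filtering. It only illustrates why discarding old examples can help once the input-to-label mapping changes. The synthetic stream, the `BucketMajority` learner, the window size of 20, and the helper names are all assumptions made for this illustration.

```python
from collections import Counter, deque


def make_stream(n=200, drift_at=100):
    """Synthetic stream (hypothetical): the label rule flips at `drift_at`."""
    stream = []
    for i in range(n):
        x = (i * 37 % 100) / 100.0  # deterministic feature in [0, 1)
        # Before the drift: label 1 iff x > 0.5. After the drift: the opposite.
        y = int(x > 0.5) if i < drift_at else int(x <= 0.5)
        stream.append((x, y))
    return stream


class BucketMajority:
    """Toy learner: predicts the majority label seen on the same side of 0.5.

    With `window=None` it remembers every example; with an integer window
    it keeps only the most recent `window` examples (it "forgets" the rest).
    """

    def __init__(self, window=None):
        self.memory = deque(maxlen=window)

    def predict(self, x):
        bucket = x > 0.5
        votes = Counter(y for (xi, y) in self.memory if (xi > 0.5) == bucket)
        return votes.most_common(1)[0][0] if votes else 0

    def learn(self, x, y):
        self.memory.append((x, y))


def prequential_accuracy(model, stream, start=0):
    """Test-then-train accuracy, scored only on examples from index `start` on."""
    correct = total = 0
    for i, (x, y) in enumerate(stream):
        if i >= start:
            correct += int(model.predict(x) == y)
            total += 1
        model.learn(x, y)
    return correct / total


if __name__ == "__main__":
    stream = make_stream()
    full = prequential_accuracy(BucketMajority(window=None), stream, start=100)
    forgetful = prequential_accuracy(BucketMajority(window=20), stream, start=100)
    print(f"post-drift accuracy, remember everything: {full:.2f}")
    print(f"post-drift accuracy, window of 20:        {forgetful:.2f}")
```

In this toy setting, the learner that remembers everything keeps voting with the stale pre-drift examples, while the windowed learner recovers once its memory contains only post-drift data. The paper's algorithms address the same tension with considerably more refined mechanisms.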

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
