Is bigger always better? A controversial journey to the center of machine learning design, with uses and misuses of big data for predicting water meter failures

Marco Roccetti; Giovanni Delnevo; Luca Casini; Giuseppe Cappiello

doi:10.1186/s40537-019-0235-y

Journal of Big Data (Aug 2019)

Affiliations

Marco Roccetti: Department of Computer Science and Engineering, University of Bologna
Giovanni Delnevo: Department of Computer Science and Engineering, University of Bologna
Luca Casini: Department of Computer Science and Engineering, University of Bologna
Giuseppe Cappiello: Department of Management, University of Bologna

DOI: https://doi.org/10.1186/s40537-019-0235-y
Journal volume & issue: Vol. 6, no. 1
pp. 1 – 23

Abstract

In this paper, we describe the design of a machine learning-based classifier tailored to predict whether a water meter will fail or need a replacement. Our initial attempt to train a recurrent deep neural network (RNN) on 15 million readings gathered from 1 million mechanical water meters spread throughout Northern Italy led to non-positive results. We learned that this was due to a lack of specific attention to the quality of the analyzed data. We therefore developed a novel methodology, based on a new semantics that we enforced on the training data. This allowed us to extract only those samples that are representative of the complex phenomenon of defective water meters. With this methodology, the accuracy of our RNN exceeded the 80% threshold. At the same time, we realized that the new training dataset differed significantly, in statistical terms, from the initial dataset, leading to an apparent paradox. With our contribution, we demonstrate how to reconcile this paradox, showing that our classifier can help detect defective meters while simplifying replacement procedures.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
