Computers (Jan 2024)

Cloud-Based Infrastructure and DevOps for Energy Fault Detection in Smart Buildings

  • Kaleb Horvath,
  • Mohamed Riduan Abid,
  • Thomas Merino,
  • Ryan Zimmerman,
  • Yesem Peker,
  • Shamim Khan

DOI
https://doi.org/10.3390/computers13010023
Journal volume & issue
Vol. 13, no. 1
p. 23

Abstract

Read online

We have designed a real-world smart building energy fault detection (SBFD) system on a cloud-based Databricks workspace, a high-performance computing (HPC) environment for big-data-intensive applications powered by Apache Spark. By avoiding a Smart Building Diagnostics as a Service approach and keeping a tightly centralized design, the rapid development and deployment of the cloud-based SBFD system was achieved within one calendar year. Thanks to Databricks’ built-in scheduling interface, a continuous pipeline of real-time ingestion, integration, cleaning, and analytics workflows capable of energy consumption prediction and anomaly detection was implemented and deployed in the cloud. The system currently provides fault detection in the form of predictions and anomaly detection for 96 buildings on an active military installation. The system’s various jobs all converge within 14 min on average. It facilitates the seamless interaction between our workspace and a cloud data lake storage provided for secure and automated initial ingestion of raw data provided by a third party via the Secure File Transfer Protocol (SFTP) and BLOB (Binary Large Objects) file system secure protocol drivers. With a powerful Python binding to the Apache Spark distributed computing framework, PySpark, these actions were coded into collaborative notebooks and chained into the aforementioned pipeline. The pipeline was successfully managed and configured throughout the lifetime of the project and is continuing to meet our needs in deployment. In this paper, we outline the general architecture and how it differs from previous smart building diagnostics initiatives, present details surrounding the underlying technology stack of our data pipeline, and enumerate some of the necessary configuration steps required to maintain and develop this big data analytics application in the cloud.

Keywords