Cloud-Based Infrastructure and DevOps for Energy Fault Detection in Smart Buildings

Kaleb Horvath; Mohamed Riduan Abid; Thomas Merino; Ryan Zimmerman; Yesem Peker; Shamim Khan

doi:10.3390/computers13010023

Computers (Jan 2024)

Affiliations

Kaleb Horvath: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA
Mohamed Riduan Abid: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA
Thomas Merino: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA
Ryan Zimmerman: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA
Yesem Peker: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA
Shamim Khan: TSYS School of Computer Science, Turner College of Business, Columbus State University, Columbus, GA 31907, USA

DOI: https://doi.org/10.3390/computers13010023
Journal volume & issue: Vol. 13, no. 1, p. 23

Abstract

We have designed a real-world smart building energy fault detection (SBFD) system on a cloud-based Databricks workspace, a high-performance computing (HPC) environment for big-data-intensive applications powered by Apache Spark. By avoiding a Smart Building Diagnostics as a Service approach and keeping a tightly centralized design, we achieved rapid development and deployment of the cloud-based SBFD system within one calendar year. Using Databricks’ built-in scheduling interface, we implemented and deployed in the cloud a continuous pipeline of real-time ingestion, integration, cleaning, and analytics workflows capable of energy consumption prediction and anomaly detection. The system currently provides fault detection in the form of predictions and anomaly detection for 96 buildings on an active military installation. The system’s various jobs all converge within 14 min on average. It facilitates seamless interaction between our workspace and a cloud data lake storage provisioned for the secure, automated initial ingestion of raw data supplied by a third party via the Secure File Transfer Protocol (SFTP) and BLOB (Binary Large Object) file system secure protocol drivers. Using PySpark, a Python binding to the Apache Spark distributed computing framework, these actions were coded into collaborative notebooks and chained into the aforementioned pipeline. The pipeline was successfully managed and configured throughout the lifetime of the project and continues to meet our needs in deployment. In this paper, we outline the general architecture and how it differs from previous smart building diagnostics initiatives, present details surrounding the underlying technology stack of our data pipeline, and enumerate some of the configuration steps required to maintain and develop this big data analytics application in the cloud.
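
As a rough illustration of how ingestion, cleaning, and anomaly-detection stages can be chained into one pipeline, the sketch below uses plain Python in place of PySpark and Databricks jobs. It is not the authors' implementation: the stage functions, the sample readings, and the z-score rule with a threshold of 2.0 are all assumptions made for this example, and the abstract does not state which detection method the system uses.

```python
from statistics import mean, stdev

# Hypothetical raw meter readings as (building_id, kWh) pairs.
# None marks a missing value; negative values are treated as sensor errors.
RAW_READINGS = [
    ("B1", 10.0), ("B1", 11.0), ("B1", 9.0), ("B1", 10.5), ("B1", None),
    ("B1", 9.5), ("B1", 10.0), ("B1", 10.2), ("B1", 9.8), ("B1", 10.0),
    ("B1", 50.0),
    ("B2", 20.0), ("B2", 21.0), ("B2", -1.0), ("B2", 19.5), ("B2", 20.5),
]


def ingest(rows):
    """Stand-in for the ingestion stage: group raw readings by building."""
    by_building = {}
    for building, kwh in rows:
        by_building.setdefault(building, []).append(kwh)
    return by_building


def clean(by_building):
    """Stand-in for the cleaning stage: drop missing and negative readings."""
    return {
        building: [v for v in values if v is not None and v >= 0]
        for building, values in by_building.items()
    }


def detect_anomalies(by_building, threshold=2.0):
    """Flag readings whose z-score against the building's own mean exceeds
    `threshold` (an assumed rule, chosen only for illustration)."""
    flagged = {}
    for building, values in by_building.items():
        flagged[building] = []
        if len(values) < 3:
            continue  # too few points for a meaningful standard deviation
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # all readings identical, nothing to flag
        flagged[building] = [v for v in values if abs(v - mu) / sigma > threshold]
    return flagged


def run_pipeline(rows, stages):
    """Feed the output of each stage into the next, in order."""
    data = rows
    for stage in stages:
        data = stage(data)
    return data


anomalies = run_pipeline(RAW_READINGS, [ingest, clean, detect_anomalies])
print(anomalies)
```

In a Databricks deployment each stage would more plausibly be a notebook task in a scheduled job operating on Spark DataFrames; the function chain here only mirrors the ordering of the stages.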

Published in Computers

ISSN: 2073-431X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computers
