Advances in Applied Energy (Aug 2021)
Advancing smart building readiness: Automated metadata extraction using neural language processing methods
Abstract
Digitalisation of the built environment provides multiple benefits, such as operational and energy productivity improvements, and supports the participation of buildings in the management of electricity networks. Automated methods to infer contextual information from building management systems and Internet of Things sensor metadata play a significant role in this process. In this paper, we study the problem of transfer learning using text metadata to automatically tag building sensors with semantic tags. We demonstrate that state-of-the-art pre-trained neural language models are a promising approach which, to the best of our knowledge, has not been studied due to the lack of pre-processors to tokenise the texts. We develop a tokeniser based on the unigram language model, capable of tokenising the idiosyncratic text found in building sensor metadata, and use it to train a transformer-based language model from scratch using sensor metadata from 152 buildings. The resulting weights are then used to train a tagset classifier using transfer learning, which is tested on 30 buildings. Metrics such as precision, recall and the Jaccard similarity coefficient are used to evaluate the suitability of our results for various buildings. The proposed method can predict building tagsets with over 70% accuracy against a real-world, noisy dataset.
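As an illustration of the evaluation metrics named above, the following minimal sketch computes precision, recall and the Jaccard similarity coefficient for a single sensor by treating the predicted and reference tagsets as sets. The function name `tagset_scores` and the example tags are hypothetical and are not taken from the paper's dataset or implementation.

```python
def tagset_scores(predicted, actual):
    """Return (precision, recall, jaccard) for one sensor's tagset.

    Both arguments are iterables of tag strings; duplicates are ignored.
    """
    predicted, actual = set(predicted), set(actual)
    overlap = len(predicted & actual)   # tags predicted correctly
    union = len(predicted | actual)     # all tags seen in either set

    # Guard against empty sets to avoid division by zero.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(actual) if actual else 0.0
    # Assumption: two empty tagsets are treated as identical.
    jaccard = overlap / union if union else 1.0
    return precision, recall, jaccard


# Hypothetical tags, for illustration only.
predicted_tags = {"zone", "air", "temperature", "sensor"}
actual_tags = {"zone", "air", "temperature", "setpoint"}

precision, recall, jaccard = tagset_scores(predicted_tags, actual_tags)
print(f"precision={precision:.2f} recall={recall:.2f} jaccard={jaccard:.2f}")
```

In this example three of the four predicted tags are correct, so precision and recall are both 0.75, while the Jaccard coefficient is 3/5 = 0.6 because the union of the two tagsets contains five distinct tags.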