Energies (Nov 2022)

Semantic-Similarity-Based Schema Matching for Management of Building Energy Data

  • Zhiyu Pan,
  • Guanchen Pan,
  • Antonello Monti

DOI
https://doi.org/10.3390/en15238894
Journal volume & issue
Vol. 15, no. 23
p. 8894

Abstract

The growth of heterogeneous data in the building energy domain makes data integration challenging. Schema matching, which maps raw data from the building energy domain to a generic data model, is a necessary step in data integration and gives the data a unique representation. Little labeled data exists for schema matching, and labeling data manually is time-consuming and labor-intensive. This paper applies semantic-similarity methods from natural language processing to automate the schema-mapping process, which reduces the manual effort of heterogeneous data integration. An active-learning method is applied to address the lack of labeled data in schema matching. Results on building-energy-domain data show that the pre-trained language model yields a large improvement in schema-matching accuracy and that the active-learning method greatly reduces the amount of labeled data required.
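
As a rough illustration of the workflow the abstract describes, the following Python sketch matches raw data-point labels to attributes of a generic data model by similarity score and then picks the least certain matches for manual labeling. It is a minimal sketch under stated assumptions, not the authors' implementation: character-trigram cosine similarity stands in for the pre-trained language model, margin-based uncertainty sampling stands in for the active-learning strategy (the abstract does not name one), and all labels, attribute names, and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Bag of character n-grams of a lowercased, space-padded string.

    Assumption: a dependency-free stand-in for a language-model embedding.
    """
    s = f" {text.lower().replace('_', ' ')} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_schema(raw_labels, target_attributes):
    """Map each raw label to (best attribute, best score, margin).

    The margin is the gap between the best and second-best score;
    at least two target attributes are required.
    """
    targets = {t: char_ngrams(t) for t in target_attributes}
    result = {}
    for label in raw_labels:
        vec = char_ngrams(label)
        ranked = sorted(((cosine(vec, tv), t) for t, tv in targets.items()), reverse=True)
        (best, attr), (second, _) = ranked[0], ranked[1]
        result[label] = (attr, best, best - second)
    return result


def select_for_labeling(matches, k=1):
    """Uncertainty sampling: return the k raw labels with the smallest margin."""
    return sorted(matches, key=lambda lbl: matches[lbl][2])[:k]


# Hypothetical raw labels and generic-data-model attributes.
raw = ["room_temp_C", "elec_power_kW", "T_supply"]
targets = ["room temperature", "electrical power", "supply temperature"]

matches = match_schema(raw, targets)
to_label = select_for_labeling(matches, k=1)

for label, (attr, score, margin) in matches.items():
    print(f"{label} -> {attr} (score={score:.2f}, margin={margin:.2f})")
print("ask a human to label:", to_label)
```

In a fuller pipeline, the labels selected by `select_for_labeling` would be annotated by a person and fed back to refine the similarity model, so that labeling effort goes to the matches the model is least sure about.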

Keywords