Data in Brief (Jun 2021)
House building tips (HBT) corpus dataset as a resource to discover Malay architectural ingenuity and identity
Abstract
House Building Tips is the title of a classic text containing historical information on early house construction in Malay communities. The tips were written by a scholar whose knowledge of house construction came from observing the surrounding environment. In Malaysia, written sources or records of house construction are scarce and underexposed. This research was therefore conducted to preserve the written legacy of Malay house construction. The purpose of this paper is to introduce a statistical data source of house building tips that is rich in Malay ingenuity and identity. The wordlists generated in this study can serve as a reference source for the field of Malay architecture. The study uses a quantitative method, the Linguistic Corpus Statistical Approach, and the data were produced through specific corpus development procedures: text collection, scanning and cleaning, text annotation, and storage as plain text. The data analysis procedure then uses the corpus software LancsBox to generate specialised wordlists. Bubble graphs built from these wordlists in the Tableau software illustrate the most frequently used lexical items together with their raw and relative frequency values. This makes it easier to search for and read architectural words and their references. These data represent written sources that need to be preserved and that can become points of reference concerning Malay architectural ingenuity and identity.
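To make the "raw and relative frequency" values concrete, the following is a minimal sketch of how a frequency wordlist can be derived from a cleaned plain-text corpus. It is not the LancsBox implementation. The function name `build_wordlist`, the simple regex tokeniser (which keeps hyphenated Malay reduplications such as "rumah-rumah" as single tokens), and the normalisation base of 10,000 tokens are all illustrative assumptions.

```python
import re
from collections import Counter


def build_wordlist(text: str, per: int = 10_000, top: int = 10):
    """Hypothetical wordlist builder (not LancsBox's algorithm).

    Returns (word, raw_frequency, relative_frequency) tuples, where the
    relative frequency is normalised per `per` tokens (assumed base: 10,000).
    """
    # Assumed tokeniser: lowercase letters, keeping hyphenated forms
    # (e.g. Malay reduplication "rumah-rumah") together as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    total = len(tokens)
    if total == 0:
        return []  # avoid division by zero on an empty corpus
    counts = Counter(tokens)
    # Relative frequency = raw count / corpus size * normalisation base
    return [(word, raw, raw / total * per) for word, raw in counts.most_common(top)]


if __name__ == "__main__":
    # Invented sample sentence for illustration only; not data from the HBT corpus.
    sample = "rumah tiang rumah atap tiang rumah"
    for word, raw, rel in build_wordlist(sample):
        print(f"{word:10s} raw={raw:3d} per10k={rel:.2f}")
```

In this invented sample of six tokens, "rumah" occurs three times, giving a raw frequency of 3 and a relative frequency of 5,000 per 10,000 tokens. Normalising in this way lets frequencies be compared across texts or corpora of different sizes.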