大数据 (Jul 2024)
Research on key technologies for efficient storage and access of turbulent big data
Abstract
With advances in measurement techniques and numerical simulation, data-driven turbulence research has become a new approach in the field. In China, several wind tunnel laboratories and supercomputing centers have been established for turbulence simulation, and they have produced a substantial collection of turbulence data. However, China currently has no centralized turbulence data management platform, which makes it difficult to exchange and share this expensive experimental and simulation data. Turbulence data is characterized by large volume, high dimensionality, high precision, and heterogeneity, all of which pose challenges for storage, access, and management efficiency. To address these problems, a distributed storage system for turbulence big data, called TDFS, was designed, targeting typical flow problems in aviation, aerospace, and marine applications. Based on the access characteristics of turbulence big data, novel metadata management methods and data access interfaces were designed for TDFS. Experimental results show that TDFS improves interface response speed by 54.38% and 57.7% compared with HDFS and GlusterFS, respectively. In addition, to reduce the storage overhead of turbulence big data, a lazy replication compression mechanism based on HDF5 was designed, which reduces storage space by 34% compared with the original replication storage approach.
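The abstract names the lazy replication compression mechanism without describing how it works. The sketch below illustrates one plausible reading, under stated assumptions: every replica is first written uncompressed so that writes return quickly, and the secondary replicas are compressed in a later, deferred pass while the first copy stays raw for fast reads. This is not the TDFS implementation. `zlib` stands in for HDF5's gzip filter so that the example needs only the Python standard library, and all class and method names (`Replica`, `LazyReplicaStore`, `compress_replicas`) are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One stored copy of a data block (hypothetical structure)."""
    payload: bytes
    compressed: bool = False


@dataclass
class LazyReplicaStore:
    """Toy in-memory store: replicas are written raw and compressed later."""
    replica_count: int = 3
    blocks: dict = field(default_factory=dict)

    def write(self, key: str, data: bytes) -> None:
        # Write path: keep every replica uncompressed so the write is cheap.
        self.blocks[key] = [Replica(data) for _ in range(self.replica_count)]

    def compress_replicas(self) -> None:
        # Deferred pass (assumed to run in the background): compress every
        # replica except the first, which stays raw for fast reads.
        for replicas in self.blocks.values():
            for replica in replicas[1:]:
                if not replica.compressed:
                    replica.payload = zlib.compress(replica.payload, 6)
                    replica.compressed = True

    def read(self, key: str, replica_index: int = 0) -> bytes:
        # Decompress transparently when the chosen replica is compressed.
        replica = self.blocks[key][replica_index]
        if replica.compressed:
            return zlib.decompress(replica.payload)
        return replica.payload

    def stored_bytes(self) -> int:
        # Total bytes held across all replicas of all blocks.
        return sum(len(r.payload) for reps in self.blocks.values() for r in reps)
```

In this toy model, calling `write` followed by `compress_replicas` lowers `stored_bytes()` whenever the data is compressible, while `read` returns identical bytes from any replica. The actual saving depends on the data and the compressor, so the sketch makes no attempt to reproduce the 34% figure reported above.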