Scientific Data (Aug 2024)
Quantum Chemistry Dataset with Ground- and Excited-state Properties of 450 Kilo Molecules
Abstract
Due to rapid advances in deep learning techniques, the demand for large, high-quality datasets in chemical research has grown significantly. We developed a quantum-chemistry database of 443,106 small organic molecules containing up to 10 heavy atoms (C, N, O, and F). Ground-state geometry optimizations and frequency calculations for all compounds were performed at the B3LYP/6-31G* level with the BJD3 dispersion correction, while excited-state single-point calculations were conducted at the ωB97X-D/6-31G* level. In total, twenty-seven molecular properties, covering geometric, thermodynamic, electronic, and energetic quantities, were gathered from these calculations. We also established a comprehensive protocol for constructing high-volume quantum-chemistry datasets. The QCDGE (Quantum Chemistry Dataset with Ground- and Excited-State Properties) dataset is large, chemically diverse, and, most importantly, includes excited-state information. The dataset and its construction protocol are expected to have a significant impact on machine learning applications across different fields of chemistry, especially excited-state research.
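The composition limit stated in the abstract (at most 10 heavy atoms, drawn from C, N, O, and F) can be illustrated with a minimal sketch. The helper name, the constants, and the input format (a flat list of element symbols) are assumptions made for illustration and are not taken from the QCDGE construction protocol; the sketch also assumes that hydrogen is permitted and does not count toward the heavy-atom limit.

```python
# Hypothetical helper: names and input format are illustrative assumptions,
# not part of the published QCDGE protocol.
ALLOWED_HEAVY = {"C", "N", "O", "F"}  # heavy elements named in the abstract
MAX_HEAVY_ATOMS = 10                  # size limit named in the abstract


def within_scope(elements):
    """Return True if a list of element symbols fits the stated limits.

    Assumption: hydrogen is allowed and is not counted as a heavy atom.
    """
    heavy = [e for e in elements if e != "H"]
    # Require at least one heavy atom and no more than the stated maximum.
    if not heavy or len(heavy) > MAX_HEAVY_ATOMS:
        return False
    # Every heavy atom must be one of the allowed elements.
    return all(e in ALLOWED_HEAVY for e in heavy)


# Usage: ethanol (C2H6O) has three heavy atoms, all from the allowed set.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(within_scope(ethanol))
```

A check of this kind only screens composition and size; it says nothing about chemical validity or about the geometry and excited-state calculations described above.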