Scientific Data (Jan 2024)

A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor

  • Dingyun Huang
  • Jacqueline M. Cole

DOI: https://doi.org/10.1038/s41597-023-02897-3
Journal volume & issue: Vol. 11, no. 1, pp. 1–9

Abstract


A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Of these, 5,349 records have chemical names that are represented as SMILES strings, with 91% accuracy; these records are grouped in a subsidiary database. Each data record contains one of the following four properties: maximum emission wavelength (λ_EM), photoluminescence quantum yield (PLQY), singlet-triplet energy splitting (ΔE_ST), and delayed lifetime (τ_D). The databases were created by text mining with ChemDataExtractor, a chemistry-aware natural-language-processing toolkit that was adapted for TADF research. The text-mined corpus consisted of 2,733 papers from the Royal Society of Chemistry and Elsevier. To the best of our knowledge, these are the first databases auto-generated for TADF molecules from existing publications. The databases have been publicly released for experimental and computational applications in the TADF research field.
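The abstract describes each record as pairing a compound with exactly one of four properties, with the records whose names have SMILES representations grouped into a subsidiary database. The Python sketch below models that layout under stated assumptions: the class names, field names, and helper functions are hypothetical illustrations, not the released database's actual schema and not part of ChemDataExtractor's API. It also shows a rough estimate of how many records are correct, assuming the 82% overall precision applies uniformly across the 25,482 records.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TadfProperty(Enum):
    """The four properties named in the abstract (member names are hypothetical)."""
    EMISSION_WAVELENGTH = "lambda_EM"        # maximum emission wavelength
    PLQY = "PLQY"                            # photoluminescence quantum yield
    SINGLET_TRIPLET_SPLITTING = "delta_E_ST"  # singlet-triplet energy splitting
    DELAYED_LIFETIME = "tau_D"               # delayed lifetime


@dataclass(frozen=True)
class TadfRecord:
    """Hypothetical record layout: one compound paired with exactly one property value."""
    compound_name: str
    prop: TadfProperty
    value: float
    units: str
    smiles: Optional[str] = None  # set only when the name has a SMILES representation


def subsidiary(records: list[TadfRecord]) -> list[TadfRecord]:
    """Select the records that would belong to the SMILES-bearing subsidiary database."""
    return [r for r in records if r.smiles is not None]


def expected_correct(n_records: int, precision: float) -> int:
    """Rough count of correct records, assuming precision applies uniformly to all records."""
    return round(n_records * precision)
```

For example, `expected_correct(25482, 0.82)` gives 20895, i.e. roughly 20,900 of the 25,482 records would be expected to be correct if the reported precision held evenly across the whole database. The 91% figure for SMILES is an accuracy of name-to-SMILES representation rather than a record precision, so it is not plugged into the same helper here.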