Scientific Data (Aug 2024)
A cross-institutional database of operational risk external loss events in Chinese banking sector 1986–2023
Abstract
The collection of operational risk data worldwide relies heavily on manual labor, leading to slow updates, inconsistent data, and limited volume. There remains a substantial shortage of publicly accessible operational risk databases for risk analysis. This study proposes a new data collection framework that combines text mining methods to replace the laborious manual collection process. News about operational risk events is automatically collected from web pages, its content is analyzed, and the key information is extracted. With this framework, the Public-Chinese Operational Loss Data (P-COLD) database for financial institutions is constructed and expanded. Each record contains 12 key fields, such as occurrence time, loss amount, and business line, offering a more thorough description of operational risk events. With 3,723 records from 1986 to 2023, the P-COLD database is one of the largest and most comprehensive external operational risk databases in China. We anticipate that the P-COLD database will contribute to advances in operational risk capital calculation, dependence analysis, and institutional internal controls.
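To make the extraction step concrete, the following is a minimal Python sketch of how a few fields might be pulled from a news sentence. It is not the authors' pipeline: the `LossRecord` fields, the regular expressions, the English sample text, and the keyword map are all hypothetical, and only three of the 12 fields (occurrence time, loss amount, business line) are illustrated.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword map; the database's actual business-line taxonomy is not given here.
BUSINESS_LINE_KEYWORDS = {
    "retail banking": ["deposit", "savings", "credit card"],
    "commercial banking": ["loan", "bill", "guarantee"],
}

# Multipliers for the amount units recognised by this sketch.
UNIT_MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


@dataclass
class LossRecord:
    """Illustrative subset of the fields described in the abstract."""
    occurrence_year: Optional[int]
    loss_amount_yuan: Optional[float]
    business_line: Optional[str]


def extract_record(text: str) -> LossRecord:
    """Extract year, loss amount, and business line from one news sentence."""
    # First four-digit year between 1980 and 2029 (assumed pattern).
    year_match = re.search(r"\b(19[89]\d|20[0-2]\d)\b", text)
    year = int(year_match.group(1)) if year_match else None

    # Amounts written as "<number> thousand|million|billion yuan".
    amount_match = re.search(
        r"(\d+(?:\.\d+)?)\s*(thousand|million|billion)\s+yuan",
        text,
        re.IGNORECASE,
    )
    amount = None
    if amount_match:
        unit = amount_match.group(2).lower()
        amount = float(amount_match.group(1)) * UNIT_MULTIPLIERS[unit]

    # First business line whose keyword appears in the text.
    lowered = text.lower()
    business_line = None
    for line, keywords in BUSINESS_LINE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            business_line = line
            break

    return LossRecord(year, amount, business_line)


if __name__ == "__main__":
    sample = ("In 2015, a branch employee embezzled customer deposits "
              "totalling 3.2 million yuan.")
    print(extract_record(sample))
```

Rule-based matching like this is only a stand-in for the text mining methods the study refers to; real news text would need Chinese-language tokenisation and far more robust handling of dates, currencies, and ambiguous wording.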