Scientific Data (Nov 2024)

A dataset of venture capitalist types in China (1978–2021): A machine-human hybrid approach

  • Jin Chen
  • Ruining Cao
  • Yifei Song
  • Anan Hu
  • Ying Ding

DOI
https://doi.org/10.1038/s41597-024-04108-z
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 18

Abstract

Despite escalating interest in distinguishing among types of venture capitalists (VCs) and their roles in shaping entrepreneurship and innovation, such research remains sparse for China, the world's second-largest VC market. To address this gap, we devised a machine-human hybrid approach to classify VC types. Specifically, we compiled a list of 49,187 VCs that invested in China before 2021 from the CVSource database, collected VC ownership information from other public sources, developed machine-learning algorithms to predict VC types, and used human coders when the machine-learning algorithms failed to produce a prediction. Using this hybrid approach, we classified each VC into one of the following types: GVC (public-agency-affiliated or state-owned-enterprise-affiliated VC), CVC (corporate VC), IVC (independent VC), BVC (bank-affiliated VC), FVC (financial, non-bank-affiliated VC), UVC (university-affiliated VC), and PenVC (pension-fund-affiliated VC). We not only provide the most up-to-date database of VC types in the Chinese setting but also demonstrate how machine-learning algorithms can be leveraged to build a transparent coding approach for VC-type classification.
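As a rough illustration of the machine-human routing step described above, the sketch below assumes the model outputs a probability for each of the seven VC types and treats either a missing output or a low top probability as a failed prediction that is handed to human coders. The `HybridClassifier` class, the 0.8 threshold, and the fund names are hypothetical; the abstract does not specify how the authors decided that the machine had failed to produce a prediction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The seven VC types named in the abstract.
VC_TYPES = ("GVC", "CVC", "IVC", "BVC", "FVC", "UVC", "PenVC")


@dataclass
class HybridClassifier:
    # Hypothetical cutoff (an assumption, not from the paper): if the top
    # predicted probability falls below it, the VC goes to human coders.
    threshold: float = 0.8
    # VCs awaiting manual classification, in the order they were routed.
    human_queue: List[str] = field(default_factory=list)

    def classify(self, vc_name: str, probs: Optional[Dict[str, float]]) -> Optional[str]:
        """Return the machine label, or None after queueing the VC for human coding."""
        if not probs:
            # No model output at all (e.g. ownership data could not be found).
            self.human_queue.append(vc_name)
            return None
        # Pick the VC type with the highest predicted probability.
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if label not in VC_TYPES:
            raise ValueError(f"unknown VC type: {label}")
        if p < self.threshold:
            # Low-confidence prediction: defer to a human coder.
            self.human_queue.append(vc_name)
            return None
        return label


if __name__ == "__main__":
    clf = HybridClassifier(threshold=0.8)
    print(clf.classify("Fund A", {"CVC": 0.92, "IVC": 0.08}))
    print(clf.classify("Fund B", {"GVC": 0.55, "IVC": 0.45}))
    print(clf.classify("Fund C", None))
    print(clf.human_queue)
```

Here "Fund A" receives a machine label (CVC), while "Fund B" (low confidence) and "Fund C" (no prediction) end up in `human_queue`. Keeping the fallback as an explicit queue, rather than silently accepting low-confidence labels, mirrors the abstract's emphasis on a transparent coding approach.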