BMC Medical Research Methodology (May 2010)
Surname lists to identify South Asian and Chinese ethnicity from secondary data in Ontario, Canada: a validation study
Abstract
Abstract Background Surname lists are useful for identifying cohorts of ethnic minority patients from secondary data sources. This study sought to develop and validate lists to identify people of South Asian and Chinese origin. Methods Comprehensive lists of South Asian and Chinese surnames were reviewed to identify those that uniquely belonged to the ethnic minority group. Surnames that were common in other populations, communities or ethnic groups were specifically excluded. These surname lists were applied to the Registered Persons Database, a registry of the health card numbers assigned to all residents of the Canadian province of Ontario, so that all residents were assigned to South Asian ethnicity, Chinese ethnicity or the General Population. Ethnic assignment was validated against self-identified ethnicity through linkage with responses to the Canadian Community Health Survey. Results The final surname lists included 9,950 South Asian surnames and 1,133 Chinese surnames. All 16,688,384 current and former residents of Ontario were assigned to South Asian ethnicity, Chinese ethnicity or the General Population based on their surnames. Among 69,859 respondents to the Canadian Community Health Survey, both lists performed extremely well when compared against self-identified ethnicity: positive predictive value was 89.3% for the South Asian list, and 91.9% for the Chinese list. Because surnames shared with other ethnic groups were deliberately excluded from the lists, sensitivity was lower (50.4% and 80.2%, respectively). Conclusions These surname lists can be used to identify cohorts of people with South Asian and Chinese origins from secondary data sources with a high degree of accuracy. These cohorts could then be used in epidemiologic and health service research studies of populations with South Asian and Chinese origins.