BMC Medical Research Methodology (Nov 2010)
Categorisation of continuous exposure variables revisited. A response to the Hyperglycaemia and Adverse Pregnancy Outcome (HAPO) Study
Abstract
Background
Although the general statistical advice is to keep continuous exposure variables continuous in statistical analyses, categorisation is still a common approach in medical research. In a recent paper from the Hyperglycaemia and Adverse Pregnancy Outcome (HAPO) Study, body mass index (BMI) was categorised when analysing its effect on adverse pregnancy outcomes. The lowest category, labelled "underweight", was used as the reference category.

Methods
The present paper summarises the reasons for categorisation and the methodological drawbacks of this approach. We also discuss the choice of reference category and alternative analyses. We illustrate our arguments with a reanalysis of results from the HAPO paper.

Results
Categorisation of continuous exposure data results in loss of power and other methodological challenges. An unfortunate choice of reference category can further reduce precision and obscure the interpretation of risk estimates. A highlighted odds ratio (OR) in the HAPO study is the OR for birth weight >90th percentile for women in the highest compared to the lowest BMI category ("obese class III" versus "underweight"). The estimates were OR = 4.55 and OR = 3.52 under two different multiple logistic regression models. When using the "normal weight" category as the reference, our corresponding estimates were OR = 2.03 and OR = 1.62, respectively. Moreover, our choice of reference category also gave narrower confidence intervals.

Summary
Due to several methodological drawbacks, categorisation should be avoided. Modern statistical analyses should be used to analyse continuous exposure data and to explore non-linear relations. If continuous data are categorised, special attention must be given to the choice of reference category.
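The effect of the reference category on both the size and the precision of an odds ratio can be sketched with a small calculation. The example below is only an illustration: the counts are hypothetical and are not HAPO data, the function name `odds_ratio_ci` is our own, and the interval is the simple unadjusted Woolf (log-scale Wald) interval rather than one from the multiple logistic regression models used in the paper. The last two lines divide the reported ORs by each other, which gives the implied "normal weight" versus "underweight" OR only under the assumption that both estimates come from the same fitted model.

```python
import math


def odds_ratio_ci(exp_cases, exp_noncases, ref_cases, ref_noncases, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval."""
    or_ = (exp_cases * ref_noncases) / (exp_noncases * ref_cases)
    # Standard error of log(OR): every cell contributes 1/count, so a small
    # reference group inflates the standard error of every comparison.
    se = math.sqrt(1 / exp_cases + 1 / exp_noncases + 1 / ref_cases + 1 / ref_noncases)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# HYPOTHETICAL (cases, non-cases) per BMI category; not taken from HAPO.
counts = {
    "underweight": (10, 190),    # small group, few cases
    "normal": (400, 3600),       # large group
    "obese_III": (30, 120),
}

# Same exposed group, two different reference categories.
or_vs_under = odds_ratio_ci(*counts["obese_III"], *counts["underweight"])
or_vs_normal = odds_ratio_ci(*counts["obese_III"], *counts["normal"])

for label, (or_, lower, upper) in [("vs underweight", or_vs_under),
                                   ("vs normal", or_vs_normal)]:
    print(f"obese III {label}: OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# Ratio of the reported ORs (assumes both come from the same fitted model).
implied_model_1 = 4.55 / 2.03
implied_model_2 = 3.52 / 1.62
```

With these hypothetical counts, the comparison against the small "underweight" group gives the larger OR and the wider interval, while the comparison against the large "normal" group gives a smaller OR and a narrower interval, which mirrors the pattern described in the Results.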