Clinical Epidemiology and Global Health (May 2022)
Preferring Box-Cox transformation, instead of log transformation to convert skewed distribution of outcomes to normal in medical research
Abstract
Background: When dealing with a skewed outcome, researchers often use a log transformation to make the data approximately normal and then apply commonly used statistical methods such as the t-test and linear regression. However, log-transformed data are not always normal. In such situations, the Box-Cox transformation (BCT) can be used to transform skewed data to normality. A problem arises, however, when the researcher wants to predict the outcome on the original scale. Therefore, the aim of this paper is to demonstrate the use of BCT for a skewed outcome and to predict the outcome on the original scale using a regression method. Materials and method: Data from the Cost of Pyelonephritis in Type-2 Diabetes (COPID) study were used to demonstrate the BCT and the back-transformation method. This study was conducted among patients admitted to the general medical wards of a tertiary care hospital in south India. The BCT was applied to total cost to convert it to a normal distribution. Multiple linear regression was then fitted, and the predicted values were back-transformed to the original scale. Results: The estimated lambda was −0.36. After BCT, total cost was approximately normal (p-value = 0.621). The residual plots suggested that the errors follow a normal distribution with constant variance. The median (IQR) of the observed total cost was 57694 (42405, 98621), whereas that of the predicted total cost was 58317 (44270, 95375). Conclusion: When the data are skewed, the log transformation is not appropriate in all scenarios. However, BCT will ensure a normal distribution after transformation, and the outcome can also be back-transformed to the original scale given the covariates.
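The transform, regress and back-transform workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the helper names (`boxcox`, `inv_boxcox`, `fit_simple_ols`) are hypothetical, the data are made up and are not COPID data, and a single covariate stands in for the multiple regression. Only the lambda value of −0.36 is taken from the abstract. It uses the standard Box-Cox definition, z = (y^λ − 1)/λ for λ ≠ 0 and z = log(y) for λ = 0.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a strictly positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Map a transformed value z back to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the Box-Cox transform")
    return base ** (1.0 / lam)


def fit_simple_ols(x, z):
    """Least-squares intercept and slope for z = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


LAMBDA = -0.36  # lambda estimate reported in the abstract

# Hypothetical data for illustration only (NOT from the COPID study):
# length of stay in days and total cost.
stay_days = [3, 5, 7, 10, 14, 21]
total_cost = [30500.0, 42000.0, 56000.0, 71000.0, 99000.0, 160000.0]

# 1. Transform the skewed outcome.
transformed = [boxcox(c, LAMBDA) for c in total_cost]

# 2. Fit the regression on the transformed scale.
intercept, slope = fit_simple_ols(stay_days, transformed)

# 3. Predict on the transformed scale, then back-transform each prediction.
predicted_cost = [inv_boxcox(intercept + slope * d, LAMBDA) for d in stay_days]
```

Because the transformation is monotone, a back-transformed prediction of this kind is usually better read as an estimate of the conditional median than of the conditional mean on the original scale, which fits with the abstract's choice to compare medians and IQRs. In practice lambda would be estimated from the data, typically by maximum likelihood, rather than fixed in advance as it is here.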