Statistica (May 2013)
Decomposition of variance in terms of conditional means
Abstract
Two different sets of data are used to test an apparently new approach to the analysis of the variance of a numerical variable which depends on qualitative variables. We suggest that this approach be used to complement other existing techniques to study the interdependence of the variables involved. According to our method, the variance is expressed as a sum of orthogonal components, obtained as differences of conditional means, with respect to the qualitative characters. The resulting expression for the variance depends on the ordering in which the characters are considered. We suggest an algorithm which leads to an ordering which is deemed natural. The first set of data concerns the score achieved by a population of students on an entrance examination based on a multiple choice test with 30 questions. In this case the qualitative characters are dyadic and correspond to correct or incorrect answer to each question. The second set of data concerns the delay to obtain the degree for a population of graduates of Italian universities. The variance in this case is analyzed with respect to a set of seven specific qualitative characters of the population studied (gender, previous education, working condition, parent's educational level, field of study, etc.).