Journal of Research in Education Sciences (Mar 2012)
缺失資料在因素分析上的處理方法之研究 Missing Data Techniques for Factor Analysis
Abstract
Factor analysis is frequently used to analyze questionnaires and scales. However, when the proportion of missing data is high or the data are not missing completely at random, the number of common factors extracted and the estimated factor loadings are often biased. Using the Taiwan Education Panel Survey (TEPS), we treated the complete data as the baseline and, following the original missingness structure, constructed five data sets with one to five times the original missing proportion in order to assess how sensitive factor analysis is to the treatment of missing data. We compared four treatments: the available case method (AC), the complete case method (CC), step-wise logistic regression single imputation (LR), and Markov chain Monte Carlo single imputation (MCMC). The results show that the higher the missing proportion, the greater the discrepancy between the covariance matrix estimated from the constructed data set and that of the baseline. With the AC method, more common factors were extracted than in the baseline when the missing proportion was high. The AC method also showed the largest error in factor loadings. The CC method's error in factor loadings was close to that of the two imputation methods, but it grew as the missing proportion increased. We therefore recommend that, when the missing proportion reaches 20% to 30% or more, LR or MCMC imputation be applied before factor analysis instead of list-wise deletion, because doing so yields smaller errors.
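To make the difference between the available case and complete case methods concrete, the following minimal Python sketch estimates a covariance matrix both ways and measures its distance from a baseline matrix. It is an illustration only, not the study's procedure: the data are simulated from a hypothetical one-factor model, values are deleted completely at random rather than by following the TEPS missingness structure, and all function names (`cov_complete_case`, `cov_available_case`, `frobenius_gap`, `make_baseline`, `punch_holes`) are invented for this example.

```python
import random


def pair_cov(xs, ys):
    """Sample covariance of two equal-length lists with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def cov_complete_case(rows):
    """Complete case (list-wise deletion): use only rows with no missing value."""
    kept = [r for r in rows if None not in r]
    p = len(rows[0])
    cols = [[r[j] for r in kept] for j in range(p)]
    return [[pair_cov(cols[i], cols[j]) for j in range(p)] for i in range(p)]


def cov_available_case(rows):
    """Available case (pairwise deletion): each entry uses every row
    in which both variables of that pair are observed."""
    p = len(rows[0])
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            pairs = [(r[i], r[j]) for r in rows
                     if r[i] is not None and r[j] is not None]
            out[i][j] = pair_cov([a for a, _ in pairs], [b for _, b in pairs])
    return out


def frobenius_gap(a, b):
    """Frobenius norm of the difference between two square matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) ** 0.5


def make_baseline(n=2000, p=4, seed=0):
    """Hypothetical baseline: p items driven by one common factor plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f = rng.gauss(0, 1)
        rows.append([0.7 * f + rng.gauss(0, 0.7) for _ in range(p)])
    return rows


def punch_holes(rows, rate, seed=1):
    """Assumption: delete each value independently with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in r] for r in rows]


if __name__ == "__main__":
    baseline = make_baseline()
    base_cov = cov_complete_case(baseline)  # no values missing yet
    for rate in (0.05, 0.10, 0.20):
        holed = punch_holes(baseline, rate)
        kept = sum(None not in r for r in holed)
        gap_cc = frobenius_gap(cov_complete_case(holed), base_cov)
        gap_ac = frobenius_gap(cov_available_case(holed), base_cov)
        print(f"rate={rate:.2f} complete rows={kept} "
              f"gap CC={gap_cc:.4f} gap AC={gap_ac:.4f}")
```

Because the deletion here is completely at random, the sketch shows only the mechanics of the two estimators and how quickly list-wise deletion discards rows. It does not reproduce the biases reported above, which arise under the survey's own missingness structure.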
Keywords