Frontiers in Education (Apr 2024)

Gender and race measurement invariance of the Strengths and Difficulties Questionnaire in a U.S. base sample

  • Emily Graybill,
  • Brian Barger,
  • Ashley Salmon,
  • Scott Lewis

DOI
https://doi.org/10.3389/feduc.2024.1310449
Journal volume & issue
Vol. 9

Abstract

Introduction: The Strengths and Difficulties Questionnaire (SDQ) is one of the most widely used behavior screening tools in public schools due to its strong psychometric properties, low cost, and brief (25-question) format. However, this screening tool has several limitations, including that it was developed primarily to identify clinical diagnostic conditions and primarily in a European population. To date, there has been minimal comparative research on measurement invariance in relation to important U.S. socio-demographic variables such as race and gender.

Method: This study used both structural equation modeling (i.e., confirmatory factor analysis, CFA) and item response theory (IRT) methods to investigate the measurement invariance of the SDQ across gender (male, female) and race (Black, White). CFA analyses were first conducted for each of the SDQ subscales to identify potential misfit in loadings, thresholds, and residuals. IRT graded response models were then fit to identify and quantify the between-group differences at the item and factor levels in terms of Cohen's d styled metrics (d > 0.2 = small, d > 0.5 = medium, d > 0.8 = large).

Results: There were 2,821 high school participants (52% Male, 48% Female; 88% Black, 12% White) included in these analyses. CFA analyses suggested that the item-factor relationships for most subscales were invariant, but the Conduct Problems and Hyperactivity subscales were non-invariant for strict measurement invariance. IRT analyses identified several non-invariant items, with effects ranging from small to large. Despite moderate to large effects for item scores on several scales, the test-level effects on scale scores were negligible.

Discussion: These analyses suggest that the SDQ subscale scores display reasonably comparable item-factor relationships across groups. Several subscale item scores displayed substantive item-level misfit, but the test-level effects were minimal. Implications for the field are discussed.
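
The effect-size labels used for the IRT comparisons can be illustrated with a minimal Python sketch. It assumes the conventional Cohen's d cutoffs (0.2, 0.5, 0.8) and applies them to the absolute value of d. The function name `label_effect_size` and the "negligible" label for values at or below 0.2 are illustrative assumptions, not taken from the article.

```python
def label_effect_size(d: float) -> str:
    """Label a Cohen's d styled effect size by magnitude.

    Assumed cutoffs: |d| > 0.2 small, |d| > 0.5 medium, |d| > 0.8 large.
    The "negligible" label for |d| <= 0.2 is an assumption of this sketch.
    """
    magnitude = abs(d)  # assumption: the direction of the difference is ignored
    if magnitude > 0.8:
        return "large"
    if magnitude > 0.5:
        return "medium"
    if magnitude > 0.2:
        return "small"
    return "negligible"
```

For example, an item-level difference of d = 0.6 would be labeled medium under these cutoffs, while a scale-level difference of d = 0.1 would be labeled negligible.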

Keywords