PLoS ONE (Jan 2014)
Predicting cannabis abuse screening test (CAST) scores: a recursive partitioning analysis using survey data from Czech Republic, Italy, the Netherlands and Sweden.
Abstract
Introduction
Cannabis is Europe's most commonly used illicit drug. Some users do not develop dependence or other problems, whereas others do. Many factors are associated with the occurrence of cannabis-related disorders. This makes it difficult to identify key risk factors and markers to profile at-risk cannabis users using traditional hypothesis-driven approaches. Therefore, this study demonstrates the use of a data-mining technique called binary recursive partitioning by creating a classification tree to profile at-risk users.

Methods
A total of 59 variables on cannabis use and drug market experiences were extracted from an internet-based survey dataset collected in four European countries (Czech Republic, Italy, Netherlands and Sweden), n = 2617. These 59 potential predictors of problematic cannabis use were used to partition individual respondents into subgroups with low and high risk of having a cannabis use disorder, based on their responses on the Cannabis Abuse Screening Test. Both a generic model for the four countries combined and four country-specific models were constructed.

Results
Of the 59 variables included in the first analysis step, only three were required to construct a generic partitioning model that classifies high-risk cannabis users with 65-73% accuracy. Based on the generic model for the four countries combined, the highest risk for cannabis use disorder is seen in participants reporting cannabis use on more than 200 days in the last 12 months. In comparison to the generic model, the country-specific models led to modest, non-significant improvements in classification accuracy, with the exception of Italy (p = 0.01).

Conclusion
Using recursive partitioning, it is feasible to construct classification trees based on only a few variables with acceptable performance to classify cannabis users into groups with low or high risk of meeting criteria for cannabis use disorder. The number of cannabis use days in the last 12 months is the most relevant variable. The identified variables may be considered for use in future screeners for cannabis use disorders.
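
To illustrate the general idea of binary recursive partitioning named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation or software, and it does not use the study's data. It assumes Gini impurity as the splitting criterion and a fixed maximum depth as the stopping rule, neither of which is stated in the abstract. The function names, the two toy predictors and the ten synthetic respondents are all hypothetical, so the threshold the sketch finds reflects only the invented values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best
    binary split, or None if no feature has two distinct values."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively partition the data; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return majority  # no split reduces impurity
    j, t, _ = split
    left = [i for i, r in enumerate(rows) if r[j] <= t]
    right = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Synthetic toy data (invented for illustration only).
# Column 0: hypothetical use days in the last 12 months.
# Column 1: a hypothetical binary predictor unrelated to the label.
rows = [[5, 1], [20, 0], [45, 1], [90, 0], [130, 1],
        [170, 0], [220, 1], [280, 0], [330, 1], [360, 0]]
labels = ["low"] * 5 + ["high"] * 5

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [200, 0]), predict(tree, [10, 1]))
```

On this toy data the procedure selects the use-days column and ignores the uninformative second column, which mirrors in miniature how a partitioning model can end up needing only a few of many candidate predictors.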