Health Technology Assessment (Mar 2024)
Patient-reported outcome measures for monitoring primary care patients with depression: the PROMDEP cluster RCT and economic evaluation
Abstract
Background Guidelines on the management of depression recommend that practitioners use patient-reported outcome measures for the follow-up monitoring of symptoms, but there is a lack of evidence of benefit in terms of patient outcomes. Objective To test using the Patient Health Questionnaire-9 questionnaire as a patient-reported outcome measure for monitoring depression, training practitioners in interpreting scores and giving patients feedback. Design Parallel-group, cluster-randomised superiority trial; 1 : 1 allocation to intervention and control. Setting UK primary care (141 group general practices in England and Wales). Inclusion criteria Patients aged ≥ 18 years with a new episode of depressive disorder or symptoms, recruited mainly through medical record searches, plus opportunistically in consultations. Exclusions Current depression treatment, dementia, psychosis, substance misuse and risk of suicide. Intervention Administration of the Patient Health Questionnaire-9 questionnaire with patient feedback soon after diagnosis, and at follow-up 10–35 days later, compared with usual care. Primary outcome Beck Depression Inventory, 2nd edition, symptom scores at 12 weeks. Secondary outcomes Beck Depression Inventory, 2nd edition, scores at 26 weeks; antidepressant drug treatment and mental health service contacts; social functioning (Work and Social Adjustment Scale) and quality of life (EuroQol 5-Dimension, five-level) at 12 and 26 weeks; service use over 26 weeks to calculate NHS costs; patient satisfaction at 26 weeks (Medical Informant Satisfaction Scale); and adverse events. Sample size The original target sample of 676 patients recruited was reduced to 554 due to finding a significant correlation between baseline and follow-up values for the primary outcome measure. Randomisation Remote computerised randomisation with minimisation by recruiting university, small/large practice and urban/rural location. 
Blinding Blinding of participants was impossible given the open cluster design, but self-report outcome measures prevented observer bias. Analysis was blind to allocation. Analysis Linear mixed models were used, adjusted for baseline depression, baseline anxiety, sociodemographic factors, and clustering including practice as random effect. Quality of life and costs were analysed over 26 weeks. Qualitative interviews Practitioner and patient interviews were conducted to reflect on trial processes and use of the Patient Health Questionnaire-9 using the Normalization Process Theory framework. Results Three hundred and two patients were recruited in intervention arm practices and 227 patients were recruited in control practices. Primary outcome data were collected for 252 (83.4%) and 195 (85.9%), respectively. No significant difference in Beck Depression Inventory, 2nd edition, score was found at 12 weeks (adjusted mean difference –0.46, 95% confidence interval –2.16 to 1.26). Nor were significant differences found in Beck Depression Inventory, 2nd Edition, score at 26 weeks, social functioning, patient satisfaction or adverse events. EuroQol-5 Dimensions, five-level version, quality-of-life scores favoured the intervention arm at 26 weeks (adjusted mean difference 0.053, 95% confidence interval 0.013 to 0.093). However, quality-adjusted life-years over 26 weeks were not significantly greater (difference 0.0013, 95% confidence interval –0.0157 to 0.0182). Costs were lower in the intervention arm but, again, not significantly (–£163, 95% confidence interval –£349 to £28). Cost-effectiveness and cost–utility analyses, therefore, suggested that the intervention was dominant over usual care, but with considerable uncertainty around the point estimates. Patients valued using the Patient Health Questionnaire-9 to compare scores at baseline and follow-up, whereas practitioner views were more mixed, with some considering it too time-consuming. 
Conclusions We found no evidence of improved depression management or outcome at 12 weeks from using the Patient Health Questionnaire-9, but patients’ quality of life was better at 26 weeks, perhaps because feedback of Patient Health Questionnaire-9 scores increased their awareness of improvement in their depression and reduced their anxiety. Further research in primary care should evaluate patient-reported outcome measures including anxiety symptoms, administered remotely, with algorithms delivering clear recommendations for changes in treatment. Study registration This study is registered as IRAS250225 and ISRCTN17299295. Funding This award was funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme (NIHR award ref: 17/42/02) and is published in full in Health Technology Assessment; Vol. 28, No. 17. See the NIHR Funding and Awards website for further award information. Plain language summary Depression is common, can be disabling and costs the nation billions. The National Health Service recommends general practitioners who treat people with depression use symptom questionnaires to help assess whether those people are getting better over time. A symptom questionnaire is one type of patient-reported outcome measure. Patient-reported outcome measures appear to benefit people having therapy and mental health care, but this approach has not been tested thoroughly in general practice. Most people with depression are treated in general practice, so it is important to test patient-reported outcome measures there, too. In this study, we tested whether using a patient-reported outcome measure helps people with depression get better more quickly. The study was a ‘randomised controlled trial’ in general practices, split into two groups. In one group, people with depression completed the Patient Health Questionnaire, or ‘PHQ-9’, patient-reported outcome measure, which measures nine symptoms of depression. 
In the other group, people with depression were treated as usual without the Patient Health Questionnaire-9. We fed the results of the Patient Health Questionnaire-9 back to the people with depression themselves to show them how severe their depression was and asked them to discuss the results with the practitioners looking after them. We found no differences between the patient-reported outcome measure group and the control group in their level of depression; their work or social life; their satisfaction with care from their practice; or their use of medicines, therapy or specialist care for depression. However, we did find that their quality of life was improved at 6 months, and the costs of the National Health Service services they used were lower. Using the Patient Health Questionnaire-9 can improve patients’ quality of life, perhaps by making them more aware of improvement in their depression symptoms, and less anxious as a result. Future research should test using a patient-reported outcome measure that includes anxiety and processing the answers through a computer to give practitioners clearer advice on possible changes to treatment for depression. Scientific summary Some text in this chapter has been adapted from the study protocol published as: Kendrick T, Moore M, Leydon G, et al. Patient-reported outcome measures for monitoring primary care patients with depression (PROMDEP): study protocol for a randomised controlled trial. Trials 2020;21:441. https://doi.org/10.1186/s13063-020-04344-9. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly credited. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article unless otherwise stated. Background Depression is common and costly. It can lead to chronic disability, poor quality of life, suicide, and high service use and costs. National Institute for Health and Care Excellence guidelines recommend different treatments for more severe and less severe depression, but general practitioners, who treat the majority of people with depression in primary care, are often inaccurate in their global clinical assessments of depression severity, and treatment is not targeted to patients most likely to benefit. The National Institute for Health and Care Excellence recommends that practitioners consider using validated patient-reported outcome measures to inform treatment at diagnosis and follow-up of people with depression, but there is insufficient evidence that these measures improve depression management and outcomes for patients in primary care. Aim and objectives The aim of the study was to answer the research question: What is the effectiveness and cost-effectiveness of assessing primary care patients with depression or low mood soon after diagnosis and again at follow-up 10–35 days later, using the Patient Health Questionnaire-9 combined with patient feedback and practitioner guidance on treatment? 
The objectives were to (1) carry out a cluster-randomised controlled trial to compare the intervention with usual care; (2) provide intervention arm patients with written feedback on their Patient Health Questionnaire-9 scores, indicating evidence-based treatments relevant to the level of severity of depression to discuss with practitioners; (3) train practitioners to interpret Patient Health Questionnaire-9 scores and their implications for choice of treatment, taking into account contextual factors; (4) follow up participants for 26 weeks, with research assessments at 12 and 26 weeks; (5) determine the primary outcome of depressive symptoms on the Beck Depression Inventory, 2nd edition, at 12-week follow-up; (6) examine secondary outcomes, including depressive symptoms on the Beck Depression Inventory, 2nd edition, at 26 weeks, and social functioning and quality of life at both 12- and 26-week follow-ups; (7) measure patient satisfaction, adverse events, antidepressant treatment, secondary care contacts, service use, and costs over 26 weeks’ follow-up, and perform cost-effectiveness and cost–utility analyses; and (8) carry out a qualitative process analysis to explore participants’ reflections on the use of the Patient Health Questionnaire-9 and the potential for implementing it in practice. Methods The study design was a parallel-group, cluster-randomised superiority trial with 1 : 1 allocation to intervention and control arms. The setting was UK primary care (141 group general practices in England and Wales). Inclusion criteria were age ≥ 18 years with a new episode of depressive disorder or symptoms. Patients were recruited mainly through regular medical records searches but also opportunistically at consultations for new episodes of depression. Exclusion criteria were current treatment for depression; dementia; psychosis; substance misuse; or a significant risk of suicide. 
The intervention was administration of the Patient Health Questionnaire-9 as a patient-reported outcome measure soon after diagnosis and at follow-up 10–35 days later. Patients were given written feedback on their Patient Health Questionnaire-9 scores and potential treatments to discuss with their general practitioners. Practitioners were trained in interpreting Patient Health Questionnaire-9 scores and taking them into account in treatment decisions. The primary outcome was depressive symptoms on the Beck Depression Inventory, 2nd edition, at 12 weeks. Secondary outcomes were Beck Depression Inventory, 2nd edition, scores at 26 weeks; social functioning (on the Work and Social Adjustment Scale) and quality of life (on the EuroQol-5 Dimensions, five-level) at 12 and 26 weeks; service use including antidepressant treatment and primary and secondary care contacts over 26 weeks to calculate NHS costs; and patient satisfaction at 26 weeks (on the Medical Informant Satisfaction Scale). For our sample size calculation, we assumed a baseline mean Beck Depression Inventory, 2nd edition, score of 24.0 with a standard deviation of 10.0 (derived from a feasibility study), and mean scores at 12-week follow-up of 14.0 in the intervention arm and 17.0 in the control arm. The anticipated difference of 3.0 points (effect size of 0.3) represented the minimum clinically important difference on the Beck Depression Inventory, 2nd edition. At the 5% level of significance, to have 90% power to detect that difference we calculated we needed 235 patients analysed per arm. We aimed to recruit a mean of six patients per practice and assumed an intracluster correlation coefficient of 0.03 (from the feasibility study), which gave a cluster design effect of 1.15, meaning we needed 270 per arm. 
We assumed a 20% loss to follow-up at 12 weeks, so the total sample size needed was 270 × 2/0.8 and our original target sample size was a total of 676 patients recruited, from 113 practices, by three recruitment centres (the University of Southampton, the University of Liverpool and University College London). We subsequently revised the target sample size on finding a significant correlation coefficient of > 0.5 between baseline and follow-up values for the primary outcome, which meant that we needed only 222 patients analysed per arm and, therefore, a target sample size of 554 patients recruited (revised 10 June 2021). Cluster randomisation of practices to intervention and control arms was carried out remotely by a Clinical Trials Unit statistician using computerised sequence generation, with minimisation by recruiting centre, size of practice and urban or rural location. Blinding of participating practitioners and patients to allocation was impossible given the nature of the intervention and the cluster-randomised design, but self-report outcome measures were used to prevent researcher rating bias, and statistical analysis was blind to allocation. Differences between intervention and control arms in the outcomes of depressive symptoms, social functioning and quality of life measured at 12- and 26-week follow-up were analysed using linear mixed models, adjusting for baseline depression; duration of depression; history of depression; baseline anxiety; sociodemographic factors (gender, age, socioeconomic position, housing, education, marital status and dependants), and clustering including a random effect for practice. Patient satisfaction, quality of life (quality-adjusted life-years) and costs were compared between the arms over the 26 weeks’ study follow-up period. 
Differences between the arms in the process of care for depression were also analysed, including patients’ self-reported use of antidepressants at the 12- and 26-week follow-up points, and medication and contacts with mental health services (community mental health nurses, counsellors, psychologists, psychiatrists, other therapists and social workers) recorded in practice medical records over the 26 weeks’ follow-up. A health economic evaluation was undertaken from an NHS and Personal Social Services perspective. The outcomes were expressed as incremental cost per point improvement in the Beck Depression Inventory, 2nd edition, clinical outcome (cost-effectiveness analysis), and incremental cost per quality-adjusted life-year gained (cost–utility analysis). The primary analysis at 26 weeks used a generalised linear mixed model to estimate the differences in costs and quality-adjusted life-years (using the EuroQol-5 Dimensions, five-level to calculate patient utilities), adjusted for baseline quality of life; baseline anxiety; sociodemographic factors; and practice as a random effect. Incremental cost-effectiveness ratios and a cost-effectiveness acceptability curve were generated using non-parametric bootstrapping. Qualitative interviews with participating practitioners and patients in both arms were conducted to reflect on their involvement in the trial and analysed using reflexive thematic analysis. Intervention arm participants were asked about barriers, facilitators, benefits and problems related to using the Patient Health Questionnaire-9, including questions derived from the normalisation process theory framework. 
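The original sample size arithmetic reported in the Methods can be retraced with a short sketch. This is an illustration only, not the trial statisticians' calculation: it assumes the standard normal-approximation formula for comparing two means, and the function names are hypothetical. Under that assumption the unclustered figure comes out at about 234 per arm, close to the 235 reported (the published figure may reflect different software or rounding). The later steps start from the reported 235, and the per-arm rounding up used to reach 676 is likewise an assumption.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_unclustered(effect_size: float, alpha: float = 0.05,
                          power: float = 0.90) -> float:
    """Patients analysed per arm for a two-arm comparison of means.

    Assumption: normal approximation, two-sided test, equal arm sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Inflation factor for cluster randomisation: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def total_recruitment_target(analysed_per_arm: int,
                             loss_to_follow_up: float) -> int:
    """Total to recruit across both arms, allowing for loss to follow-up.

    Assumption: each arm is rounded up to a whole patient before doubling.
    """
    per_arm = ceil(analysed_per_arm / (1 - loss_to_follow_up))
    return 2 * per_arm


# Effect size 0.3 = 3.0-point difference / standard deviation of 10.0.
approx_n = n_per_arm_unclustered(effect_size=0.3)

# Six patients per practice and an intracluster correlation of 0.03.
deff = design_effect(mean_cluster_size=6, icc=0.03)

# Start from the reported 235 analysed per arm, then inflate for clustering.
clustered_per_arm = round(235 * deff)

# Allow for the assumed 20% loss to follow-up at 12 weeks.
total_target = total_recruitment_target(clustered_per_arm, 0.20)

print(f"Normal-approximation n per arm: {approx_n:.1f}")
print(f"Design effect: {deff:.2f}")
print(f"Clustered n per arm: {clustered_per_arm}")
print(f"Total recruitment target: {total_target}")
```

With the reported inputs, the sketch gives a design effect of 1.15, 270 patients analysed per arm and a total recruitment target of 676, matching the figures stated in the Methods.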
Results Practices and patients As the number of patients recruited per practice was smaller than anticipated, we recruited substantially more than our target of 113 practices, eventually reaching a total of 189, but 48 practices subsequently withdrew (24 in each arm), so the final number of active practices was 141: 72 intervention and 69 control (28 above our original target). Practice characteristics were well balanced by arm. Of 11,468 patients approached in consultations or through mailed invitations, 1058 (9.2%) returned reply slips about the study: 574 (10.6% of those approached) in the intervention arm and 484 (8.0% of those approached) in the control arm. After the exclusion of patients declining to participate, ineligible at screening or uncontactable, 529 patients were assessed at baseline: 302 (5.5% of those approached) in the intervention arm and 227 (3.8% of those approached) in the control arm. The ratio of intervention to control arm patients recruited was, therefore, 1.3 to 1, which may have reflected lower motivation to take part among control arm practices. Of 529 patients recruited, 453 (85.6%) were followed up at 12 weeks: 254 intervention arm (84.1%) and 199 control arm (87.7%) patients. At the 26-week point, 414 patients (78.3%) were followed up: 230 intervention arm (76.2%) and 184 control arm (81.1%). Medical records data were collected for 259 intervention arm patients (85.8%) and 201 control arm patients (88.5%). The mean Beck Depression Inventory, 2nd edition, score for depressive symptoms at baseline was higher in the intervention arm, at 24.1 (standard deviation 8.89), than in the control arm, at 22.4 (standard deviation 9.52). Baseline anxiety and quality-of-life scores were also worse in the intervention arm. Control arm patients were more likely to have had two or more previous depressive episodes. Demographic characteristics were relatively well balanced. 
Clinical outcomes At the 12-week follow-up, the mean Beck Depression Inventory, 2nd edition, score was 18.5 (standard deviation 10.2) in the intervention arm and 16.9 (standard deviation 10.3) in the control arm. The adjusted mean score was slightly lower in the intervention arm, but this was not statistically significant (mean adjusted difference –0.46, 95% confidence interval –2.16 to 1.26; p = 0.60). At 26 weeks, the mean Beck Depression Inventory, 2nd edition, scores were 15.1 (standard deviation 10.8) in the intervention arm and 14.7 (standard deviation 10.6) in the control arm (mean adjusted difference –1.63, 95% confidence interval –3.48 to 0.21; p = 0.08). Social functioning scores on the Work and Social Adjustment Scale and satisfaction with care scores on the Medical Informant Satisfaction Scale favoured the intervention, but the differences found were not statistically significant. A post hoc analysis at 26 weeks showed similar proportions improving by ≥ 50% on the Beck Depression Inventory, 2nd edition, in the intervention and control arms (45.1% vs. 37.3%), but the proportion remitting to a score of < 13 was significantly greater in the intervention arm (49.8% vs. 39.9%; adjusted odds ratio 2.18, 95% confidence interval 1.12 to 4.24; p = 0.02). Process of care In the intervention arm, 190 out of 261 patients (72.8%) had Patient Health Questionnaire-9 results in their medical records. In the control arm, 35 out of 201 patients (17.4%) had Patient Health Questionnaire-9 results recorded. More patients in the intervention arm had antidepressant prescriptions recorded in their medical records over the 26 weeks’ follow-up (67.4% vs. 55.7%), but the adjusted difference was not statistically significant. There was also no significant adjusted difference found in the proportions with mental health or social services contacts over the 26 weeks (34.6% vs. 33.8%, respectively). 
Health economic outcomes The adjusted mean difference in utility score between the arms was not statistically significant at the 12-week follow-up, but a statistically significant difference favouring the intervention arm was found at 26 weeks (0.053, 95% confidence interval 0.013 to 0.093; p = 0.01). However, quality-adjusted life-years over 26 weeks were not significantly greater (adjusted mean difference 0.0013, 95% confidence interval –0.0157 to 0.0182). Costs were lower in the intervention arm, but again not significantly (adjusted mean difference –£163, 95% confidence interval –£349 to £28). Cost-effectiveness and cost–utility analyses therefore suggested that the intervention was dominant over usual care, but with considerable uncertainty around the point estimates. The cost-effectiveness acceptability curve showed the probabilities of the intervention being cost-effective compared with usual care, at societal willingness-to-pay thresholds of £20,000 and £30,000 per quality-adjusted life-year, were 77% and 72%, respectively. Qualitative interviews Practitioners and patients interviewed described various benefits of using the Patient Health Questionnaire-9, including the provision of information about the range of symptoms and severity categories of depression; highlighting particular symptoms, including suicidal thoughts; identifying changes in mood over time; and informing treatment plans. However, a number of practitioners stated that their own clinical judgement was more important in making management decisions. Some patients described the Patient Health Questionnaire-9 as oversimplifying their complex experiences of depression; and some practitioners did not like the rigidity of the severity categories and their associated suggested treatments, which they referred to as ‘tick-box medicine’. 
Several practitioners expressed resistance to using the Patient Health Questionnaire-9 for these reasons, although some suggested it could be used as a guide for general practitioners with less experience. Barriers to using the questionnaire in routine practice included the time taken up in the consultation, which practitioners considered could be reduced if administering the questionnaire were automated through technological integration into practice communication and computerised records systems. Practitioners wanted an evidence base that the questionnaire was effective; clearer guidance on what to do depending on patient scores; and remuneration for the extra time taken in consultations. Limitations Baseline differences in depression, anxiety and quality-of-life scores may have reflected selection bias due to the cluster randomised design. We did not quite achieve the revised sample size target of 554 patients, falling short by 25, but the follow-up rate of 84.5% was better than the 80% predicted and so we gathered primary outcome data on 447 patients, exceeding the target of 444 and sufficient to answer the main research question with precision. It was not possible to blind participants and researchers to allocation to intervention or control arms given the pragmatic open and cluster randomised design, but we used self-report measures to avoid observer bias, and the analyses were carried out blind to allocation. We endeavoured to carry out the baseline assessments and administer the first Patient Health Questionnaire-9 as soon as possible after the patients first presented symptoms, but this was sometimes delayed by 2–3 weeks. In the meantime, treatment had already been started by the general practitioner/nurse practitioner of around half of the patients, which meant that the first Patient Health Questionnaire-9 score could not be taken into account when choosing initial treatments. 
Conclusions We found no benefit from using the Patient Health Questionnaire-9 in relation to the primary outcome of depression on the Beck Depression Inventory, 2nd edition, at the 12-week follow-up. There were also no significant differences found between the arms in the secondary outcomes of Beck Depression Inventory, 2nd edition, scores at 26 weeks, work and social functioning, patient satisfaction, medication use, or contacts with mental health services, although all the differences found in these measures were in the direction of favouring the intervention arm. However, we did find a significant benefit in terms of improved quality of life at 26 weeks, for similar overall service costs. We also found evidence of benefit in a categorical analysis comparing rates of remission of depression at 26 weeks, although this result should be treated with caution as it was from a post hoc analysis. Cost-effectiveness acceptability curves showed that the probability of the intervention being cost-effective, at the lower and higher thresholds adopted by the National Institute for Health and Care Excellence of £20,000 and £30,000 per quality-adjusted life-year, was 77% and 72%, respectively. We found that patients valued using the Patient Health Questionnaire-9 to identify changes in their scores. The mechanism by which feedback of scores might improve patients’ quality of life, despite not changing the management of their depression, might be through increasing their awareness of improvement in their symptoms over time, supporting personal reflection on their progress to recovery. Future work Further research should be conducted in primary care evaluating (1) longer patient-reported outcome measures including anxiety symptoms, (2) administered remotely before and between consultations, with (3) algorithm-driven interpretation, delivering recommendations for changes in treatment. Study registration This study is registered as IRAS250225 and ISRCTN17299295. 
Funding This award was funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme (NIHR award ref: 17/42/02) and is published in full in Health Technology Assessment; Vol. 28, No. 17. See the NIHR Funding and Awards website for further award information.
Keywords