Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data

Allal Houssaïni; Lambert Assoumou; Anne Geneviève Marcelin; Jean Michel Molina; Vincent Calvez; Philippe Flandre

doi:10.1155/2012/478467

AIDS Research and Treatment (Jan 2012)

Affiliations

Allal Houssaïni: INSERM, UMR-S 943, 56 Boulevard Vincent Auriol, BP 335, 75625 Paris Cedex 13, France
Lambert Assoumou: INSERM, UMR-S 943, 56 Boulevard Vincent Auriol, BP 335, 75625 Paris Cedex 13, France
Anne Geneviève Marcelin: INSERM, UMR-S 943, 56 Boulevard Vincent Auriol, BP 335, 75625 Paris Cedex 13, France
Jean Michel Molina: Service des Maladies Infectieuses, Hôpital Saint Louis, AP-HP, Paris, France
Vincent Calvez: INSERM, UMR-S 943, 56 Boulevard Vincent Auriol, BP 335, 75625 Paris Cedex 13, France
Philippe Flandre: INSERM, UMR-S 943, 56 Boulevard Vincent Auriol, BP 335, 75625 Paris Cedex 13, France

Journal volume & issue: Vol. 2012

Abstract

Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Super Learner framework is applied to data from the Jaguar trial, an “add-on” trial comparing the efficacy of adding didanosine to an ongoing failing regimen. Our aim was also to investigate the impact of using different cross-validation strategies and different loss functions. Four different splits between training set and validation set were tested with two loss functions. Six statistical methods were compared. We assessed performance by R2 values and accuracy by the rate of patients correctly classified. Results. The more recent Super Learner methodology, which builds a new predictor as a weighted combination of different methods/learners, provided good performance. A simple linear model provided results similar to those of this new predictor. Slight discrepancies arose between the two loss functions investigated, and also between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model each correctly classified around 80% of patients. The difference between the lowest and highest rates was around 10 percent. The number of mutations retained by the different learners varied from 1 to 41. Conclusions. The more recent Super Learner methodology, which combines the predictions of many learners, provided good performance on our small dataset.
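
The two uses of the framework described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the Jaguar trial data), the two learners `fit_mean` and `fit_linear` are hypothetical stand-ins for the six methods compared in the paper, and the combination weights are found by a coarse grid search over two learners rather than by whatever optimizer the authors used. The sketch computes cross-validated predictions, picks the learner with the lowest cross-validated risk (discrete Super Learner), and then searches for the convex weight that minimizes the cross-validated risk of the combined predictor.

```python
import random
from statistics import mean

# Hypothetical candidate learners: each takes training data and
# returns a prediction function for a single covariate value.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

# Two example loss functions (the paper's exact choices are not restated here).
LOSSES = {
    "squared": lambda y, p: (y - p) ** 2,
    "absolute": lambda y, p: abs(y - p),
}

def cross_validated_predictions(xs, ys, learners, n_folds):
    """Predict each observation from models fitted without its fold."""
    n = len(xs)
    preds = {name: [0.0] * n for name in learners}
    for k in range(n_folds):
        val_idx = list(range(k, n, n_folds))   # interleaved validation fold
        held_out = set(val_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        for name, fit in learners.items():
            model = fit(train_x, train_y)
            for i in val_idx:
                preds[name][i] = model(xs[i])
    return preds

def risk(ys, preds, loss):
    """Average loss of a vector of predictions."""
    return mean(loss(y, p) for y, p in zip(ys, preds))

def convex_weight(ys, preds_a, preds_b, loss, steps=20):
    """Grid search for the weight on learner A minimizing the combined risk."""
    best_w, best_r = 0.0, risk(ys, preds_b, loss)
    for k in range(1, steps + 1):
        w = k / steps
        combo = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        r = risk(ys, combo, loss)
        if r < best_r:
            best_w, best_r = w, r
    return best_w, best_r

# Synthetic small sample: x plays the role of a mutation count,
# y the role of a change in viral load (illustrative only).
random.seed(0)
xs = [float(random.randint(0, 8)) for _ in range(40)]
ys = [-0.2 * x + random.gauss(0, 0.5) for x in xs]

learners = {"mean": fit_mean, "linear": fit_linear}
loss = LOSSES["squared"]
preds = cross_validated_predictions(xs, ys, learners, n_folds=5)

# Discrete Super Learner: keep the learner with the lowest cross-validated risk.
risks = {name: risk(ys, p, loss) for name, p in preds.items()}
discrete_choice = min(risks, key=risks.get)

# Super Learner: weighted combination of the learners' predictions.
weight_linear, combined_risk = convex_weight(ys, preds["linear"], preds["mean"], loss)

print("cross-validated risks:", risks)
print("discrete Super Learner picks:", discrete_choice)
print("weight on linear learner:", weight_linear, "combined risk:", combined_risk)
```

Changing `n_folds` alters the split between training and validation sets, and swapping `LOSSES["squared"]` for `LOSSES["absolute"]` changes the loss function, which are the two design choices the abstract says were varied. Because the grid includes the weights 0 and 1, the combined cross-validated risk in this sketch can never exceed that of the better single learner.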

Published in AIDS Research and Treatment

ISSN: 2090-1240 (Print); 2090-1259 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Immunologic diseases. Allergy
Website: https://onlinelibrary.wiley.com/journal/9351