Mendelian randomization with incomplete measurements on the exposure in the Hispanic Community Health Study/Study of Latinos
Yilun Li,
Kin Yau Wong,
Annie Green Howard,
Penny Gordon-Larsen,
Heather M. Highland,
Mariaelisa Graff,
Kari E. North,
Carolina G. Downie,
Christy L. Avery,
Bing Yu,
Kristin L. Young,
Victoria L. Buchanan,
Robert Kaplan,
Lifang Hou,
Brian Thomas Joyce,
Qibin Qi,
Tamar Sofer,
Jee-Young Moon,
Dan-Yu Lin
Affiliations
Yilun Li
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Kin Yau Wong
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Annie Green Howard
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Penny Gordon-Larsen
Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Heather M. Highland
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Mariaelisa Graff
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Kari E. North
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Carolina G. Downie
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Christy L. Avery
Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Bing Yu
Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Kristin L. Young
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Victoria L. Buchanan
Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Robert Kaplan
Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
Lifang Hou
Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
Brian Thomas Joyce
Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
Qibin Qi
Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
Tamar Sofer
Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
Jee-Young Moon
Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
Dan-Yu Lin
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Corresponding author
Summary: Mendelian randomization has been widely used to assess the causal effect of a heritable exposure variable on an outcome of interest, using genetic variants as instrumental variables. In practice, data on the exposure variable can be incomplete because of the high cost of measurement and technical limits of detection. In this paper, we propose a valid and efficient method to handle both unmeasured and undetectable values of the exposure variable in one-sample Mendelian randomization analysis with individual-level data. We estimate the causal effect of the exposure variable on the outcome by maximum likelihood and develop an expectation-maximization algorithm to compute the estimator. Simulation studies show that the proposed method performs well in making inference on the causal effect. We apply our method to the Hispanic Community Health Study/Study of Latinos, a community-based prospective cohort study, and estimate the causal effects of several metabolites on phenotypes of interest.
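To illustrate how an expectation-maximization algorithm can accommodate exposure values below a detection limit, the following is a minimal sketch and not the estimator proposed in this paper. It assumes a simplified first-stage model only, in which the exposure is linear in a single genetic variant with normal errors and is left-censored at a known limit; it omits the outcome model and does not address unmeasured exposure values. All names (`em_censored_regression`, `limit`, the simulated parameter values) are hypothetical and chosen for illustration.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def em_censored_regression(g, x, limit, n_iter=200):
    """EM for x = a0 + a1 * g + e, e ~ N(0, s2), left-censored at `limit`.

    x[i] is None when the exposure fell below the detection limit.
    Assumed toy model for illustration, not the paper's likelihood.
    """
    n = len(g)
    # Crude starting values: replace censored entries by the limit.
    filled = [limit if xi is None else xi for xi in x]
    a0 = sum(filled) / n
    a1 = 0.0
    s2 = sum((f - a0) ** 2 for f in filled) / n

    for _ in range(n_iter):
        s = math.sqrt(s2)
        m, v = [], []
        # E-step: conditional mean and variance of each exposure value.
        for gi, xi in zip(g, x):
            if xi is not None:
                m.append(xi)
                v.append(0.0)
            else:
                mu = a0 + a1 * gi
                z = (limit - mu) / s
                lam = norm_pdf(z) / max(norm_cdf(z), 1e-300)
                # Moments of a normal variable truncated above at `limit`.
                m.append(mu - s * lam)
                v.append(s2 * (1.0 - z * lam - lam * lam))
        # M-step: least squares on the conditional means, then the variance
        # update, which adds the conditional variances of censored values.
        gbar = sum(g) / n
        mbar = sum(m) / n
        sgg = sum((gi - gbar) ** 2 for gi in g)
        sgm = sum((gi - gbar) * (mi - mbar) for gi, mi in zip(g, m))
        a1 = sgm / sgg
        a0 = mbar - a1 * gbar
        s2 = sum((mi - a0 - a1 * gi) ** 2 + vi
                 for gi, mi, vi in zip(g, m, v)) / n
    return a0, a1, s2


# Simulated data under assumed values a0 = 1.0, a1 = 0.5, s2 = 1.0.
rng = random.Random(0)
n = 2000
limit = 0.5
g_sim = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
x_full = [1.0 + 0.5 * gi + rng.gauss(0.0, 1.0) for gi in g_sim]
x_obs = [xi if xi >= limit else None for xi in x_full]

a0_hat, a1_hat, s2_hat = em_censored_regression(g_sim, x_obs, limit)
print(a0_hat, a1_hat, s2_hat)
```

In this toy setting, censored values are never imputed by a single number such as the limit itself; the E-step instead supplies their conditional first and second moments, which is what allows the M-step to retain closed-form least-squares updates.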