Background: Meta-analysis is a widely used tool in which weighted information from multiple similar studies is aggregated to increase statistical power. However, the exponential growth of publications in key areas of medical science has made manual identification of relevant studies increasingly time-consuming. The aim of this work was to develop a machine learning technique capable of robust automatic study selection for meta-analysis. We validated this approach with an up-to-date meta-analysis investigating the association between diabetes mellitus (DM) and new-onset atrial fibrillation (AF).

Methods: The PubMed online database was searched from 1960 to September 2017, identifying 4,177 publications that mentioned both DM and AF. Relevant studies were selected as follows. First, publications were clustered on common text features using an unsupervised K-means algorithm. Clusters that best matched a selected set of potentially relevant studies (a “training” set of 139 articles) were then identified using maximum entropy classification. The 556 articles selected automatically on this basis were screened manually to identify potentially relevant studies. To determine the validity of the automated process, a parallel set of studies was also assembled by manually screening all of the initially retrieved publications. Finally, detailed manual selection was performed on the full texts of the studies in both sets using standard criteria. Quality assessment, meta-regression and random-effects models, sensitivity analysis and publication bias assessment were then conducted.

Results: Machine learning-assisted screening identified the same 29 studies for meta-analysis as manual screening alone. Machine learning made study selection more robust and efficient, reducing the number of articles requiring manual screening from 4,177 to 556.
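The abstract does not spell out how the maximum entropy classifier was used to pick clusters, so the sketch below is one plausible, hedged reading written with the Python standard library only. It assumes TF-IDF features (the abstract says only "common text features"), spherical K-means on cosine similarity, and a binary maximum entropy (logistic regression) model trained on the labelled training set, with every other document treated as a provisional negative. A cluster is kept when its mean score exceeds the corpus-wide mean, and all documents in kept clusters go forward to manual screening. The function names (`tfidf`, `kmeans`, `train_maxent`, `select_for_screening`) and parameters (`k`, `epochs`, `lr`) are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens (a real pipeline would also stem and drop stop words)."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf(docs):
    """One L2-normalised sparse TF-IDF vector {term: weight} per document (smoothed IDF)."""
    toks = [tokenize(d) for d in docs]
    df = Counter(t for ts in toks for t in set(ts))
    n = len(docs)
    vecs = []
    for ts in toks:
        v = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(ts).items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({t: x / norm for t, x in v.items()})
    return vecs


def dot(a, b):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(t, 0.0) for t, x in a.items())


def kmeans(vecs, k, iters=20):
    """Spherical K-means on cosine similarity with deterministic farthest-point seeding."""
    centroids = [vecs[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to every existing centroid.
        far = min(range(len(vecs)), key=lambda i: max(dot(vecs[i], c) for c in centroids))
        centroids.append(vecs[far])
    labels = []
    for _ in range(iters):
        new = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vecs]
        if new == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new
        for j in range(k):
            total = Counter()
            for v, lab in zip(vecs, labels):
                if lab == j:
                    total.update(v)
            norm = math.sqrt(sum(x * x for x in total.values()))
            if norm:  # empty clusters keep their previous centroid
                centroids[j] = {t: x / norm for t, x in total.items()}
    return labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(vecs, y, epochs=200, lr=0.5):
    """Binary maximum entropy (logistic regression) model fitted by batch gradient ascent."""
    w, bias = Counter(), 0.0
    for _ in range(epochs):
        grad, g_bias = Counter(), 0.0
        for v, target in zip(vecs, y):
            err = target - sigmoid(dot(v, w) + bias)
            g_bias += err
            for t, x in v.items():
                grad[t] += err * x
        for t, g in grad.items():
            w[t] += lr * g / len(vecs)
        bias += lr * g_bias / len(vecs)
    return lambda v: sigmoid(dot(v, w) + bias)


def select_for_screening(docs, training_idx, k=2):
    """Indices of documents in clusters that the MaxEnt model scores above the corpus-wide mean."""
    vecs = tfidf(docs)
    labels = kmeans(vecs, k)
    # Assumption: documents outside the training set act as provisional negatives.
    y = [1 if i in training_idx else 0 for i in range(len(docs))]
    score = train_maxent(vecs, y)
    scores = [score(v) for v in vecs]
    overall = sum(scores) / len(scores)
    selected = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members and sum(scores[i] for i in members) / len(members) > overall:
            selected.extend(members)
    return sorted(selected)
```

On a toy corpus of four DM/AF titles followed by four oncology titles, `select_for_screening(docs, {0, 1}, k=2)` keeps all four DM/AF titles, including the two that were not in the training set. Selecting whole clusters rather than individual high-scoring documents is what lets unlabelled but topically similar articles reach manual screening; for thousands of abstracts, an optimised library (for example scikit-learn's `KMeans` and `LogisticRegression`) would be the practical choice.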
A pooled analysis using the most conservative estimates indicated that patients with DM had a ~49% greater risk of developing AF than individuals without DM. After adjustment for three additional risk factors (hypertension, obesity and heart disease), the excess risk was 23%. In multivariate-adjusted models, the risk of developing AF in patients with DM was similar across all DM subtypes. Women with DM were 24% more likely to develop AF than men with DM. The risk of new-onset AF in patients with DM has also increased over the years.

Conclusions: We have developed a novel machine learning method to identify publications suitable for inclusion in meta-analysis. This approach has the capacity to make study selection for future meta-analyses more efficient and more objective. Using it, we have demonstrated that DM is a strong, independent risk factor for AF, particularly in women.
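The pooled estimates above come from random-effects models, but the abstract does not name the estimator. The sketch below assumes the widely used DerSimonian-Laird method: each study's standard error is recovered from its 95% confidence interval on the log scale, the between-study variance τ² is estimated from Cochran's Q, and studies are re-weighted by 1/(SE² + τ²). Percentages such as "~49% greater risk" map to relative risks via excess risk = (RR − 1) × 100%, so RR ≈ 1.49. The function name `dersimonian_laird` and the study values in the `__main__` block are hypothetical and are not data from this meta-analysis.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def dersimonian_laird(studies):
    """Pool (rr, lower, upper) tuples with a DerSimonian-Laird random-effects model.

    Returns (pooled_rr, ci_lower, ci_upper, tau2).
    """
    # Work on the log scale; recover each SE from the width of its 95% CI.
    y = [math.log(rr) for rr, _, _ in studies]
    se = [(math.log(hi) - math.log(lo)) / (2 * Z95) for _, lo, hi in studies]
    w = [1 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled), math.exp(pooled - Z95 * se_pooled),
            math.exp(pooled + Z95 * se_pooled), tau2)


if __name__ == "__main__":
    # Hypothetical studies as (RR, lower 95% CI, upper 95% CI); not from this meta-analysis.
    rr, lo, hi, tau2 = dersimonian_laird([(1.4, 1.1, 1.8), (1.6, 1.3, 2.0), (1.3, 0.9, 1.9)])
    print(f"pooled RR {rr:.2f} ({lo:.2f}-{hi:.2f}), excess risk {(rr - 1) * 100:.0f}%, tau^2 {tau2:.3f}")
```

When τ² is zero the weights reduce to the fixed-effect inverse-variance weights; a larger τ² spreads weight more evenly across studies and widens the pooled interval.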