Abstract

Background
Escherichia coli is an opportunistic pathogen that colonizes various host species. However, the extent to which genetic lineages of E. coli are adapted or restricted to specific hosts, and the genomic determinants of such adaptation or restriction, are poorly understood.

Results
To construct a collection of 1198 whole-genome-sequenced E. coli isolates, we randomly sampled isolates from five host species (human, pig, cattle, chicken, and wild boar) in four countries (Germany, UK, Spain, and Vietnam) over 16 years, from both healthy and diseased hosts. We identified associations between specific E. coli lineages and the host from which they were isolated. A genome-wide association study (GWAS) identified several E. coli genes that were associated with human, cattle, or chicken hosts, whereas no genes associated with the pig host were found. In silico characterization of nine contiguous genes associated with the human host (collectively designated nan-9) indicated that these genes are involved in the metabolism of sialic acids (Sia). In contrast, the previously described sialic acid regulon known as the sialoregulon (i.e. nanRATEK-yhcH, nanXY, and nanCMS) was not associated with any host species. In vitro growth experiments using the sialic acids 5-N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc) as the sole carbon source showed impaired growth of a Δnan-9 E. coli mutant strain compared with the wild type.

Conclusions
This study provides an extensive analysis of genetic determinants that may contribute to host specificity in E. coli. Our findings should inform risk analysis and epidemiological monitoring of (antimicrobial-resistant) E. coli.