Abstract

Background
In the emergency room, clinicians spend considerable time on hip fracture diagnosis and are exposed to mental stress. In addition, fracture classification is important for determining the surgical method and restoring the patient's mobility. Recently, computer-aided approaches using artificial intelligence (AI) or machine learning (ML) have made it possible to diagnose and classify hip fractures easily and quickly. The purpose of this systematic review is to identify studies that used AI or ML to diagnose and classify hip fractures, organize the results of each study, and analyze the usefulness and future value of this technology.

Methods
PubMed Central, OVID Medline, Cochrane Collaboration Library, Web of Science, EMBASE, and AHRQ databases were searched to identify relevant studies published up to June 2022, restricted to the English language. The following search terms were used: [All Fields] AND (", "[MeSH Terms] OR (""[All Fields] AND "bone"[All Fields]) OR "bone fractures"[All Fields] OR "fracture"[All Fields]). The following information was extracted from the included articles: authors, publication year, study period, type of image, type of fracture, number of patients or images used, fracture classification, reference standard for fracture diagnosis and classification, and data augmentation used in each study. In addition, the AI model name, convolutional neural network (CNN) architecture, region of interest (ROI) or important-region labeling, proportion of data allocated to the training/validation/test sets, and the diagnostic accuracy and area under the receiver operating characteristic curve (AUC) and classification accuracy and AUC of each study were extracted.

Results
In the 14 studies finally included, the accuracy of hip fracture diagnosis by AI was 79.3–98%, and the accuracy of fracture diagnosis by AI-aided humans was 90.5–97.1%. The accuracy of fracture diagnosis by humans alone was 77.5–93.5%. The AUC of fracture diagnosis by AI was 0.905–0.99. The accuracy of fracture classification by AI was 86–98.5%, and its AUC was 0.873–1.0.
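The accuracy and AUC figures above measure different things: accuracy counts correct labels at a single decision threshold, while AUC measures how well a model's scores rank fracture images above non-fracture images across all thresholds. As a minimal sketch (not the evaluation code of any included study), both can be computed from per-image reference labels and model scores. The function names, the 0.5 cutoff, and the toy data are illustrative assumptions.

```python
from typing import List, Sequence


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn model scores into 0/1 labels (1 = fracture). The cutoff is an assumption."""
    return [1 if s >= cutoff else 0 for s in scores]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of images whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fracture image outscores a
    random non-fracture image (Mann-Whitney formulation; ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]               # toy reference labels (1 = fracture)
    scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]  # toy model scores
    print(round(accuracy(labels, threshold(scores)), 3))  # 0.667
    print(round(roc_auc(labels, scores), 3))              # 0.889
```

The pairwise loop is O(n·m) and is chosen here for readability; for large test sets a sort-based implementation (for example, scikit-learn's `roc_auc_score`) is preferable.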
In the forest plots, the mean AI diagnostic accuracy was 0.92, the mean AI diagnostic AUC was 0.969, the mean AI classification accuracy was 0.914, and the mean AI classification AUC was 0.933. Among the included studies, architectures based on GoogLeNet or DenseNet were the most common, with three studies each. The proportion of data allocated to training ranged from 57% to 95%. Five of the 14 studies used gradient-weighted class activation mapping (Grad-CAM) to highlight important regions.

Conclusion
We expect that this study will help inform judgments about the use of AI in the diagnosis and classification of hip fractures. AI is clearly a tool that can help medical staff reduce the time and effort required for hip fracture diagnosis while achieving high accuracy. Further studies are needed to determine its effect in actual clinical situations.
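Five of the included studies used Grad-CAM to highlight the image regions that drove a prediction. The sketch below illustrates the general technique, not the pipeline of any included study. It computes a Grad-CAM map from one convolutional layer's feature maps and the gradients of the class score with respect to those maps. Extracting these tensors from a trained CNN is assumed to happen elsewhere, and upsampling the map to radiograph resolution is omitted. The function name and toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Grad-CAM heatmap for one convolutional layer (hypothetical helper).

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]: d(class score) / d(activations[k][i][j]).
    """
    if len(activations) != len(gradients) or not activations:
        raise ValueError("need matching, non-empty activations and gradients")
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: global average of the gradient over the map.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps.
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions that increase the class score.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Scale to [0, 1] for overlaying on the image (left as zeros if all zero).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    acts = [[[1, 0], [0, 1]], [[0, 2], [2, 0]]]             # toy 2-channel, 2x2 maps
    grads = [[[1, 1], [1, 1]], [[-0.5, -0.5], [-0.5, -0.5]]]  # toy gradients
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

Averaging the gradients gives each channel one importance weight, and the ReLU discards evidence against the class. This is why Grad-CAM maps highlight only the regions that support the predicted label, such as a fracture line.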