Nursing Reports (Jun 2025)

Semantic Evaluation of Nursing Assessment Scales Translations by ChatGPT 4.0: A Lexicometric Analysis

  • Mauro Parozzi,
  • Mattia Bozzetti,
  • Alessio Lo Cascio,
  • Daniele Napolitano,
  • Roberta Pendoni,
  • Ilaria Marcomini,
  • Giovanni Cangelosi,
  • Stefano Mancin,
  • Antonio Bonacaro

DOI
https://doi.org/10.3390/nursrep15060211
Journal volume & issue
Vol. 15, no. 6
p. 211

Abstract

Background/Objectives: The use of standardized assessment tools within the nursing care process is a globally established practice, widely recognized as a foundation for evidence-based evaluation. Accurate translation is essential to ensure their correct and consistent clinical use. While effective, traditional translation procedures are time-consuming and resource-intensive, leading to increasing interest in whether artificial intelligence can assist or streamline this process for nursing researchers. Therefore, this study aimed to assess the quality of translations of nursing assessment scales performed by ChatGPT 4.0. Methods: A total of 31 nursing rating scales comprising 772 items were translated from English to Italian using two different prompts and then underwent an in-depth lexicometric analysis. To assess the semantic accuracy of the translations, Sentence-BERT similarity, Jaccard similarity, TF-IDF cosine similarity, and the overlap ratio were used. Sensitivity, specificity, AUC, and AUROC were calculated to assess the quality of the translation classification. Paired-sample t-tests were conducted to compare the similarity scores. Results: The Maastricht prompt produced translations that were marginally but consistently more semantically and lexically faithful to the original texts. Although all differences were statistically significant, the corresponding effect sizes indicate that the advantage of the Maastricht prompt is slight, albeit consistent across all measures. The sensitivity of the prompts was 0.929 (92.9%) for York and 0.932 (93.2%) for Maastricht; specificity and precision remained at 1.000 for both. Conclusions: These findings highlight the potential of prompt engineering as a low-cost, effective method to enhance translation outcomes. Nonetheless, as translation represents only a preliminary step in the full validation process, further studies should investigate the integration of AI-assisted translation within the broader framework of instrument adaptation and validation.
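
As an illustration of the lexical measures named in the abstract, the sketch below shows one plausible Python implementation of Jaccard similarity, the overlap ratio, and TF-IDF cosine similarity between an original item and its translation (or back-translation). The study does not report its exact tokenization, preprocessing, or scoring pipeline, so the definitions, example sentences, and library choices here are assumptions for illustration only; the Sentence-BERT component would additionally require a sentence-embedding model and is omitted to keep the sketch dependency-light.

```python
# Hypothetical sketch of the lexical similarity metrics named in the abstract,
# applied to a source item and a candidate translation/back-translation.
# Tokenization (lowercase whitespace split) and the overlap-ratio definition
# are assumptions, not the study's documented procedure.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over lowercase token sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def overlap_ratio(a: str, b: str) -> float:
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) over token sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    denom = min(len(set_a), len(set_b))
    return len(set_a & set_b) / denom if denom else 0.0


def tfidf_cosine(a: str, b: str) -> float:
    """Cosine similarity between the TF-IDF vectors of the two texts."""
    tfidf = TfidfVectorizer().fit_transform([a, b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


# Illustrative item pair (invented, not taken from the 31 scales in the study).
source = "The patient is able to walk independently without assistance."
translation = "The patient can walk independently without any assistance."

print(f"Jaccard:       {jaccard_similarity(source, translation):.3f}")
print(f"Overlap ratio: {overlap_ratio(source, translation):.3f}")
print(f"TF-IDF cosine: {tfidf_cosine(source, translation):.3f}")
```

In a setup of this kind, each of the 772 items would yield one score per metric and per prompt, and the resulting paired score vectors are what a paired-sample t-test would compare.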

Keywords