Efficient labeling of French mammogram reports with MammoBERT

Nazanin Dehghani; Vera Saliba-Colombani; Aurélien Chick; Morgane Heng; Grégory Operto; Pierre Fillard

doi:10.1038/s41598-024-76369-y

Scientific Reports (Oct 2024)

Affiliations

Nazanin Dehghani: Therapixel Company, 1 Imp. Reille
Vera Saliba-Colombani: Therapixel Company, 1 Imp. Reille
Aurélien Chick: Therapixel Company, 1 Imp. Reille
Morgane Heng: Therapixel Company, 1 Imp. Reille
Grégory Operto: Therapixel Company, 1 Imp. Reille
Pierre Fillard: Therapixel Company, 1 Imp. Reille

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 9

Abstract

Recent advances in deep learning and natural language processing (NLP) have broadened opportunities for automatic text processing in the medical field. However, developing models for low-resource languages such as French is hampered by limited datasets, often due to legal restrictions. Large-scale training of medical imaging models often requires extracting labels from radiology text reports. Current report-labeling methods rely mainly either on sophisticated feature engineering grounded in medical domain knowledge or on manual annotation by radiologists, and both can be labor-intensive. In this work, we introduce a BERT-based approach for efficiently labeling French mammogram reports. Our method combines the scale of existing rule-based systems with the precision of radiologist annotations: the model is first fine-tuned on a limited set of radiologist annotations and then trained on annotations generated by a rule-based labeler. Experimental results show that the final model, MammoBERT, significantly outperforms the rule-based labeler while reducing the number of radiologist annotations needed for training. This research advances the state of the art in medical image report labeling and offers an efficient, effective solution for large-scale medical imaging model development.
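The abstract describes a two-stage training schedule: fine-tuning on a small radiologist-labeled set, then further training on labels produced by a rule-based labeler. The toy sketch below only illustrates that ordering and is not MammoBERT. A bag-of-words logistic regression stands in for the BERT classifier, and the French report snippets, the binary label (1 = suspicious finding mentioned), the names `LogisticLabeler` and `train_two_stage`, the learning rate, and the epoch counts are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data. Label 1 = report mentions a suspicious finding.
# GOLD stands in for the small radiologist-annotated set,
# SILVER for the larger set labeled by a rule-based labeler.
GOLD = [
    ("masse suspecte sein gauche", 1),
    ("examen normal", 0),
    ("microcalcifications suspectes", 1),
    ("pas d anomalie", 0),
]
SILVER = [
    ("masse irrégulière spiculée", 1),
    ("aucune anomalie décelée", 0),
    ("foyer de microcalcifications", 1),
    ("seins denses sans anomalie", 0),
]


def tokenize(text):
    return text.lower().split()


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticLabeler:
    """Toy stand-in for a BERT classifier: bag-of-words logistic regression."""

    def __init__(self, lr=0.5):
        self.w = defaultdict(float)  # one weight per token, default 0.0
        self.b = 0.0
        self.lr = lr

    def prob(self, text):
        z = self.b + sum(self.w[t] for t in tokenize(text))
        return sigmoid(z)

    def predict(self, text):
        return int(self.prob(text) >= 0.5)

    def sgd_epoch(self, data):
        # One pass of stochastic gradient descent on the log loss.
        for text, y in data:
            g = self.prob(text) - y  # d(loss)/dz for logistic regression
            self.b -= self.lr * g
            for t in tokenize(text):
                self.w[t] -= self.lr * g


def train_two_stage(model, gold, silver, gold_epochs=20, silver_epochs=5):
    # Stage 1: fit the small radiologist-labeled set.
    for _ in range(gold_epochs):
        model.sgd_epoch(gold)
    # Stage 2: keep the same weights and continue on rule-based labels.
    for _ in range(silver_epochs):
        model.sgd_epoch(silver)
    return model


model = train_two_stage(LogisticLabeler(), GOLD, SILVER)
for report in ("masse suspecte", "aucune anomalie"):
    print(report, model.predict(report))
```

The key design point the sketch tries to capture is that stage 2 does not reinitialize the model: it continues from the weights learned on the expert labels, so the larger, automatically labeled set refines a model that has already seen radiologist judgments.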

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/