NDB-UFES: An oral cancer and leukoplakia dataset composed of histopathological images and patient data
Maria Clara Falcão Ribeiro-de-Assis,
Júlia Pessini Soares,
Leandro Muniz de Lima,
Liliana Aparecida Pimenta de Barros,
Tânia Regina Grão-Velloso,
Renato A. Krohling,
Danielle Resende Camisasca
Affiliations
Maria Clara Falcão Ribeiro-de-Assis
School of Dentistry, Clinical Dentistry Department, Federal University of Espirito Santo, Vitoria, Brazil
Júlia Pessini Soares
School of Dentistry, Clinical Dentistry Department, Federal University of Espirito Santo, Vitoria, Brazil
Leandro Muniz de Lima
Nature-inspired Computing Lab, Federal University of Espirito Santo, Vitoria, Brazil; Graduate Program in Computer Science, Federal University of Espirito Santo, Vitoria, Brazil
Liliana Aparecida Pimenta de Barros
School of Dentistry, Clinical Dentistry Department, Federal University of Espirito Santo, Vitoria, Brazil; Graduate Program in Science Dentistry, Federal University of Espirito Santo, Vitoria, Brazil
Tânia Regina Grão-Velloso
School of Dentistry, Clinical Dentistry Department, Federal University of Espirito Santo, Vitoria, Brazil; Graduate Program in Science Dentistry, Federal University of Espirito Santo, Vitoria, Brazil
Renato A. Krohling
Nature-inspired Computing Lab, Federal University of Espirito Santo, Vitoria, Brazil; Graduate Program in Computer Science, Federal University of Espirito Santo, Vitoria, Brazil
Danielle Resende Camisasca
School of Dentistry, Clinical Dentistry Department, Federal University of Espirito Santo, Vitoria, Brazil; Graduate Program in Science Dentistry, Federal University of Espirito Santo, Vitoria, Brazil; Corresponding author at: IOUFES – Ambulatório 4 - Patologia Oral, Avenida Marechal Campos, 1.355, Bairro Santos Dumont, Vitória – ES. CEP: 29042-715
The gold standard for diagnosing oral cancer is the microscopic analysis of specimens, preferably obtained by incisional biopsy of oral mucosa bearing a clinically detected suspicious lesion. This dataset contains histopathological images of oral squamous cell carcinoma and leukoplakia. A total of 237 images were captured: 89 of leukoplakia with dysplasia, 57 of leukoplakia without dysplasia, and 91 of carcinoma. The images were captured with an optical light microscope, using 10x and 40x objectives, coupled to a microscope camera and viewed through software. The images were saved in PNG format at 2048 × 1536 pixels and show hematoxylin-eosin stained histopathological slides from biopsies performed between 2010 and 2021 in patients managed at the Oral Diagnosis project (NDB) of the Federal University of Espírito Santo (UFES). Oral leukoplakias are represented by samples with and without epithelial dysplasia. Because diagnosis takes into account socio-demographic data (gender, age and skin color) as well as clinical data (tobacco use, alcohol consumption, sun exposure, fundamental lesion, type of biopsy, lesion color, lesion surface and lesion diagnosis), this information was also collected. By releasing the NDB-UFES dataset, we aim to provide researchers in Artificial Intelligence (machine and deep learning) with a new resource for developing tools that assist clinicians and pathologists in the automated diagnosis of oral potentially malignant disorders and oral squamous cell carcinoma.
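The class composition reported above can be checked with a short sketch, which may be useful when planning a classifier on these three classes. The counts come from the abstract. The class label strings and the inverse-frequency weighting scheme are illustrative assumptions, not part of the dataset or of the authors' method.

```python
# Image counts per class, as reported in the abstract.
# The label strings are hypothetical; the released files may name classes differently.
CLASS_COUNTS = {
    "leukoplakia_with_dysplasia": 89,
    "leukoplakia_without_dysplasia": 57,
    "carcinoma": 91,
}


def class_proportions(counts):
    """Return the fraction of all images that falls in each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts):
    """Return per-class weights proportional to 1 / class frequency.

    This is one common way to compensate for class imbalance (an assumption
    here, not a procedure described by the dataset authors). The weights are
    scaled so that the weighted counts sum to the total number of images.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


total_images = sum(CLASS_COUNTS.values())
proportions = class_proportions(CLASS_COUNTS)
weights = inverse_frequency_weights(CLASS_COUNTS)

if __name__ == "__main__":
    print(f"total images: {total_images}")
    for name in CLASS_COUNTS:
        print(f"{name}: share={proportions[name]:.3f}, weight={weights[name]:.3f}")
```

Leukoplakia without dysplasia is the smallest class, so under this scheme it receives the largest weight.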