GBM-Reservoir: Brain tumor (Glioblastoma Multiforme) MRI dataset collection with ground truth segmentation masks
Naida Solak,
André Ferreira,
Gijs Luijten,
Behrus Puladi,
Victor Alves,
Jan Egger
Affiliations
Naida Solak
Graz University of Technology (TU Graz), Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory (Café Lab), Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Hospital Essen (UKE), Ruhrgebiet, Essen, Germany
André Ferreira
Institute for AI in Medicine (IKIM), University Hospital Essen (UKE), Ruhrgebiet, Essen, Germany; Center Algoritmi / LASI, University of Minho, Braga, Portugal; Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Aachen, Germany
Gijs Luijten
Graz University of Technology (TU Graz), Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory (Café Lab), Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Hospital Essen (UKE), Ruhrgebiet, Essen, Germany; Center for Virtual and Extended Reality in Medicine, University Medicine Essen, Essen, Germany
Behrus Puladi
Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Aachen, Germany
Victor Alves
Center Algoritmi / LASI, University of Minho, Braga, Portugal
Jan Egger
Graz University of Technology (TU Graz), Graz, Styria, Austria; Computer Algorithms for Medicine Laboratory (Café Lab), Graz, Styria, Austria; Institute for AI in Medicine (IKIM), University Hospital Essen (UKE), Ruhrgebiet, Essen, Germany; Center for Virtual and Extended Reality in Medicine, University Medicine Essen, Essen, Germany; Corresponding author.
In this article, we present a brain tumor database collection comprising 23,049 samples, each of which includes four types of MRI brain scans: FLAIR, T1, T1ce, and T2. In addition, one or two ground-truth segmentation masks are provided for each sample. The first mask is the raw output of the registration process and is provided for all samples. The second mask, provided specifically for synthetic samples, is a post-processed version of the first, designed to simplify interpretation and optimize it for network training. The samples were generated by registering the 438 samples that were available at the time of registration from the original dataset provided by the BraTS 2022 Challenge. Registering each pair of existing brain scans yields two additional scans that retain a similar brain shape while featuring different tumor locations. Consequently, by registering all possible pairs, a dataset originally consisting of n samples can be expanded to n² samples: the n originals plus n(n − 1) registered scans, two for each of the n(n − 1)/2 pairs. The original dataset was collected from different institutions under standard clinical conditions, but with different equipment and imaging protocols. As a result, the image quality is heterogeneous, reflecting the diversity of clinical practices across institutions. This dataset can be used for various tasks, such as developing fully automated segmentation algorithms for new, unseen brain tumor cases, particularly with deep learning-based approaches, since ground truth is provided for each sample.
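The counting argument behind the pairwise expansion can be checked with a minimal Python sketch. It illustrates only the arithmetic, not the registration itself; the function name `expanded_size` and the small values of n are hypothetical and chosen for illustration.

```python
from itertools import permutations


def expanded_size(n: int) -> int:
    """Count samples after registering every pair of n original scans.

    Assumption (from the abstract): each pair of scans yields two
    registered scans, which is equivalent to one registered scan per
    ordered pair (a, b) with a != b.
    """
    sample_ids = range(n)
    # Enumerate ordered pairs of distinct samples; each stands for one
    # registered (synthetic) scan.
    registered = sum(1 for _ in permutations(sample_ids, 2))
    # The originals are kept alongside the registered scans.
    return n + registered


# The total equals n squared for any number of originals.
for n in (2, 3, 10):
    assert expanded_size(n) == n ** 2
```

Enumerating the pairs explicitly, rather than computing n * n directly, keeps the sketch close to the description: one registered scan per direction of each pair, plus the unchanged originals.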