Data in Brief (Apr 2024)
A dataset of optical spectra and clinical features acquired on human healthy skin and on skin carcinomas
Abstract
Optical spectroscopy is being investigated as an aid to skin cancer diagnosis. Optical spectra change as cancer progresses and provide information (e.g., on metabolism and tissue structure) that complements clinical examination for surgical guidance [1,2]. This original dataset consists of autofluorescence and diffuse reflectance spectra acquired in vivo on the skin of 131 patients with the SpectroLive device [3,4]. Spatially resolved spectroscopy measurements were performed using a multi-fiber optic probe featuring 4 distances (0.4–1 mm) between the excitation and collection optical fibers: because of this spatial resolution, spectra acquired at different distances carry information from different depths in skin tissue. Five types of autofluorescence spectra were acquired using five different excitation wavelengths (in the 365–415 nm spectral range) in order to collect information on several endogenous skin fluorophores (e.g., flavins, collagen). A sixth light source (broadband white light) was used to acquire diffuse reflectance spectra carrying information about skin scattering properties and endogenous skin absorbers such as melanin and hemoglobin. Patients were invited to enroll in the clinical trial if they were suspected of having actinic keratoses (precancerous skin lesions) or basal or squamous cell carcinomas: in all cases, the complete diagnosis is provided in the dataset. To broaden the usefulness of the dataset, and to allow the dependence of optical spectra (intensity, shape) to be evaluated not only against pathological state but also against healthy skin features (civil age, skin age, gender, phototype, anatomical site), spectra were acquired for all 131 patients on two so-called “reference” skin sites known to be rarely affected by skin cancer: the palm of the hand (thick skin) and the inner wrist (thin skin).
Spectra are available in .tab files: the first column contains the wavelengths at which intensity spectra were recorded (317–788 nm), and each following column provides an intensity spectrum recorded by a spectrometer for a given combination of excitation light source and distance. Each of the 131 folders (one per patient) contains a .json file providing the patient's clinical features: gender, civil age, skin age, phototype score and class. All .tab file names include the anatomical site and the anatomopathological diagnosis of the skin site on which the spectra were acquired: codes were defined that map a letter or an acronym to each diagnosis and anatomical site. For quality control, a spectrum was acquired on the same calibration standard before starting spectral acquisition on each patient. It is therefore possible to track the effect of ageing of the optical acquisition chain over the 4.5 years during which patients were enrolled. This dataset can be used by epidemiologists to characterize populations affected by skin cancers (gender ratio, mean age, anatomical sites typically affected, etc.); it may also be used by researchers in artificial intelligence to develop innovative methods for processing such data and to contribute to the non-invasive diagnosis of skin cancers, whose incidence is steadily increasing.
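As an illustration of the .tab layout described above (wavelengths in the first column, one intensity spectrum per following column), a minimal Python sketch of a reader is given below. It is not part of the dataset or of the SpectroLive software. The function name `parse_tab_spectra`, the tab delimiter, the dot decimal separator and the skipping of non-numeric rows are all assumptions made for the example; the actual files should be inspected before relying on it.

```python
def parse_tab_spectra(text, delimiter="\t"):
    """Return (wavelengths, spectra) from delimiter-separated text.

    wavelengths: list of floats taken from the first column.
    spectra: list of lists, one per remaining column, in file order.
    Assumption: fields are tab-separated and use '.' as decimal separator.
    """
    wavelengths = []
    spectra = []
    for line in text.splitlines():
        fields = line.strip().split(delimiter)
        try:
            values = [float(field) for field in fields]
        except ValueError:
            # Not a purely numeric row (blank line or possible header): skip it.
            continue
        if len(values) < 2:
            continue
        if not spectra:
            # First numeric row fixes the number of intensity columns.
            spectra = [[] for _ in values[1:]]
        elif len(values) - 1 != len(spectra):
            raise ValueError("inconsistent number of columns")
        wavelengths.append(values[0])
        for column, intensity in zip(spectra, values[1:]):
            column.append(intensity)
    return wavelengths, spectra


# Synthetic three-row example (made-up numbers, not taken from the dataset).
sample = "317.0\t10.0\t12.0\n318.0\t11.0\t13.0\n319.0\t9.0\t14.0\n"
wavelengths, spectra = parse_tab_spectra(sample)
```

For a real file, the text would be obtained with `open(path).read()`. Which column corresponds to which light source and distance is not assumed here and has to be taken from the dataset documentation. The per-patient .json file can be read with the standard `json` module; its key names are likewise not assumed.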