Статистика и экономика (May 2024)

Intelligent Data Processing Methods for Studying the Influence of the Environment on the Morbidity of the Population in Moscow

  • T. V. Zolotova,
  • A. S. Marunko

DOI
https://doi.org/10.21686/2500-3925-2024-2-72-82
Journal volume & issue
Vol. 21, no. 2
pp. 72 – 82

Abstract

Purpose of the study. The purpose of the study is to confirm or refute the environmental determinism of socially significant diseases among the population of Moscow by analyzing data on environmental and health indexes across the city's municipal units.

Materials and methods. The article reviews Russian and foreign literature on the research problem. Open data on environmental indexes and population morbidity in various districts of Moscow were collected and processed, and several types of analysis were carried out to identify relationships between them. Machine learning models were designed to classify socially significant diseases based on the environmental indexes of the place of residence. The models are based on the k-nearest neighbors method, the multilayer perceptron, and gradient boosting. They were built in Jupyter Notebook using the Python programming language.

Results. Correlation and regression analysis showed a statistically significant correlation between some of the selected environmental indexes and the occurrence of socially significant diseases. This result indicates a possible relationship, which is one of the main conclusions of this paper. A web interface was developed to automate the analysis of new data using the constructed machine learning models, which were used in regression analysis to build a binary logistic model (predicting, from the collected data, whether a person has a socially significant disease) and a multiclass classification model (predicting, from the collected data, which disease may be detected in a person). The machine learning models were compared, and the best model for classifying socially significant diseases was determined.

Conclusion. The study collected comprehensive information about various environmental indexes and about the presence or absence of various objects that affect the environment. These data were used not only in the machine learning models but also to form an objective assessment of the environmental situation in the municipal units of Moscow. Because the rating is updated automatically as the underlying data change, it can be used by ordinary users without specialist training in ecology or medicine to assess the ecological state of an area on their own. We believe that such research will lead to effective practical solutions in this area.
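To make the first of the named methods concrete, the following is a minimal sketch of k-nearest neighbors classification in plain Python. It is not the authors' implementation: the function name `knn_predict`, the two features, the labels, and all numeric values are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query point to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only, so the nearest points come first
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbors
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, made-up district features: (pollution index, green space share)
train_X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
           (0.2, 0.7), (0.1, 0.8), (0.15, 0.75)]
# Hypothetical morbidity class for each made-up district
train_y = ["higher", "higher", "higher", "lower", "lower", "lower"]

prediction = knn_predict(train_X, train_y, (0.7, 0.3), k=3)
print(prediction)
```

Because the method relies on distances, features should be brought to comparable scales before use. An odd k avoids tied votes in the two-class case.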

Keywords