Метрологія та прилади (Dec 2018)

Identification of Distribution Laws Using the Correlation Coefficient Using Python

  • D. Losikhin,
  • O. Oliynyk,
  • O. Chorna,
  • O. Gnatko

DOI
https://doi.org/10.33955/2307-2180(6)2018.36-38
Journal volume & issue
no. 6
pp. 36 – 38

Abstract

The article develops a new method for identifying distribution laws when evaluating the results of multiple measurements. Identifying the distribution law is an urgent metrological task, because the accepted restrictions on the number of measurements and the assumptions about the distribution law of the random error may introduce additional uncertainty into the assessment of the measurement result. The well-known classical approaches to identification face a number of difficulties: the set of candidate models under consideration must be complete, and the corresponding statistical methods must be applied correctly. Their main limitation is that they are designed for data processing systems based on the Gaussian (normal) distribution and are therefore not universal. Imperfect mathematical models for processing measurement information can lead to erroneous identification of the distribution law. The paper proposes a method for identifying distribution laws for data that fall outside the Gaussian distribution. The model is based on calculating correlation coefficients for data with different distribution laws. The correlation coefficient serves as an estimate of the proximity of probability density functions and is calculated for pairs of different probability densities, each represented by a histogram, that is, by a vector in a multidimensional space with an orthonormal basis of unit sampling intervals. From the resulting matrix of correlation coefficients, the unknown distribution laws are classified using experimental data from simulated samples. A listing of the software implementation of the model in Python is given.
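
The abstract describes the procedure only in words, so the following is a minimal sketch of the general idea and not the authors' listing. It assumes NumPy, an illustrative set of four candidate laws, a fixed grid of 32 equal-width intervals, and standardization of every sample before binning; all function and variable names (`histogram_vector`, `reference_histograms`, `correlation_matrix`, `classify`, `EDGES`) are hypothetical.

```python
import numpy as np

# Assumed common grid of equal-width sampling intervals (not from the article).
EDGES = np.linspace(-4.0, 4.0, 33)


def histogram_vector(sample, edges=EDGES):
    """Return the histogram of a sample as a vector of relative frequencies."""
    sample = np.asarray(sample, dtype=float)
    # Assumption: centre and scale the data so laws are compared by shape only.
    z = (sample - sample.mean()) / sample.std()
    counts, _ = np.histogram(z, bins=edges)
    return counts / counts.sum()


def reference_histograms(rng, n=20000):
    """Simulate samples from a few candidate laws (an illustrative set)."""
    samples = {
        "normal": rng.normal(size=n),
        "uniform": rng.uniform(size=n),
        "laplace": rng.laplace(size=n),
        "exponential": rng.exponential(size=n),
    }
    return {name: histogram_vector(s) for name, s in samples.items()}


def correlation_matrix(histograms):
    """Pearson correlation between every pair of histogram vectors."""
    names = list(histograms)
    stacked = np.vstack([histograms[name] for name in names])
    return names, np.corrcoef(stacked)


def classify(sample, references):
    """Pick the reference law whose histogram correlates best with the sample."""
    h = histogram_vector(sample)
    scores = {
        name: float(np.corrcoef(h, ref)[0, 1])
        for name, ref in references.items()
    }
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
references = reference_histograms(rng)
names, matrix = correlation_matrix(references)

# An "unknown" sample, here simulated from a uniform law on [5, 9].
label, scores = classify(rng.uniform(5.0, 9.0, size=5000), references)
print(label)
```

In this sketch the standardization step and the bin grid are design choices made for illustration; different choices change the correlation values and may change the classification for laws with similar shapes.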

Keywords