IEEE Access (Jan 2017)

Quantitative and Qualitative Evaluation of Sequence Patterns Found by Application of Different Educational Data Preprocessing Techniques

  • Michal Munk,
  • Martin Drlik,
  • Ľubomír Benko,
  • Jaroslav Reichel

DOI
https://doi.org/10.1109/ACCESS.2017.2706302
Journal volume & issue
Vol. 5
pp. 8989 – 9004

Abstract


Educational data preprocessing from log files is a time-consuming phase of the knowledge discovery process. It consists of the data cleaning, user identification, session identification, and path completion phases. This paper attempts to identify which of these phases are necessary when preprocessing educational data for the subsequent application of learning analytics methods. Sequential pattern analysis is considered suitable for estimating the discovered knowledge. This paper therefore seeks to answer which of these preprocessing phases has a significant impact on the discovered knowledge. It considers this both in general and in terms of the quality and quantity of the found sequence patterns. To this end, several data preprocessing techniques for session identification and path completion were applied to prepare log files with different levels of data preprocessing. The results showed that the session identification technique using the reference length, calculated from the sitemap, had a significant impact on the quality of the extracted sequence rules. The path completion technique, by contrast, had a significant impact only on their quantity. These findings, together with the results of previous systematic research in educational data preprocessing, can improve the automation of the educational data preprocessing phase. They can also contribute to the development of learning analytics tools suitable for the different groups of stakeholders engaged in educational data mining research.
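The reference-length idea behind the session identification technique can be illustrated with a minimal sketch. It does not reproduce the authors' implementation. It assumes the common web usage mining formulation: the time a user spends on a page (its reference length) is exponentially distributed. Pages viewed longer than a cutoff time C are treated as content pages, and a session ends at each content page. If γ denotes the assumed proportion of auxiliary (navigation) page views, the cutoff is C = -ln(1 - γ) / λ, where λ = 1 / mean reference length. In the sitemap-based variant named in the abstract, γ would presumably be estimated from the site structure rather than from the log. Here it is simply passed in as a parameter. The names `Hit`, `cutoff_time`, and `identify_sessions` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """One log record (hypothetical structure): user, time in seconds, requested page."""
    user: str
    timestamp: float
    page: str


def cutoff_time(mean_reference_length: float, auxiliary_ratio: float) -> float:
    """Cutoff C under an assumed exponential model of reference lengths.

    Solves 1 - exp(-lambda * C) = auxiliary_ratio for C, with lambda = 1 / mean.
    `auxiliary_ratio` (gamma) is an assumed input, e.g. estimated from a sitemap.
    """
    lam = 1.0 / mean_reference_length
    return -math.log(1.0 - auxiliary_ratio) / lam


def identify_sessions(hits: list[Hit], cutoff: float) -> list[list[Hit]]:
    """Split each user's clickstream into sessions using the reference-length rule.

    A page's reference length is the gap until the user's next request. If it
    exceeds the cutoff, the page is treated as a content page and the session
    ends after it. Each user's last page is assumed to close a session.
    """
    by_user: dict[str, list[Hit]] = {}
    for h in sorted(hits, key=lambda h: (h.user, h.timestamp)):
        by_user.setdefault(h.user, []).append(h)

    sessions: list[list[Hit]] = []
    for user_hits in by_user.values():
        current = [user_hits[0]]
        for prev, nxt in zip(user_hits, user_hits[1:]):
            if nxt.timestamp - prev.timestamp > cutoff:
                # `prev` had a long reference length, so it is a content page
                sessions.append(current)
                current = []
            current.append(nxt)
        sessions.append(current)
    return sessions
```

For example, a mean reference length of 60 s and an assumed γ of 0.5 give a cutoff of about 41.6 s. A user requesting pages at 0, 10, 100, and 105 s would then be split into two sessions, because of the 90 s gap after the second page. The exponential model keeps the cutoff to a single closed-form expression. A data-driven or sitemap-based estimate of γ changes only the input, not the splitting rule.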

Keywords