Comparison and Evaluation of Different Methods for the Feature Extraction from Educational Contents

Jose Aguilar; Camilo Salazar; Henry Velasco; Julian Monsalve-Pulido; Edwin Montoya

doi:10.3390/computation8020030

Computation (Apr 2020)

Affiliations

Jose Aguilar: Escuela de Sistemas, Facultad de Ingeniería, Universidad de los Andes, Mérida 5101, Venezuela
Camilo Salazar: GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia
Henry Velasco: LANTIA SAS, Medellin 050001, Colombia
Julian Monsalve-Pulido: GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia
Edwin Montoya: GIDITIC, Universidad EAFIT, Carrera 49 No. 7 Sur 50, Medellin 050001, Colombia

DOI: https://doi.org/10.3390/computation8020030
Journal volume & issue: Vol. 8, no. 2, p. 30

Abstract

This paper analyses the capabilities of different techniques to build a semantic representation of educational digital resources. Educational digital resources are modeled using the Learning Object Metadata (LOM) standard, and these semantic representations can be obtained from different LOM fields, such as the title and description, in order to extract the features/characteristics of the digital resources. The feature extraction methods used in this paper are Best Matching 25 (BM25), Latent Semantic Analysis (LSA), Doc2Vec, and Latent Dirichlet Allocation (LDA). The features/descriptors they generate are tested on three types of educational digital resources (scientific publications, learning objects, patents), on a paraphrase corpus, and in two use cases: an information retrieval context and an educational recommendation system. For this analysis, unsupervised metrics are used to determine the quality of the features proposed by each method: two similarity functions and the entropy. In addition, the paper presents tests of the techniques for the classification of paraphrases. The experiments show that the performance of the feature extraction methods varies considerably with the type of content and the metric; in some cases a method outperforms the others, and in other cases the opposite holds.
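
As a rough illustration of what one of these extractors produces, the following minimal Python sketch computes BM25 term weights for short texts such as LOM titles or descriptions, then applies a similarity function and the entropy to the resulting features. It is not the authors' implementation: the whitespace tokenizer, the parameter values `k1` and `b`, the smoothed IDF variant, the choice of cosine similarity, and the sample texts are all assumptions made for this example, since the abstract does not specify them.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bm25_features(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per document, using Okapi BM25 term weights.

    Assumes a non-empty list of non-empty documents; k1 and b are common
    default values, not values taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    features = []
    for toks in tokenized:
        tf = Counter(toks)
        weights = {}
        for term, freq in tf.items():
            # Smoothed IDF variant that stays positive for every term.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            weights[term] = idf * freq * (k1 + 1) / (freq + length_norm)
        features.append(weights)
    return features


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def entropy(weights):
    """Shannon entropy (bits) of the weights normalized to sum to 1."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)


# Hypothetical LOM-style titles, invented for illustration only.
docs = [
    "introduction to linear algebra",
    "linear algebra course notes",
    "organic chemistry laboratory guide",
]
feats = bm25_features(docs)
print(cosine(feats[0], feats[1]), cosine(feats[0], feats[2]), entropy(feats[0]))
```

In this toy input the first two titles share terms and so score a higher similarity than the first and third, which share none. LSA, Doc2Vec, and LDA would instead yield dense or topic-based vectors, to which the same kind of similarity and entropy measures can be applied.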

Published in Computation

ISSN: 2079-3197 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computation
