O uso da mineração de textos no suporte a correções de questões discursivas em uma instituição de educação superior / The use of texts mining in the support to corrections of discursive questions in a higher education institution

Leonard Barreto Moreira; Annabell Del Real Tamariz; Joyce Vieira Fettermann

doi:10.17851/1983-3652.11.3.213-227

Texto Livre: Linguagem e Tecnologia (Dec 2018)

Affiliations

Leonard Barreto Moreira: Universidade Estadual do Norte Fluminense
Annabell Del Real Tamariz: Universidade Estadual do Norte Fluminense
Joyce Vieira Fettermann: Universidade Estadual do Norte Fluminense

DOI: https://doi.org/10.17851/1983-3652.11.3.213-227
Journal volume & issue: Vol. 11, no. 3
pp. 213 – 227

Abstract

This study develops a computational model that uses Text Mining (TM) techniques to support the correction of discursive (open-ended) questions in online environments, with the aim of reducing subjectivity in the assessment of students' answers. The experimental dataset consists of 15 discursive computing questions from the basic cycle of Engineering courses. The proposed methodology has three main phases: 1) text pre-processing and representation of the documents using the "bag of words" approach with a term-frequency weighting scheme; 2) processing of the texts by comparing the terms in each answer with those in the answer key, using term-based and edit-based measures; 3) comparison of the numerical results with the grades assigned by the evaluator, testing the hypothesis that the means of the actual and estimated grades are equal by means of a t-test, and analysing the mean absolute percentage error (MAPE) between the two sets. The results indicated strong adherence to the hypothesis that the means of the actual and estimated grades are equal, especially for the token-based measures. Accuracy was approximately 84.2% for cosine similarity in the bigram model. The main result of this work is therefore the design of a TM model to support the assessment of discursive questions in distance learning (EaD) environments.

KEYWORDS: machine learning; text mining; smart education systems.
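The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the function names, the choice of Levenshtein distance as the edit-based measure, and the sample answers and grades are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_vector(text, n=2):
    """Term-frequency bag of word n-grams (n=2 gives the bigram model).

    Assumption: lowercasing plus whitespace splitting stands in for the
    pre-processing phase, whose exact steps the abstract does not list.
    """
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (token-based)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edit_similarity(s, t):
    """Similarity in [0, 1] derived from Levenshtein distance (edit-based)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    longest = max(len(s), len(t))
    return 1.0 - prev[-1] / longest if longest else 1.0


def mape(actual, estimated):
    """Mean absolute percentage error; assumes every actual grade is nonzero."""
    errors = [abs(a - e) / abs(a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical answer key and student answer, invented for this example.
answer_key = "a stack is a last in first out data structure"
student = "a stack is a data structure that is last in first out"

similarity = cosine(tf_vector(answer_key), tf_vector(student))
estimated_grade = 10 * similarity  # assumption: grades on a 0 to 10 scale
print(f"bigram cosine: {similarity:.3f}, estimated grade: {estimated_grade:.1f}")
print(f"edit similarity: {edit_similarity(answer_key, student):.3f}")

# Hypothetical evaluator grades versus estimated grades.
print(f"MAPE: {mape([8.0, 6.0, 9.0], [7.5, 6.5, 8.0]):.1f}%")
```

The comparison of means in the third phase could be run on the same two lists of grades with a t-test routine from a statistics library. The abstract does not say whether a paired or an independent test was used, so that step is left out of the sketch.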

Published in Texto Livre: Linguagem e Tecnologia

ISSN: 1983-3652 (Online)
Publisher: Universidade Federal de Minas Gerais
Country of publisher: Brazil
LCC subjects: Technology; Language and Literature
Website: https://periodicos.ufmg.br/index.php/textolivre/
