Journal of King Saud University: Computer and Information Sciences (Dec 2014)

Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization

  • Houda Oufaida,
  • Omar Nouali,
  • Philippe Blache

DOI
https://doi.org/10.1016/j.jksuci.2014.06.008
Journal volume & issue
Vol. 26, no. 4
pp. 450 – 461

Abstract


Automatic text summarization aims to produce summaries for one or more texts using machine techniques. In this paper, we propose a novel statistical summarization system for Arabic texts. First, our system uses a clustering algorithm and an adapted discriminant analysis method, mRMR (minimum redundancy and maximum relevance), to score terms. Through mRMR analysis, terms are ranked according to their discriminant and coverage power. Second, we propose a novel sentence extraction algorithm that selects sentences containing top-ranked terms while maximizing diversity. Our system uses minimal language-dependent processing: sentence splitting, tokenization and root extraction. Experimental results on the EASC and TAC 2011 MultiLingual datasets show that the proposed approach is competitive with state-of-the-art systems.
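
To make the term-ranking idea concrete, the following is a minimal sketch of the generic greedy mRMR criterion (relevance minus mean redundancy, both measured by mutual information). It is not the adapted variant used in the paper, whose details the abstract does not give. The sketch assumes each term is represented by a binary occurrence vector over sentences and that each sentence already carries a cluster label; the function names `mutual_information` and `mrmr_rank` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_rank(term_columns, labels, k):
    """Greedy mRMR ranking (difference form); an illustrative assumption,
    not the paper's adapted method.

    term_columns: dict mapping a term to its occurrence vector over sentences
    labels:       cluster label of each sentence
    k:            number of terms to rank
    """
    relevance = {t: mutual_information(col, labels)
                 for t, col in term_columns.items()}
    selected = []
    remaining = set(term_columns)

    def score(t):
        if not selected:
            return relevance[t]
        redundancy = sum(mutual_information(term_columns[t], term_columns[s])
                         for s in selected) / len(selected)
        return relevance[t] - redundancy

    while remaining and len(selected) < k:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: 8 sentences in 4 clusters; term "b" nearly duplicates "a".
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    terms = {
        "a": [1, 1, 1, 1, 0, 0, 0, 0],
        "b": [1, 1, 1, 1, 0, 0, 0, 1],
        "c": [1, 1, 0, 0, 1, 1, 0, 1],
    }
    print(mrmr_rank(terms, labels, 3))
```

In this toy setting, "b" and "c" are equally relevant to the cluster labels, but "b" is largely redundant with the already selected "a", so the criterion ranks "c" ahead of it.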

Keywords