Vietnam Journal of Computer Science (Nov 2022)

An Experimental Study of Convolutional Neural Networks for Functional and Subject Classification of Web Pages

  • Codruţ-Georgian Artene,
  • Dumitru-Daniel Vecliuc,
  • Marius Nicolae Tibeică,
  • Florin Leon

DOI
https://doi.org/10.1142/S2196888822500245
Journal volume & issue
Vol. 09, no. 04
pp. 435 – 453

Abstract


Information filtering and information retrieval applications rely on web page classification methods. Web pages typically serve different functions or address different topics or subjects. This diversity of content increases the need for automatic web page classification and, at the same time, makes it a challenging task. Text is usually the main component of a web page's content, and text classification has been studied intensively in recent years, with researchers reporting state-of-the-art results for various methods. Applying these methods to the text extracted from web pages is therefore a promising direction. In this work, we revisit our experimental study of convolutional neural networks for multi-label, multi-language web page classification with a new approach that divides the classification problem into functional classification and subject classification of web pages. The experimental evaluation suggests that separating the functional and subject classification of web pages improves the overall results.
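To make the idea of splitting one multi-label problem into two more concrete, here is a minimal, dependency-free sketch. It is not the authors' architecture. It assumes a generic text CNN (one convolution layer with ReLU and max-over-time pooling, followed by one sigmoid output per label) and pre-computed token embeddings. The label names `FUNCTIONAL_LABELS` and `SUBJECT_LABELS` and all helper functions are hypothetical, and the weights are random and untrained. The point is the structure: two independent classifiers, each with its own label space, rather than one joint label space.

```python
import math
import random

# Hypothetical label sets for illustration; the paper's actual labels are not listed here.
FUNCTIONAL_LABELS = ["blog", "e-shop", "forum"]
SUBJECT_LABELS = ["finance", "health", "sports"]


def conv1d_max_pool(embeddings, filters, bias):
    """Slide each filter over the token embeddings, apply ReLU,
    and keep the maximum response (max-over-time pooling)."""
    width = len(filters[0])
    features = []
    for f, b in zip(filters, bias):
        best = 0.0  # ReLU outputs are >= 0, so 0 is a safe starting maximum
        for start in range(len(embeddings) - width + 1):
            s = b
            for k in range(width):
                s += sum(w * x for w, x in zip(f[k], embeddings[start + k]))
            best = max(best, s)  # max(0, s) over all windows equals max of ReLU(s)
        features.append(best)
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_head(features, weights, biases, labels, threshold=0.5):
    """One independent sigmoid per label, so a page may receive several labels."""
    out = []
    for w, b, name in zip(weights, biases, labels):
        p = sigmoid(b + sum(wi * fi for wi, fi in zip(w, features)))
        if p >= threshold:
            out.append(name)
    return out


def make_model(rng, emb_dim, n_filters, width, n_labels):
    """Randomly initialised (untrained) parameters for one hypothetical classifier."""
    return {
        "filters": [[[rng.uniform(-1, 1) for _ in range(emb_dim)]
                     for _ in range(width)] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "weights": [[rng.uniform(-1, 1) for _ in range(n_filters)]
                    for _ in range(n_labels)],
        "biases": [0.0] * n_labels,
    }


def predict(model, embeddings, labels):
    feats = conv1d_max_pool(embeddings, model["filters"], model["conv_bias"])
    return multilabel_head(feats, model["weights"], model["biases"], labels)


def classify_page(embeddings, functional_model, subject_model):
    """Run two separate classifiers instead of one over a joint label space."""
    return {
        "functional": predict(functional_model, embeddings, FUNCTIONAL_LABELS),
        "subject": predict(subject_model, embeddings, SUBJECT_LABELS),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    page = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]  # 10 tokens, 4-dim embeddings
    func_model = make_model(rng, 4, 3, 2, len(FUNCTIONAL_LABELS))
    subj_model = make_model(rng, 4, 3, 2, len(SUBJECT_LABELS))
    print(classify_page(page, func_model, subj_model))
```

Because the models are untrained, the printed labels are arbitrary. In practice, each model would be trained separately on its own label set, and training is where the separation could matter: neither classifier has to share capacity or output units with the other label family.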

Keywords