Doklady Belorusskogo gosudarstvennogo universiteta informatiki i radioèlektroniki (Jun 2019)
ALGORITHM FOR MINING OF CORE WEBSITES PARTS FOR INFORMATIONAL SEARCH EFFICIENCY
Abstract
Algorithm for automatic dividing of web page into 2 parts: service-navigational and contend parts is described. The method is based on the mining of repeatable elements in html-pages from same website. Main theory is that the quality of information search can be improved by tagging / deleting navigational elements of html pages. Developed method successfully mine service and content parts from html-pages. On the other hand, deleting of service part does not guarantee perfect improvement of web information search quality.