JUTI: Jurnal Ilmiah Teknologi Informasi (Jan 2010)
USING A FEATURE WEIGHTING TECHNIQUE FOR NOISE CLEANING ON INDONESIAN-LANGUAGE NEWS WEBSITE PAGES
Abstract
A web page usually presents its information in several page blocks. On news websites, some of the displayed blocks are irrelevant or unrelated to the main content, such as navigation panels, copyright notices, user guides, links, news summaries, and various advertisements. Information blocks that are irrelevant to the main content are known as web page noise. This research applies a feature weighting technique to improve classification results by detecting noise in the pages of a website. With this technique, the website is first modelled with a Document Object Model (DOM) tree and a Compressed Structure Tree (CST) to capture the general structure and compare the information blocks across the website. The information obtained is used to measure and evaluate the importance level of each node in the Compressed Structure Tree (CST). Based on the tree and the importance level of each node, the method assigns a weight to each individual word (feature) in each content block. These weights are then used in the web mining process.
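The pipeline described above (compare blocks across pages, score each node, then weight the words in each block) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the importance formula, so the sketch scores a block by the normalized entropy of its content across pages, and it replaces the DOM tree and CST with a flat mapping from a hypothetical block path to the block's text. The names `block_importance` and `weight_features` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def block_importance(pages):
    """Score each block path by how much its content varies across pages.

    pages: list of dicts mapping a block path (e.g. "body/div.nav") to its text.
    Grouping blocks by path is a simplified stand-in for merging DOM trees
    into a compressed structure tree (assumption, not the paper's algorithm).
    Returns a dict mapping each path to an importance value between 0 and 1.
    """
    contents = defaultdict(list)
    for page in pages:
        for path, text in page.items():
            contents[path].append(text)

    importance = {}
    for path, texts in contents.items():
        m = len(texts)
        if m < 2:
            # A block seen on only one page cannot be compared; keep it.
            importance[path] = 1.0
            continue
        counts = Counter(texts)
        # Entropy of the content distribution: 0 when every page shows the
        # same text (likely noise), log(m) when every page differs.
        entropy = sum((c / m) * math.log(m / c) for c in counts.values())
        importance[path] = entropy / math.log(m)
    return importance


def weight_features(page, importance):
    """Weight each word by its term frequency times its block's importance."""
    weights = defaultdict(float)
    for path, text in page.items():
        for word, tf in Counter(text.lower().split()).items():
            weights[word] += tf * importance.get(path, 1.0)
    return dict(weights)


# Three toy pages: the navigation block repeats, the article block changes.
pages = [
    {"body/div.nav": "home news sport", "body/div.article": "flood hits city"},
    {"body/div.nav": "home news sport", "body/div.article": "team wins final"},
    {"body/div.nav": "home news sport", "body/div.article": "market prices rise"},
]
importance = block_importance(pages)
weights = weight_features(pages[0], importance)
```

In this toy input the repeated navigation words receive a weight of zero, while the article words keep a weight close to their term frequency, so a downstream classifier would effectively ignore the boilerplate block.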