Ekstraksi Data pada Tabel dari Halaman Web Menggunakan Pohon Document Object Model (Data Extraction from Tables on Web Pages Using a Document Object Model Tree)

Memen Akbar; Cici Patmala; Dini Nurmalasari

doi:10.22146/jnteti.v5i4.273

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (Nov 2016)

Affiliations

Memen Akbar: Politeknik Caltex Riau
Cici Patmala: Politeknik Caltex Riau
Dini Nurmalasari: Politeknik Caltex Riau

Journal volume & issue: Vol. 5, no. 4
pp. 265 – 271

Abstract

Data on web pages is available in various formats, including tables. As the number of web pages grows, so does the need to extract data from tables. The extraction results can be integrated with other web tables or stored in a database. This study discusses extracting data from a table on a web page using a Document Object Model (DOM) tree. The first step of the extraction process is to transform the HTML document into a DOM tree. Then, by applying Depth First Search (DFS), part of the data in the table is extracted and stored in a CSV file. An engine was developed in Visual Basic. The results show that the engine can automatically extract data from tables with the following characteristics: an unlimited number of rows and columns, any table orientation layout, and merged cells.
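
The pipeline described in the abstract (HTML to DOM tree, DFS over the tree, CSV output) can be illustrated with a minimal sketch. This is not the authors' Visual Basic engine: it is written in Python using only the standard library, and every name in it (`Node`, `TreeBuilder`, `dfs`, `table_to_rows`, `html_table_to_csv`) is hypothetical. It assumes explicitly closed tags and no nested tables, and it assumes that a merged cell's value is repeated in every grid position it spans, which the abstract does not specify.

```python
import csv
import io
from html.parser import HTMLParser


class Node:
    """One node of a simplified DOM-like tree (hypothetical helper)."""
    def __init__(self, tag, attrs=None, parent=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = text  # only used by "#text" nodes


class TreeBuilder(HTMLParser):
    """Builds the tree; assumes non-void tags are explicitly closed."""
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(Node("#text", parent=self.current, text=data))


def dfs(node, tag):
    """Yield descendants with the given tag in depth-first pre-order."""
    stack = list(reversed(node.children))
    while stack:
        n = stack.pop()
        if n.tag == tag:
            yield n
        stack.extend(reversed(n.children))


def text_of(node):
    """Concatenate all text below a node, in document order."""
    if node.tag == "#text":
        return node.text
    return "".join(text_of(child) for child in node.children)


def table_to_rows(table):
    """Lay cells out on a grid, expanding rowspan/colspan (assumed policy)."""
    grid = {}
    for r, tr in enumerate(dfs(table, "tr")):
        c = 0
        for cell in tr.children:
            if cell.tag not in ("td", "th"):
                continue
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            value = " ".join(text_of(cell).split())
            rowspan = int(cell.attrs.get("rowspan") or 1)
            colspan = int(cell.attrs.get("colspan") or 1)
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def html_table_to_csv(html):
    """Return the first table in the HTML as CSV text ("" if none)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    table = next(dfs(builder.root, "table"), None)
    rows = table_to_rows(table) if table is not None else []
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


SAMPLE = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
  <tr><th>Math</th><th>Physics</th></tr>
  <tr><td>Ani</td><td>80</td><td>75</td></tr>
</table>
"""
```

Calling `html_table_to_csv(SAMPLE)` returns three CSV lines in which the merged `Name` and `Score` headers are repeated across the rows and columns they span. Real-world HTML often omits closing tags, so a production extractor would more likely rely on a full HTML parser than on this simplified tree builder.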

Published in Jurnal Nasional Teknik Elektro dan Teknologi Informasi

ISSN: 2301-4156 (Print); 2460-5719 (Online)
Publisher: Universitas Gadjah Mada
Country of publisher: Indonesia
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: https://jurnal.ugm.ac.id/v3/JNTETI
