Sistemasi: Jurnal Sistem Informasi (Jan 2022)

Big Data Infrastructure Design Optimization Using Hadoop Technologies Based on Application Performance Analysis

  • Shafiyah Shafiyah,
  • Ahmad Syauqi Ahsan,
  • Rengga Asmara

DOI
https://doi.org/10.32520/stmsi.v11i1.1510
Journal volume & issue
Vol. 11, no. 1
pp. 55–72

Abstract


Big data infrastructure is technology that provides the ability to store, process, analyze, and visualize large volumes of data. Choosing the tools and applications to use is one of the challenges in building such an infrastructure. In this study, we propose a new strategy for optimizing big data infrastructure design, an essential part of big data processing, by analyzing the performance of the applications used at each stage of the processing pipeline. The process began with collecting data from online news sources using web crawlers built with Scrapy and Apache Nutch. Hadoop technologies were then implemented to provide distributed storage and computing. The NoSQL databases MongoDB and HBase simplified data querying, after which search engines were built with Elasticsearch and Apache Solr. The stored data were analyzed using Apache Hive, Pig, and Spark, and the analysis results were presented on a website using Zeppelin, Metabase, Kibana, and Tableau. The test scenarios varied the number of servers and files used, with testing parameters including processing speed, memory usage, CPU usage, and throughput. The performance results of each application were compared and analyzed to identify its strengths and weaknesses, serving as a reference for building an infrastructure design that optimally meets user needs. This research produced two alternative big data infrastructure designs. The suggested infrastructure was implemented on compute nodes in the PENS big data laboratory to process big data from online media and was shown to run well.
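
As an illustration of the first stage of the pipeline described above, the following is a minimal sketch of a Scrapy spider of the kind that could collect online news articles; the site URL, CSS selectors, and field names are illustrative assumptions, not details taken from the paper.

import scrapy


class NewsSpider(scrapy.Spider):
    """Minimal sketch of a news crawler; URL and selectors are placeholders."""

    name = "news"
    start_urls = ["https://news.example.com/"]  # hypothetical news site

    def parse(self, response):
        # Extract the title and link of each article listed on the page.
        for article in response.css("article"):
            yield {
                "title": article.css("h2::text").get(),
                "url": article.css("a::attr(href)").get(),
            }
        # Follow pagination so the crawl also reaches older news items.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

A spider like this can be run with "scrapy runspider news_spider.py -o news.json"; in the study's pipeline, the crawled items would then be loaded into Hadoop's distributed storage before the NoSQL, search, analysis, and visualization stages.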