Икономика и компютърни науки (Jun 2021)
Integrating Distributed Hadoop System into the Existing Infrastructure
Abstract
A distributed Hadoop system can integrate the clusters of different organizations. This article considers the options for building the architecture of a distributed Hadoop system so that, on the one hand, it forms a logically complete Hadoop system and, on the other hand, the conceptual framework of the individual components that technically implement it is defined. The scope of the research covers the problem of integrating a remote Hadoop cluster with another one. Several findings are presented, and two ways of organizing data processing are compared: multiple clusters and multi-lease (multi-tenancy). Both approaches have their advantages and disadvantages, and both can be used in practice.