IEEE Access (Jan 2021)
Big Data Processing on Single Board Computer Clusters: Exploring Challenges and Possibilities
Abstract
For more than a decade, “big data” has been a buzz phrase in industry and academia. Over this time, many companies adopted the Apache Hadoop and Spark frameworks for their massive data storage and analysis efforts, using powerful, energy-hungry, general-purpose servers as their big data processing platforms. But not all industry or academic fields want, or even need, such large systems. Moreover, capital costs aside, power consumption has become a primary data center concern. Consequently, lower-cost, lower-power microservers have emerged as viable alternatives in many settings. The latest-generation Raspberry Pi (RPi), model 4B, exhibits significant computational performance improvements over its predecessors and is presently considered a sufficiently powerful single board computer (SBC) to run many mainstream operating systems and accommodate heavy workloads. This paper reexamines the possibilities of big data processing on SBC clusters by integrating the presently most powerful RPi model, the RPi 4B with 4 gigabytes (GB) of main memory. We examine how external storage affects such a cluster’s big data processing performance by employing three different external storage solutions with measurably distinct performance characteristics. We also discuss the challenges we encountered and identify further SBC cluster performance optimizations. We run several representative big data application benchmarks and measure key performance metrics, including execution time, power consumption, throughput, and performance per dollar. Our extensive experiments and comprehensive studies conclude that the current, fourth-generation RPi is the first generation able to effectively run massive (i.e., more than 100 GB) workloads in big data processing applications.
Keywords