Understanding and Statically Detecting Synchronization Performance Bugs in Distributed Cloud Systems

Chen Zhang; Jiaxin Li; Dongsheng Li; Xicheng Lu

doi:10.1109/access.2019.2923956

IEEE Access (Jan 2019)

Affiliations

Chen Zhang: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
Jiaxin Li: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
Dongsheng Li: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
Xicheng Lu: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

DOI: https://doi.org/10.1109/access.2019.2923956
Journal volume & issue: Vol. 7, pp. 99123–99135

Abstract

In today's information society, the Internet of Things (IoT) plays an increasingly important role in daily life. With a huge number of IoT devices deployed, Cyber-Physical Systems (CPS) call for powerful distributed infrastructures to supply big data computing, intelligence, and storage services. As distributed software infrastructures grow more complex, new and intricate bugs continue to manifest, causing huge economic losses. Synchronization performance problems, in which improper synchronization degrades performance and can even lead to service exceptions, heavily affect the entire distributed cluster and imperil the reliability of the system. Like other performance problems, they are acknowledged to be difficult to diagnose and fix. We collect 26 performance issues from three real-world distributed systems (HDFS, Hadoop MapReduce, and HBase) and analyze their root causes, fix strategies, and algorithmic complexity in order to understand these synchronization performance bugs better. We then implement a static detection tool comprising a critical section identifier, a loop identifier, an inner loop identifier, an expensive loop identifier, and a pruning component. We evaluate the tool on these three distributed systems with the sampled bugs. In the evaluation, the tool finds all the target bugs and also reports more new potential performance problems than previous work. The tool also incurs little performance overhead, which indicates that it is efficient.
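
The general idea of statically flagging loops inside critical sections can be illustrated with a toy sketch. This is not the authors' tool, which targets Java systems such as HDFS and HBase; it is a minimal Python analog built on the standard `ast` module. It assumes, purely for illustration, that a critical section is any `with` block whose context expression has "lock" in its name, and it only loosely mirrors the critical section, loop, and inner loop identifiers named in the abstract. The expensive loop identifier and the pruning component are not modeled. `SOURCE`, `LockedLoopFinder`, and `find_locked_loops` are hypothetical names.

```python
import ast

# Hypothetical code under analysis; it is only parsed, never executed.
SOURCE = '''
import threading
lock = threading.Lock()
blocks = []

def report_all():
    with lock:
        for b in blocks:
            for r in b.replicas:
                r.check()

def count():
    with lock:
        return len(blocks)
'''


class LockedLoopFinder(ast.NodeVisitor):
    """Toy detector: record loops lexically nested inside `with <lock>:` blocks."""

    def __init__(self):
        self.findings = []  # (function name, line number, loop nesting depth)
        self._func = None

    def visit_FunctionDef(self, node):
        # Remember the enclosing function so findings can be attributed to it.
        prev, self._func = self._func, node.name
        self.generic_visit(node)
        self._func = prev

    @staticmethod
    def _is_lock(expr):
        # Assumption: anything named like a lock guards a critical section.
        if isinstance(expr, ast.Name):
            return "lock" in expr.id.lower()
        if isinstance(expr, ast.Attribute):
            return "lock" in expr.attr.lower()
        return False

    def visit_With(self, node):
        if any(self._is_lock(item.context_expr) for item in node.items):
            # Critical section found: scan its body for loops.
            for stmt in node.body:
                self._scan(stmt, depth=0)
        else:
            self.generic_visit(node)

    def _scan(self, node, depth):
        # Record each loop with its nesting depth; depth >= 2 means an inner loop.
        if isinstance(node, (ast.For, ast.While)):
            depth += 1
            self.findings.append((self._func, node.lineno, depth))
        for child in ast.iter_child_nodes(node):
            self._scan(child, depth)


def find_locked_loops(source):
    finder = LockedLoopFinder()
    finder.visit(ast.parse(source))
    return finder.findings


findings = find_locked_loops(SOURCE)
for func, line, depth in findings:
    print(f"{func}: loop at line {line}, nesting depth {depth}, inside a lock")
```

In this sketch the nested loops in `report_all` are reported, while `count`, which holds the lock without looping, is not. A real detector would additionally need to estimate how expensive each loop is and prune false positives, which this sketch does not attempt.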

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
