Journal of ICT Research and Applications (Aug 2016)
High Performance CDR Processing with MapReduce
Abstract
A call detail record (CDR) is a data record produced by telecommunication equipment that consists of call detail transaction logs. CDRs contain valuable information for several domains, including billing, fraud detection, and analytics. In practice, however, meeting these needs is a big data challenge: billions of CDRs are generated every day, and processing systems are expected to deliver results in a timely manner. The capacity of our current production system is insufficient to meet these needs. Therefore, a better-performing system based on MapReduce and running on a Hadoop cluster was designed and implemented. This paper presents an analysis of the previous system and the design and implementation of the new system, called MS2. It also provides empirical evidence demonstrating the efficiency and linearity of MS2. Tests show that MS2 reduces overhead by 44% and nearly doubles processing speed compared to the previous system. In benchmarks against several related technologies for large-scale data processing, MS2 also performed better for CDR batch processing. Running on a cluster with eight CPU cores and two conventional disks, MS2 is able to process 67,000 CDRs/second.
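
To relate the reported throughput to the daily volumes mentioned above, the rate can be converted to CDRs per day. The following is only a back-of-the-envelope sketch: it assumes the 67,000 CDRs/second rate is sustained around the clock, which the abstract does not claim, and the function name `cdrs_per_day` is a hypothetical helper introduced for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def cdrs_per_day(cdrs_per_second: int) -> int:
    """Convert a per-second rate to a daily volume.

    Assumption: the rate is sustained for the full 24 hours.
    """
    return cdrs_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    reported_rate = 67_000  # CDRs/second, as reported for MS2
    daily = cdrs_per_day(reported_rate)
    print(f"{daily:,} CDRs/day")  # 5,788,800,000 CDRs/day
```

Under that assumption, the reported rate corresponds to roughly 5.8 billion CDRs per day on the eight-core, two-disk cluster.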