IEEE Access (Jan 2019)
Mining Frequent Items Over the Distributed Hierarchical Continuous Weighted Data Streams in Internet of Things
Abstract
With the increasing supply of bandwidth and the diversification of applications in the Internet of Things (IoT), identifying frequent items (also called heavy hitters) in high-speed, dynamically changing data streams has become a challenging problem. Moreover, these data streams originate from multiple sources in a distributed environment. To address this problem, we propose distributed tracking schemes for continuously mining frequent items in a multi-level, non-regular tree-based communication structure. Our method combines local tracking with delayed updating at each node to produce highly communication-efficient and space-efficient solutions. To reduce the communication cost, each node sends only those frequency increments that violate a pre-defined threshold, and these increments travel through a hierarchy of intermediate nodes interposed between the monitoring nodes and the root node. With the information gathered, the root node continuously reports the set of frequent items. Two optimization approaches are proposed: one minimizes the worst-case total communication, and the other minimizes the worst-case maximum load on any link, under any input stream. We perform extensive simulations with real traffic traces to evaluate the performance of the two optimization approaches.
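As an illustration of the delayed-updating idea described above, the following minimal Python sketch shows threshold-triggered forwarding in a small tree. It is not the scheme proposed in the paper: the class `Node`, its fields, the per-node `threshold` values, and the rule of forwarding once an item's accumulated weight reaches the threshold are all assumptions chosen for illustration, and the two optimization approaches are not modelled.

```python
from collections import defaultdict


class Node:
    """One node of the communication tree (hypothetical illustration)."""

    def __init__(self, threshold, parent=None):
        self.threshold = threshold          # assumed slack before forwarding
        self.parent = parent                # None marks the root
        self.pending = defaultdict(float)   # increments not yet forwarded
        self.counts = defaultdict(float)    # root only: weight received per item
        self.total = 0.0                    # root only: total weight received
        self.messages_sent = 0

    def update(self, item, weight=1.0):
        """Absorb a weighted arrival; forward only when the threshold is hit."""
        if self.parent is None:
            self.counts[item] += weight
            self.total += weight
            return
        self.pending[item] += weight
        if self.pending[item] >= self.threshold:
            delta = self.pending.pop(item)  # send the whole accumulated increment
            self.messages_sent += 1
            self.parent.update(item, delta)

    def frequent_items(self, phi):
        """Root only: items whose tracked weight is at least phi * total."""
        return {i for i, c in self.counts.items() if c >= phi * self.total}


# Two monitoring nodes below one intermediate node below the root.
root = Node(threshold=0.0)
mid = Node(threshold=4.0, parent=root)
leaf_a = Node(threshold=2.0, parent=mid)
leaf_b = Node(threshold=2.0, parent=mid)

for _ in range(8):
    leaf_a.update("x")
for _ in range(3):
    leaf_b.update("y")

total_messages = leaf_a.messages_sent + leaf_b.messages_sent + mid.messages_sent
heavy = root.frequent_items(phi=0.5)
```

In this toy run, 11 stream arrivals produce 7 messages across all links, and the root tracks only the weight that has been forwarded to it, so its counts lag the true counts by at most the pending increments held below it.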
Keywords