Frontiers in Computational Neuroscience (Nov 2011)
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer
Abstract
The performance of several spike exchange methods using a Blue Gene/P supercomputerhas been tested with 8K to 128K cores using randomly connected networks of up to 32M cells with 1k connections per cell and 4M cells with 10k connections per cell. The spike exchange methods used are the standard Message Passing Interface collective, MPI_Allgather, and several variants of the non-blocking multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access communication available on the Blue Gene/P. In all cases the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods --- the persistent multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast;and a two phase multisend in which a DCMF_Multicast is used to first send to a subset of phase 1 destination cores which then pass it on to their subset of phase 2 destination cores --- had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the multisend methods is almost completely due to load imbalance caused by the largevariation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a direct memory access controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
Keywords