Discrete Analysis (Mar 2018)
Balancing sums of random vectors
Abstract
Balancing sums of random vectors, Discrete Analysis 2018:4, 16 pp.

Imagine that you are working at a small firm, and it is your job to assign incoming tasks, which are sporadic and unpredictable, in a fair way to the firm's employees. Fairness can be measured in several respects: different tasks take different amounts of time, require different amounts of effort, are not equally safe, and offer different rewards. Your aim is to be as fair as possible in all of these respects. One can model this problem as follows. Each task is a random vector, which represents its various attributes. The random vectors arrive as a sequence, and your task is to partition them into cells, one for each employee. You must choose the cell for each vector as it arrives (and in particular before seeing later vectors), and you wish to do so in such a way that the sums of the vectors in the different cells remain as close to each other as possible. This model applies to many other task-scheduling problems: an important one is directing incoming computational tasks when one has a number of servers.

This abstract question is (when suitably formulated) the focus of this paper. The authors contrast it with various other problems that have been studied. There are three features of the problem that can be changed independently. One possible change is to make the problem deterministic: that is, the sequence of vectors is no longer random, and one is looking for a worst-case result rather than an average-case result. Another is to make the strategy "prescient", meaning that one gets to see the entire sequence before deciding how to partition the vectors. A third is to restrict to the one-dimensional case. Some of the resulting variants have long and interesting histories.

The authors show that each of these features makes a significant difference. In particular, if the random vectors are unit vectors, then in the one-dimensional case there is an easy strategy to keep the sums roughly equal: add any negative vector to the cell with the largest sum and any positive vector to the cell with the smallest sum. However, as soon as the dimension is greater than 1, the level of imbalance must tend to infinity: one of the results of the paper is, roughly speaking, that it must grow at a rate of at least $(\log t/\log\log t)^{1/2}$. (The authors measure the imbalance using the Euclidean norm. That is, they try to minimize, over all times up to $t$, the maximum Euclidean distance between the sums in any two cells.)

This lower bound is then matched by upper bounds that show it is sharp or close to sharp in various circumstances. One of the strategies the authors consider is a direct generalization of the simple one-dimensional strategy just mentioned. When a vector $w$ comes in, one looks at the current cell totals $v_1,\dots,v_k$ and assigns $w$ to the cell $i$ for which $\langle v_i,w\rangle$ is minimal. This strategy performs very well. Indeed, for probability distributions that are sufficiently well-behaved, it gives an upper bound that matches the lower bound to within a constant, and for all distributions it achieves a growth rate of around $(\log t)^{1/2}$. Somewhat surprisingly, there are distributions that show that this bound is almost tight as well: for every increasing function $\omega$ that tends to infinity, one can find a distribution that forces a growth rate of at least $(\log t)^{1/2}/\omega(t)$.

The paper ends with a number of interesting open problems.
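To make the greedy inner-product strategy concrete, here is a minimal simulation sketch in Python. It is not taken from the paper: the choice of distribution (random unit vectors obtained by normalizing Gaussians), the parameters, and the function name `greedy_assign` are illustrative assumptions.

```python
import numpy as np

def greedy_assign(samples, k):
    """Greedy inner-product strategy (illustrative sketch, not the paper's code).

    Each incoming vector w is assigned to the cell i whose current total v_i
    minimises <v_i, w>.  Returns the final cell totals and the largest
    Euclidean distance observed between any two cell sums along the way.
    """
    d = samples.shape[1]
    totals = np.zeros((k, d))           # current sum in each cell
    max_imbalance = 0.0
    for w in samples:
        i = int(np.argmin(totals @ w))  # cell minimising <v_i, w>
        totals[i] += w
        # imbalance up to this time: max Euclidean distance between cell sums
        diffs = totals[:, None, :] - totals[None, :, :]
        max_imbalance = max(max_imbalance, np.linalg.norm(diffs, axis=2).max())
    return totals, max_imbalance

# Toy usage: t random unit vectors in dimension d, partitioned into k cells.
rng = np.random.default_rng(0)
t, d, k = 10_000, 3, 2
samples = rng.normal(size=(t, d))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
totals, imbalance = greedy_assign(samples, k)
print(imbalance)
```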