
# ClosestRabbit

We are given an $$N \times M$$ rectangular board, where each cell is either empty or full ($$1 \leq N,M \leq 20$$). We define the *distance* between cells $$(i_1,j_1)$$ and $$(i_2,j_2)$$ to be $$(i_1-i_2)^2 + (j_1-j_2)^2$$.

In addition, there are $$R$$ rabbits who randomly assign themselves to the empty cells so that there is at most one rabbit per empty cell. After being assigned, we consider a graph with $$R$$ nodes, each node corresponding to a different rabbit. For each rabbit $$a$$ we add an edge to the closest rabbit $$b$$. More specifically, if a rabbit is assigned to cell $$(i_1,j_1)$$, we consider the rabbit who is assigned to $$(i_2,j_2)$$ such that the distance between $$(i_1,j_1)$$ and $$(i_2,j_2)$$ is minimal (as defined above), and we add an edge between the two corresponding nodes. If there is a tie, we select the rabbit whose $$(i_2,j_2)$$ is lexicographically smallest. We do this for each rabbit (in particular, there are exactly $$R$$ edges in this graph; there may be duplicate edges, but there will be no "self-loop" edges).

Given the description of the board, and the integer $$R$$, find the expected number of connected components in this graph.
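To make these definitions concrete, here is a small Python sketch (the function names are my own, not from the problem statement) that builds the "pointing" graph for one fixed assignment of rabbits and counts its components, plus a simple Monte Carlo estimate of the expectation for sanity-checking:

```python
import random

def closest_graph_components(rabbits):
    """Given rabbit cells (list of (row, col)), build the closest-rabbit
    graph and return its number of connected components."""
    n = len(rabbits)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (r1, c1) in enumerate(rabbits):
        # Closest other rabbit by squared distance; ties broken by the
        # lexicographically smallest cell, as the statement requires.
        j = min((k for k in range(n) if k != i),
                key=lambda k: ((rabbits[k][0] - r1) ** 2 +
                               (rabbits[k][1] - c1) ** 2,
                               rabbits[k]))
        parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})

def estimate_expected_components(board, R, trials=2000, seed=1):
    """Monte Carlo estimate of the expected number of components.
    A sanity-check sketch only -- not the intended exact algorithm."""
    empty = [(r, c) for r, row in enumerate(board)
             for c, ch in enumerate(row) if ch == '.']
    rng = random.Random(seed)
    return sum(closest_graph_components(rng.sample(empty, R))
               for _ in range(trials)) / trials
```

With only `R = 2` rabbits the two must point at each other, so the estimate is exactly 1 regardless of the board; that makes a handy smoke test.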

(Please see the problem statement for further clarification)

- Grid / Checker-board
- Grids $$\Rightarrow$$ bipartite graph?
- DP?
- Matching?
- Probability / Expectation / Linearity of Expectation, etc.
- Random Permutation (i.e.: order doesn't matter)
- MST / Minimum Spanning Tree?
- Forests with Cycles

**Notation:** If rabbit $$y$$ is the closest to rabbit $$x$$ (under the tie-breaking rules), and we add an edge between rabbit $$x$$ and rabbit $$y$$, then we say $$x$$ points to $$y$$.

At this point, I began looking at the structure of the cycles (based on the above characterization). Intuitively, based on the fact that the edges are to the "closest" neighbours, it seemed like there can't be very long cycles. For example, consider a triangle of rabbits (a cycle of length three) $$[a,b,c]$$. Then $$b$$ is the closest neighbour of $$a$$, and $$c$$ is the closest neighbour of $$b$$, and $$a$$ is the closest neighbour of $$c$$. Intuitively, it seems like they *all must be closest neighbours of each other*. It *has* to be an equilateral triangle.

Suppose we have a cycle of nodes $$[v_1,v_2,...,v_k,v_1]$$. Without loss of generality, assume that $$v_1$$ points to $$v_2$$ who points to $$v_3$$ and so on (where "points to" is defined above). Then $$v_2$$ is the closest neighbour to $$v_1$$. In particular, this tells us that $$|v_1v_2| \leq |v_1v_k|$$ (where $$|v_iv_j|$$ is the euclidean distance between the corresponding cells of $$v_i$$ and $$v_j$$). Similarly, $$v_3$$ is the closest neighbour to $$v_2$$, so we know that $$|v_2v_3| \leq |v_1v_2|$$.

In general, we get that $$|v_iv_{i+1}| \leq |v_{i-1}v_i|$$. Applying all of these inequalities we get: $$|v_1v_2| \leq |v_kv_1| \leq |v_{k-1}v_k| \leq ... \leq |v_{2}v_3|$$. But we have already stated that $$|v_2v_3| \leq |v_1v_2|$$, so this tells us that $$|v_1v_2| = |v_2v_3|$$ and (by symmetry) $$|v_2v_3| = |v_3v_4|$$, and so on.

Altogether, this says that all edges must be equal on this cycle.

This shows some nice structure to the graph. All the cycles are "equilateral". Based on this, intuitively, it seems like there cannot be very long cycles. For example, have you ever seen a four-sided figure (in two dimensions) where all the points are the same distance from each other? Also, even a triangle doesn't seem to work because of the "pick the lexicographically smallest cell" tie-breaking rule (stated in the problem statement). Altogether, we have the following key observation.

- There is exactly one cycle per component, so
- The number of components is equal to the number of cycles; and
- Every cycle must have exactly two nodes on it
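The "no equilateral triangle" intuition behind this observation can be sanity-checked by brute force. The sketch below (using the squared Euclidean distance defined above, with a 6×6 grid standing in for the general case) searches for three pairwise-equidistant cells and finds none, matching the classical fact that no three lattice points form an equilateral triangle:

```python
from itertools import combinations

def d2(p, q):
    # Squared Euclidean distance between two cells, as in the statement.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def equilateral_triples(n):
    """All triples of cells in an n x n grid that are pairwise equidistant."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return [t for t in combinations(cells, 3)
            if d2(t[0], t[1]) == d2(t[1], t[2]) == d2(t[0], t[2])]
```

The general fact follows, e.g., because an equilateral triangle with integer squared side length has irrational area, while any lattice triangle has rational area.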

Experience solving probability problems is (obviously) extremely helpful in solving this problem. You should likely be familiar with the terms: "Random Variable", "Sample Space", "Expected Value / Expectation", and so on. Also, one of the most powerful results from Probability theory is the "Linearity of Expectation" rule. In particular, if $$X$$ and $$Y$$ are two random variables, $$E[X+Y] = E[X] + E[Y]$$ (where $$E[\cdot]$$ denotes expected value). Many problems can be solved using these fundamental techniques.
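As a minimal numeric illustration of Linearity of Expectation, the snippet below computes the expectation of the sum of two fair dice exactly (using rationals to avoid rounding) and confirms $$E[X+Y] = E[X] + E[Y]$$:

```python
from fractions import Fraction
from itertools import product

# Two fair six-sided dice: X is the first roll, Y is the second.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))  # each of the 36 outcomes is equally likely

E_X = sum(x * p for x, _ in outcomes)
E_Y = sum(y * p for _, y in outcomes)
E_sum = sum((x + y) * p for x, y in outcomes)
```

Notably, linearity holds even when the variables are dependent, which is exactly why it is so useful for the overlapping indicator variables in this problem.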

If you have no idea about any of these terms, consult *Introduction to Algorithms* by Thomas H. Cormen, et al. Appendix C (of the Third Edition, at least) has a very nice review of Probability and Counting.

Based on this "experience", the next step in this problem is to try to compute this expectation (i.e.: the expected number of components in the graph) using these probability techniques.

This gives us a nice characterization of the solution. To further exploit it, let's introduce some notation. Let $$S$$ denote the set of empty cells, and let $$C$$ denote the number of connected components in the final graph (a random variable). For a pair of distinct empty cells $$a$$ and $$b$$, define $$Y_{ab} := 1$$ if both cells receive rabbits and those two rabbits point to each other, and $$Y_{ab} := 0$$ otherwise.

Now, the rabbits are basically symmetric. If two rabbits mutually point to each other, then it is because they are in two cells with no other rabbits in a closer cell (to either one).

With these definitions, we can now write out the formula for the expectation:

\[\text{answer} = E[C] = \sum_{\{a,b\} \subseteq S,\ a \neq b} E[Y_{ab}]\]
where the sum ranges over unordered pairs of distinct empty cells. Each term $$E[Y_{ab}]$$ can actually be computed by counting.

If we have two cells $$a$$ and $$b$$, consider all the cells $$c$$ that would be close enough to $$a$$ or $$b$$ to cause either $$a$$ or $$b$$ to point to $$c$$. If there is a rabbit in any of these cells $$c$$, then this will cause either $$a$$ or $$b$$ to point away from each other (to $$c$$), and they will not form the two-cycle. Otherwise, if there is no rabbit in any of these cells $$c$$, then $$a$$ and $$b$$ must point to each other, and form the two-cycle.

So, the variable $$Y_{ab}$$ will be $$1$$ if and only if $$a$$ and $$b$$ are occupied and *none of these cells $$c$$* are assigned a rabbit!

How do we find these cells? Well, by manually checking! We can check all other empty cells against $$a$$ and $$b$$. For each cell $$c$$, we check if it is close enough (or wins a tie) so that either $$a$$ or $$b$$ would point to $$c$$ if $$c$$ had a rabbit in it. This generates a set of "bad cells". Then the probability $$Pr(Y_{ab} = 1)$$ is exactly the probability that none of these cells has a rabbit in it. If there are $$B$$ bad cells, and $$S$$ is the set of empty cells in total (including $$a$$ and $$b$$), then there are \[\binom{|S|-2-B}{R-2}\] ways of assigning the $$R$$ rabbits so that every bad cell stays empty while $$a$$ and $$b$$ are both occupied. And there are \[\binom{|S|}{R}\] ways in total of assigning $$R$$ rabbits randomly.

So altogether, we get the following lemma.

Altogether, this actually gives us our final algorithm (after combining this result with the corollary above): For each pair of empty cells $$a$$ and $$b$$, compute the probability that there are rabbits in $$a$$ and $$b$$ and that they point to each other (this is the $$E[Y_{ab}]$$ from the formula above). Sum these up to get the final answer. See the Solution Summary below for the final pseudo-code and analysis.

Most of my initial observations were useless (although they were not necessarily incorrect). The most helpful ones ended up being the later ones, which I summarize as: "We have to use probability tricks, along with exploiting the structure of the graph."

With most probability problems, there is often a "random variable" whose expectation we are trying to find. Often-times by writing it out, you can find nice patterns. Usually, this one "random variable" turns into the sum of many smaller / simpler random variables (often variables that can only be equal to $$0$$ or $$1$$). You can then find the original expected-value as the sum of expected values of these variables (the "Linearity of Expectation"). (See below for more resources on probability problems).

First, we must be able to write down the random variable (the number of components) in a nice form. We start by looking at the structure of the components; and we notice that, because the graph has $$R$$ nodes and $$R$$ edges (where $$R$$ is the number of rabbits), every component of the graph contains a single cycle with "arms" (paths) attached. (Drawing a picture helps here.)

Upon further inspection, we notice that there cannot be any "long" cycles in this graph. This is because each edge corresponds to a "closest neighbour" of some node / rabbit, and we can prove that all edges along a given cycle must have the same length. *(This was proved in Idea 1 above.)* Because of the "tie-breaking" rule, each node on the cycle is then equidistant from its two cycle-neighbours and must point to the lexicographically smaller one; applying this around the whole cycle produces a cyclic chain of strict "smaller than" comparisons, which is a contradiction for any cycle of length three or more. So every cycle has length exactly two (as there are no self-edges either).

The above arguments tell us: 1) There is exactly one cycle in every component; and 2) Every cycle has length exactly two (i.e.: it is a pair of nodes, that share a duplicate-edge). Altogether, this is enough to yield the characterization we need.

Let $$C$$ denote the number of cycles (this is a random variable that depends on the final arrangement of rabbits). For a pair of distinct empty cells $$a$$ and $$b$$, we define $$Y_{ab} := 1$$ if and only if both cells are assigned rabbits and those two rabbits are mutually closest to each other, and we set $$Y_{ab} = 0$$ otherwise. Based on the above arguments, in any configuration, we have that $$C = \sum Y_{ab}$$ over all unordered pairs of distinct empty cells $$a$$ and $$b$$. Since our problem is to find $$E[C]$$ (the expected value of $$C$$), we can now (*finally*) use the Linearity of Expectation to get that:
\[E[C] = \sum E[Y_{ab}]\]
over all unordered pairs of distinct empty cells $$a$$ and $$b$$.

It is fairly easy to compute $$E[Y_{ab}] = Pr(Y_{ab} = 1)$$ by counting. Particularly, for two cells $$a$$ and $$b$$, consider a cell $$c$$ such that $$a$$ would point to $$c$$ instead of $$b$$ if both cells had a rabbit (for simplicity, we say that $$a$$ prefers $$c$$ over $$b$$). If $$c$$ has a rabbit on it, then $$Y_{ab} = 0$$ (obviously, since $$a$$ does not point to $$b$$). The same holds if $$b$$ prefers $$c$$ over $$a$$. On the other hand, if no such cell $$c$$ contains a rabbit, then $$Y_{ab}$$ must be $$1$$ (provided $$a$$ and $$b$$ are occupied), since $$a$$ and $$b$$ must point to each other. So altogether, if we let $$B := \{c : a \text{ prefers } c \text{ or } b \text{ prefers } c\}$$, then we can write $$Pr(Y_{ab} = 1)$$ as: \[E[Y_{ab}] = \frac{\binom{|S|-2-|B|}{R-2}}{\binom{|S|}{R}}\] where $$S$$ is the set of all empty cells, $$B$$ is the set just described, and $$R$$ is the number of rabbits. Note that the denominator $$\binom{|S|}{R}$$ is just the total number of configurations, and the numerator is the number of configurations in which $$a$$ and $$b$$ are occupied and mutually point to each other.
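To convince yourself of this formula, here is a verification sketch (the helper names are my own; it assumes $$R \geq 2$$) that compares the closed-form sum against a brute-force enumeration of every assignment on a tiny board:

```python
from itertools import combinations
from math import comb

def d2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def prefers(a, c, b):
    # Would a rabbit at a pick c over b?  Strictly closer wins; ties go
    # to the lexicographically smaller cell, per the problem statement.
    return (d2(a, c), c) < (d2(a, b), b)

def expected_components_formula(empty, R):
    """E[C] via the closed-form sum over unordered pairs of empty cells."""
    S = len(empty)
    total = 0.0
    for a, b in combinations(sorted(empty), 2):
        bad = sum(1 for c in empty if c != a and c != b
                  and (prefers(a, c, b) or prefers(b, c, a)))
        if S - 2 - bad >= R - 2:  # otherwise zero valid configurations
            total += comb(S - 2 - bad, R - 2) / comb(S, R)
    return total

def expected_components_bruteforce(empty, R):
    """E[C] by enumerating every assignment and counting components."""
    def components(assign):
        n = len(assign)
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for i in range(n):
            j = min((k for k in range(n) if k != i),
                    key=lambda k: (d2(assign[k], assign[i]), assign[k]))
            parent[find(i)] = find(j)
        return len({find(i) for i in range(n)})

    assigns = list(combinations(sorted(empty), R))
    return sum(components(a) for a in assigns) / len(assigns)
```

On a fully empty 3×3 board with three rabbits the two functions agree, and for $$R = 2$$ the answer is exactly 1 (the two rabbits always form one component), which matches both computations.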

Altogether, our algorithm is to sum up the $$E[Y_{ab}]$$ values and return this number. For a fixed $$Y_{ab}$$ we only need to find the set $$B$$ (as described above). This can be done by manually checking all other empty cells $$c$$ to see if either $$a$$ or $$b$$ prefer $$c$$. Here is the pseudo-code.

```
choose(n, k):
    return n! / k! / (n-k)!

Y(a, b):
    if a == b: return 0
    bad_cells = 0
    for each empty cell c other than a and b:
        # Does a prefer c over b, or does b prefer c over a?
        a_prefers_c = distance(a,c) < distance(a,b)
                      or (distance(a,c) == distance(a,b) and c beats b in the tie-breaker)
        b_prefers_c = distance(b,c) < distance(b,a)
                      or (distance(b,c) == distance(b,a) and c beats a in the tie-breaker)
        if a_prefers_c or b_prefers_c:
            bad_cells += 1
    S := (number of empty cells, including a and b)
    R := (number of rabbits, including those on a and b)
    return (double) choose(S - 2 - bad_cells, R - 2) / choose(S, R)

getExpected(board, R):
    answer = 0.0
    for each unordered pair (a, b) of distinct empty cells in board:
        answer += Y(a, b)
    return answer
```

The final answer is then returned by getExpected(board, R).

The running time of the algorithm is $$\Theta((NM)^3)$$, since we essentially check every triplet of cells once in the worst case. Also, be careful about precision: I would NOT implement the "choose" function via raw factorials as described above, since the intermediate values overflow quickly. Make sure to use doubles or long doubles, and compute the ratio of binomial coefficients incrementally (or in log-space), because the numbers can get pretty large.
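One way to sidestep the precision issue is to evaluate the ratio of binomial coefficients in log-space. The sketch below uses `lgamma` for this; the helper names are my own, and note that Python's `math.comb` is exact big-integer arithmetic anyway, so this mainly illustrates what you would do with fixed-width doubles in a language like C++ or Java:

```python
from math import exp, lgamma

def log_comb(n, k):
    # log C(n, k) via log-gamma; assumes 0 <= k <= n.
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def binom_ratio(n1, k1, n2, k2):
    """C(n1, k1) / C(n2, k2), computed in log-space so the huge
    intermediate binomial coefficients are never formed."""
    if not (0 <= k1 <= n1):
        return 0.0  # no valid assignments in the numerator
    return exp(log_comb(n1, k1) - log_comb(n2, k2))
```

For this problem the call would look like `binom_ratio(S - 2 - bad_cells, R - 2, S, R)`, and the result is a well-behaved number in $$[0, 1]$$ even when the individual coefficients would overflow a double.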

- **There is exactly one cycle in each component.** This comes from the structure of the graph: every node has exactly one outgoing edge, so a component with $$n$$ nodes has exactly $$n$$ edges, and hence exactly one cycle.
- **No cycle can contain more than two nodes.** This can be found by exploiting the symmetry of cycles and the "nearest neighbour" graph.
- **Rabbits in cells $$a$$ and $$b$$ are mutually "closest" if and only if there is no other cell $$c$$ with a rabbit such that $$a$$ prefers $$c$$ or $$b$$ prefers $$c$$.**
- **Let $$S$$ be the set of empty cells. Let $$a,b$$ be a pair of distinct empty cells. Let $$B$$ be as defined earlier. Let $$C$$ be the number of components; then $$E[C] = \sum E[Y_{ab}]$$, and \[E[Y_{ab}] = \frac{\binom{|S|-2-|B|}{R-2}}{\binom{|S|}{R}}\]**

- The first couple of key observations can easily be found by looking at example grids and/or example graphs.
- Alternatively, you can find out these observations by focusing on the constraints (the structure of the closest-neighbour function)
- Knowledge of Graph Theory is pretty useful here in being able to recognize the structure of the graph. Also, knowledge of Probability (such as Random Variables, Linearity of Expectation) was crucial to being able to write down the final solution.
- With probability problems, it's often helpful to write down the random variables to find out exactly what the "expected value" corresponds to. Then we can use other techniques to modify and simplify the formula. It's also useful to write down every little observation you come across; they can be useful later.
- We transformed our problem on components to a problem on cycles. Later, we transformed this to a problem of pairs of vertices.
- I would say the entire "Linearity of Expectation" rule is an example of simplifying a problem: instead of trying to find $$E[C]$$ directly, we broke it down into smaller variables $$Y_{ab}$$, whose expected values were easy to compute. This often works in general.

- This is the second time (for me) that I have come across a problem with Probability and Graph Theory in the last little while (I will write the other problem in the "Related Problems" section below). Both times, the solution was to reduce the Expected Value of one variable to the sum of expected values over pairs of nodes (i.e.: over potential edges in the graph). Maybe this is common?
- Linearity of Expectation is extremely powerful. Whenever you can write one random variable as the sum of other random variables: do it! It often reduces to the sum of "0-1" variables (often called "Indicator Variables"), and we can exploit the fact that $$E[X] = Pr(X=1)$$ for 0-1 variables.

- Probability / Expectation / Expected Value / Linearity of Expectation / 0-1 Indicator Variables
- Combinatorics / Counting / Number of Ways / Binomial Coefficients / Counting with Probability
- Graph Theory / Graph Construction / Closest Neighbour
- Random Graphs / Random Graph Construction
- Greedy / Observations / Exploit Structure and Conditions
- Math / Formulas and Proofs