Ksenia And Combinatorics

We are given two integers $$n$$ ($$1 \leq n \leq 50$$) and $$k$$ ($$1 \leq k \leq 50$$). We would like to construct a binary tree on $$n$$ labelled nodes (numbered 1 to $$n$$) such that the size of the maximum matching on this tree is exactly $$k$$ and the root is node $$1$$. How many different possible trees are there that satisfy this condition? (Return the answer modulo 1000000007 (10^9 + 7))

Note: Two trees are considered different if there exist a pair of nodes $$a$$ and $$b$$ such that the edge $$ab$$ exists in one tree but not in the other.
  1. Combinatorics
  2. DP
  3. Catalan Numbers
  4. Tree DP
  5. Multiply everything by $$n$$ factorial or $$(n-1)$$ factorial
  6. Bipartite Matching
  7. Max Flows
  8. Hall's Marriage Theorem
Since this is a kind of counting problem on a tree, one's initial guess would be that this problem requires Dynamic Programming: in particular, "Tree DP" as we like to call it (which just means "Dynamic Programming on a Rooted Tree").

With that in mind, and since there are only two parameters ($$n$$ and $$k$$), we intuitively can guess that we will have some 2-Dimensional DP array of the form dp[n][k] meaning "how many different rooted binary trees are there with $$n$$ nodes having a maximum matching of size $$k$$". With this in mind, we realize that when constructing a tree with $$n$$ nodes and max matching of size $$k$$ , then we can construct it recursively: we fix a root, then construct the left child with some $$n_1$$ number of nodes and a max matching of size $$k_1$$, and construct a right child with the remaining nodes ($$n - 1 - n_1$$) that has a max matching of size $$k - k_1$$ or $$k - k_1 - 1$$, depending on if the root was matched.

This is the most intuitive way to formulate the problem; at least, it is intuitive for people who have seen the technique of Tree DP often enough. If this was not intuitive to you (the reader), I would recommend practicing on Tree DP problems. In any case, this solution is fairly close to being correct, but we run into a couple problems. Specifically, we run into an issue with the last part: "depending on if the root was matched". Handling these off-by-one issues means we have to be precise with our formulation. Also, there are issues of symmetry: we might double count the same sub-tree various times during our algorithm.

Ignoring the issues of symmetry, we need to first worry about the precise formulation, and handling the case of when the root is matched and not. After some work with a pen and paper, I realized that there are two types of trees:
  • Those trees where the root MUST be matched in order for it to be a maximum matching
  • Those trees where the root does not need to be matched for it to be a maximum matching
And I also noticed a lemma: "There always exists a maximum matching with the root matched". (This is fairly easy to prove; Take any optimal solution where the root is not matched, un-match one of its children, and then match the root with that child. This will never decrease the size of the matching.)

Based on those two types, I extended my Dynamic Programming function / array to be of the form: dp[n][k][g] := "the number of binary trees on $$n$$ nodes where the size of the maximum matching is $$k$$, and the size of the maximum matching is $$k-g$$ when the root cannot be used ($$g \in \{0,1\}$$)". That is, if $$g$$ is 1, we are counting those trees for which the root node is "crucial" to the matching (i.e.: without it, the size of the maximum matching decreases by 1); and if $$g$$ is 0, we are counting those trees for which the root node is not crucial (i.e.: removing the root node does not change the size of the maximum matching).

With this, it turns out that we can define $$dp[n][k][g]$$ recursively in terms of smaller $$n,k,$$ and $$g$$. It takes careful case analysis, but it is not too hard to come up with the correct formulas.

Before writing down these formulas, we need to handle one last thing: symmetry. That is, we don't want to double-count any trees that are symmetric in various ways. An example of why symmetry is important is that the following the trees are the same so should only be counted once:
    1               1
   / \     and     / \
  2   3           3   2
 /                   /
4                   4
They are the same according to the problem statement (see the actual problem statement on Codeforces for more clarity). They have the same edge set. And based on the above discussions, these might be counted twice, which is wrong.

To avoid double-counting we do two things when given a choice as to how to construct the sub-trees of our tree: 1) we always assign a greater or equal number of children to the left sub-tree than the right sub-tree; and 2) if there are an equal number of children in both sub-trees, then we assign the node with the lowest number/label to the left sub-tree. This means that every possible binary tree (along with all possible labellings) will be uniquely represented and therefore counted exactly once in whatever dp formulation we do.

OK. We now have all the parts needed to construct the dp formula. Recall we want to recursively construct a function/array: dp[n][k][g] := "the number of binary trees on $$n$$ nodes where the size of the maximum matching is $$k$$, and the size of the maximum matching is $$k-g$$ ($$g \in \{0,1\}$$) when the root is not used in the matching" (also, we assume that the label on the root node is fixed/decided ).

(The reader is encouraged to formulate the remainder on his/her own. This is because the formulas are easier to derive yourself, but my explanations that follow might just confuse you. So it might be more beneficial to just derive them on your own.)

The base cases are a little weird, so I handle the recursive step first (for $$n \geq 2$$).

First suppose $$g==0$$. To construct $$dp[n][k][0]$$ we loop through all possibilities of $$n_1$$ and $$k_1$$ (which correspond to the number of nodes and the size of the maximum matching in the left sub-tree). For each pair $$(n_1, k_1)$$, we add to the count: $$dp[n_1][k_1][1] \cdot dp[n - n_1 - 1][k - k_1][1]$$. The $$1$$ in the last dimension means "in these subtrees, their respective roots MUST be matched". This is because our original $$g$$ is 0, so we want subtrees that guarantee that the size of the matching is the same regardless of whether or not the root is matched. And we can prove that this is exactly the condition needed to make that happen (the proof is more-or-less "by inspection").

Now suppose $$g==1$$. To construct $$dp[n][k][1]$$ we again loop through all possibilities for $$n_1,k_1$$. However, in this case, the root MUST be matched in order to achieve a max matching. So we only want sub-trees that force the root to be matched. Now, suppose the root must be matched in an optimal solution: either it a) matches to the left sub-child only, b) matches to the right sub-child only, or c) can be matched to either left or right child without decreasing the size of the matching.

For each of these cases we add:
a) $$dp[n_1][k_1][0] \cdot dp[n_2][k_2][1]$$
b) $$dp[n_1][k_1][1] \cdot dp[n_2][k_2][0]$$
c) $$dp[n_1][k_1][0] \cdot dp[n_2][k_2][0]$$

Where $$n_2 := n - n_1 - 1$$ and $$k_2 := k - k_1 - 1$$ (for clarity). To understand what the "0" and "1" means in all these cases, go back to where I defined what dp[n][k][g] means. In particular, these "0"'s and "1"'s basically explain what happens when you try and swap the root from being matched from the left child to the right child and vice-versa. If there is a "0" in the third dimension, then the size of the matching does not decrease/increase when things are swapped, and if there is a "1" then it does increase/decrease when things are swapped. And we carefully consider all the cases / combinations to come to the correct formulas.

Lastly, setup the base cases so that everything makes sense. In particular: dp[1][0][0] = 1
dp[1][0][1] = 0


dp[0][0][0] = 0
dp[0][0][1] = 1
The "n==0" cases are not really well-defined, so I just set them up so that the numbers add up correctly.
You also have to handle some factors in there. For example, there is an (n-1 Choose n1) to handle relabellings.

Anyway, the final algorithm (pseudocode) is:
Given N,K.

Set dp[n][k][g] = 0 for all n,k,g
dp[1][0][0] = 1
dp[1][0][1] = 0
dp[0][0][0] = 0
dp[0][0][1] = 1

var ways_to_relabel = 0    

for all n = 2..N:
    for all k = 1..N/2:
        for all g = 0..1:
            for all n1 = 0..n-1:
                n2 = n-1-n1
                
                # Handle symmetry. Ensure n1 >= n2.
                # Also, if n1 == n2, ensure that "smallest" element is in left side.
                if (n2 > n1) continue;        # handle symmetry
                if (n1 != n2):    # choose the n1 labels to go to left side
                    ways_to_relabel = (n-1 choose n1)
                else:            # choose the n1 labels; but one of them is fixed.
                    ways_to_relabel = (n-2 choose n1-1)
                ways_to_relabel *= n1 * n2    # pick the roots of the left and right subtree
                
                for all k1 = 0..k-g        # k-1 if g==1, since 1
                    k2 = k-g-k1
                    if (g==0):
                        dp[n][k][g] += (dp[n1][k1][1]*dp[n2][k2][1])*ways_to_relabel
                    else:
                        dp[n][k][g] +=     (dp[n1][k1][0]*dp[n2][k2][1]
                                        + dp[n1][k1][1]*dp[n2][k2][0]
                                        + dp[n1][k1][0]*dp[n2][k2][0]) * ways_to_relabel

print dp[N][K][0] + dp[N][K][1]
And this algorithm is $$O((NK)^2)$$ time complexity, which is fine.
  1. In a rooted tree, there is always a maximum matching where the root is matched.
  2. Given a dp formulation: dp[n][k][g] (as described above), the table can be completed and filled inductively/recursively. The last dimension ($$g \in \{0,1\}$$) was needed to actually make it work. By fixing whether or not the root was necessary for the matching, we were able to correctly break down the problem into an equivalent problem on the left and right sub-trees, and combine the answer (i.e.: it became solvable by Dynamic Programming).
  3. The final formula / algorithm. The actual formula for the dp required careful case analysis and handling of off-by-one errors and symmetry. This took the most time to derive, and it also cost me some time penalties due to wrong submissions in the contest.
  1. The fact that this problem was solvable by DP is recognizable from the fact that it is a "counting" problem (most combinatorics problems are solvable by DP). Furthermore, since we have a problem on trees, its likely that this is a Tree DP problem, where the "state" of the dp involves the current sub-tree. Knowing this would save a lot of time at the beginning of the problem and point you in the right direction right away.
  2. Precisely looking at the cases and bashing them out led to the correct solution. It was important to write out all the formulae. This took the most time, but it was also necessary to solving this problem without errors.
  • I need to be faster. Recognizing the correct direction was intuitive (knowing that it was DP). However, it took me over an hour and a half to solve this problem. I should practice doing this difficulty of problems and increasing the speed at which I finalize and submit the problem.
  • Dynamic Programming / DP / Tree DP
  • Combinatorics / Counting DP
  • Matching on a Tree