Jeff And Brackets

We are given two integers $$n$$ ($$1 \leq n \leq 20) and $$m$$ ($$1 \leq m \leq 10^7$$). We would like to construct a well-formed bracket sequence (of "[" and "]") of length $$n \cdot m$$. The cost of placing an open-bracket in position $$0 \leq i \leq nm-1$$ is $$A[i%n]$$, and the cost of placing a close-bracket in position $$i$$ is $$B[i%n]$$, where $$A$$ and $$B$$ are arrays of length $$n$$, with $$1 \leq A[i], B[i] \leq 10$$.

What is the minimum cost to construct a well-formed bracket sequence of length $$n \cdot m$$.
  1. Brackets
  2. DP
  3. Catalan Numbers?
  4. Math
  5. Examples / Patterns
  6. Linear Combinations
  7. Greed?
  8. Look at each $$i$$ modulo $$N$$ as a separate sequence
  9. Parity of $$N$$ might matter
  10. Data Structures
  11. Divide and Conquer
  12. Bitmask DP based on $$N$$ bits ($$2^20$$ states)
(If you are simply looking for the final solution, skip all the way down to the Review/Summary. This section here will describe all of my thought processes and struggles when solving this problem. Read this if you are interested in that process.)

As implied by my large list of initial observations, I did not have a concrete idea on how to solve this problem. I spent over a half an hour just trying to come up with observations, and I was immediately stuck. In an actual contest, I should have given up and moved onto another problem. Even after the contest, after looking at the "judges' solution", I didn't fully understand how to solve this problem. This was also a major reason why I decided to write a blog post about it. There is much to learn from this experience.

I looked at examples and tried to come up with observations, but most of my ideas led to $$O(nm)$$ DP solutions that would not work with such a large $$m$$ and $$n$$. So I mostly abandoned them, and kept trying to come up with greedy observations, so that I could avoid traversing all $$m$$ characters even once.
In theory, I should have simply just "wrote it out". Specifically, after the contest, I looked at the judges' solution, and it turned out to be based on a DP with some really cool alterations (which I will outline below). So I will discuss here the correct thought process that I should have gone through in order to solve this problem.

Firstly, ignoring the large $$m$$, we try to write down ANY DP solution, with the knowledge that we might be able to use some kind of "state-bashing" or other optimization technique to reduce the amount of work I needed to do in the DP. This yields the following dynamic programming solution (which is $$O(m^2)$$):
Let $$dp[x][b]$$ denote the minimum cost necessary to place brackets in positions $$0..x$$, so that there are $$b$$ more open brackets than close brackets in this sequence ($$b$$ is below called the "balance" or "imbalance" of the sequence). Then: $$dp[x][b] = min(dp[x-1][b-1] + A[x], dp[x-1][b+1] + B[x])$$

We should also assign $$dp[x][b] = \infty$$ if $$b\lt0$$. That is, you cannot ever have a negative balance (more close-brackets than open-brackets, or it's impossible).

Then the answer would be $$dp[m][0]$$.
This is a good formula because it is nicely "recursive". In fancy terminology, we can say that this formula exposes the problem's "optimal substructure"; the solution for a fixed $$x$$ is dependent only on the solutions for smaller $$x$$ (namely, $$x-1$$). The same thing is NOT true about the $$b$$ values, as $$dp[x][b]$$ requires a term with $$b-1$$ in it and a term with $$b+1$$ in it, so this is really the only natural way to write out the formula as a dynamic programming solution. (If we looked at it in terms of the$$b$$ values, it would seem a lot more like a graph problem or something similar.)

We now have a basic formula that we can use as a starting-point for more efficient solutions. However, this is $$O(nm^2)$$ at worst (which is unbelievably slow). To improve the efficiency, we notice that the problem is highly "symmetric" and "recursive". Formally, a well-formed bracket sequence is composed of smaller well-formed bracket sequences, which are combined (concatenated). An even stronger observation, coming directly from the problem statement, is this: Each well-formed bracket sequence can be thought of as a sequence of "blocks" of length $$n$$. If the formula can be characterized in terms of these "blocks", then this may simplify things because $$n$$ is small, and the number of different blocks of length $$n$$ is also (relatively) small.

So, maybe we can alter our dp formula based on blocks of size $$n$$. Specifically, we assume that $$x$$ is a multiple of $$n$$ (for now), and see what happens when we expand the above formula to include $$x-n$$. Then we have the following identity: \[dp[x][b] = \min_{\forall b'}\left(dp[x-n][b'] + f(b',b)\right)\] where $$f(b',b)$$ is the minimum cost it would take to go from $$b'$$ balance to $$b$$ balance with a single block of size $$n$$. (We don't really know the $$f(\cdot,\cdot)$$ function yet, or whether it is even computable, but we just place it there as a placeholder. Also, we weren't really precise about what $$\forall b'$$ means.)

The key observation is this: The blocks are "independent" of each other. Specifically, if we know that, before placing a block, we have a balance of $$b'$$, and we know that after placing a block, we want balance of $$b$$, then we can place any block that makes this happen. We also have to make sure that the $$f()$$ function only takes into account "valid" blocks (i.e.: we can't ever go into negative balance). But if that works then we now have a "nicer" dp formula based on blocks.

What makes this formula "nicer"? Well, in this formula, computing dp[x][b] only depends on dp[x-n][b'] and a function from b' to b. That is, it is mostly dependent on the "b" values, and the "x" is only used to index one dimension of the array. In the original formula, it instead depended on functions of $$x$$ ($$A[x]$$ and $$B[x]$$). As I like to say, this new formula makes the problem more "regular".

We have simplified the problem into two (hopefully easier) independent problems. First, we need to figure out how to compute the $$f(b', b)$$ function: the minimum cost to place a single block and get from balance $$b'$$ to balance $$b$$ (without ever going negative).

If we can figure out the above problem, then we have reduced the problem to the following: What is the minimum cost to place a "string" or "sequence" of blocks, starting from balance $$b' = 0$$, and ending at balance $$b = 0$$, where the string has $$m$$ blocks in it.

Both of these problems, independently, seem a bit more structured and easier to handle than the original single problem.
However, even if we can solve those two problems above, we are not done yet. As stated like this, it would still take at least $$\Omega(m)$$ work to do either of those steps. But we recall a crucial heuristic:
Many problems dealing with a string or sequence of independent "transformations" are amenable to "divide and conquer". That is, two construct a string of $$m$$ blocks, we can construct two strings of size $$\frac{m}{2}$$, and then combine them. This works as long as these strings are always independent of each other (such as in this case; we only need to know starting and ending $$b$$ values to be able to concatenate two strings). This immediately implies that we can solve this problem by divide-and-conquer (I think), therefore reducing any $$m$$ factors to $$log(m)$$.

Another similar trick is: matrices. We notice that our $$f(b', b)$$ function depends only on two values $$b'$$ and $$b$$. We could let $$i := b'$$ and $$j := b$$, and we have a matrix $$F_{ij}$$ representing the minimum cost to go from state $$i$$ to state $$j$$ using one block. Through something similar to "matrix multiplication" (except using the $$\min$$ function and $$+$$ instead of $$+$$ and $$\times$$), we can computer $$F^{m}$$ (which means: apply the matrix $$m$$ times). Using fast matrix exponentiation, we can do $$\log(m)$$ multiplications instead of $$m$$.

The above "Matrix exponentiation" technique is similar to the Floyd-Warshall Algorithm for finding shortest paths on a graph. If we let the $$b'$$ and $$b$$ represent nodes on a graph, and we place an edge from $$b'$$ to $$b$$ with weight equal to $$f(b',b)$$, then the answer to our problem is the shortest path from $$0$$ back to $$0$$ in exactly $$m$$ steps. And again, this can be made to work in $$O(\log(m))$$ time complexity rather than $$O(m)$$.

I think any of these techniques (which I've just learned from experience) can be applied here in some way or another. I'm not sure exactly how to apply them right now, but this intuitively feels like the correct way to reduce the $$m$$ factor into a $$\log(m)$$ factor. Once we know $$f(b',b)$$ the rest should be easy.
I am pretty sure, on some level, that all three above techniques (Divide-and-Conquer, Matrix Exponentiation, and Shortest-Paths). After all, "fast matrix exponentiation" is considered to be a "divide-and-conquer" approach; similarly, applying the Floyd-Warshall or other shortest path algorithm on the graph described above is the same as computing an exponentiation of its Adjacency Matrix. So I feel that all of these are equivalent techniques, just different implementations.

Maybe I will do a tutorial sometime with a proof as to when/why/how this kind of method can apply.

To begin, I feel that the key property that makes this method applicable is the fact that our operations are "associative". This means, it does not matter how we break down the problem, we still get the same answer. Notice that, in the original dp formula, the formula was NOT associative. The reason: the cost of a bracket depended on which index (modulo $$n$$) it was placed in. Once we rewrote the problem in terms of blocks of length $$n$$, however, the problem became easier to solve because there were no other constraints on the blocks. They are generally interchangeable (assuming they achieve the same change in balance).

Once we had associativity, we could then say: "well, to build $$m$$ blocks, our solution only depends only on the solutions to build $$\frac{m}{2} \pm 1$$ blocks" (rather than building $$m$$ blocks from $$m-1$$ blocks). And once we have that, we can reuse similar solutions multiple times. In the end we only compute a small number ($$O(\log(m))$$) different solutions.

In the future, I should remember that, whenever we have highly independent elements that make up a string of elements, then we can solve this by these divide-and-conquer methods. (For the reader, I believe this is called a "Group", in mathematics. An equivalent definition would be: anytime we have a group $$(S, \cdot)$$, then we can compute most counting or optimization formulae on the group in logarithmic time! Yes, I think this is correct. And I will probably do a tutorial for this one.)
However, this type of dynamic programming formula should remind the reader of another similar type of dp: "Floyd-Warshall's All-Pairs Shortest-Paths Problem". If we model this problem like a "graph", then we get that the minimum cost in $$x$$ steps to get from $$0$$ to $$b$$ is the min cost to get from $$0$$ to $$b'$$ in $$x-n$$ steps, and then the cost to get from $$b'$$ to $$b$$ in $$n$$ steps (over all possible $$b'$$). So, we are technically solving an all-pairs shortest path style problem on a graph where the nodes are the "imbalance" levels.

With this in mind, we generalize our dp as follows:

Let $$T[k][a][b]$$ denote the cost to get from a balance of $$a$$ to a balance of $$b$$ in exactly $$k$$ blocks of $$n$$ steps. Then: \[T[k][a][b] = min_{\forall b'}\{T[k-1][a][b'] + T[1][b'][b]\}\] and our above $$f()$$ function can be defined as $$f(a,b) := T[1][a][b]$$.

This suggests there may be a "divide-and-conquer" approach to solving this problem. In specific, instead of doing the DP from left to right, we see if we can divide the string/sequence into two (roughly equal) groups, and solve each one separately.

To simplify things, let's assume we divide the problem exactly at a "boundary". That is, we assume that the "left" side and the "right" side both contain some multiple of $$N$$ items (so that we don't have to worry about weird offsets and stuff... for now). Then we get something like: $$dp[x][b] = min(dp[y] One common way to improve the efficiency of $$dp$$ solutions, especially in highly symmetric cases, is to use divide-and-conquer.