We are given an array $$A$$ of $$N$$ integers $$(1 \leq N \leq 1,000,000)$$. For each $$i$$, we have that $$0 \leq A[i] \leq 4$$. A "swap" consists of decreasing $$A[i]$$ and increasing $$A[j]$$ by 1 for some $$i,j$$ (as long as $$A[i] \geq 1$$ and $$A[j] \leq 3$$). That is, we can perform swaps as long as the numbers stay between 0 and 4 after each swap. The goal is to perform a series of swaps until all array elements are 0, 3, or 4 (i.e.: until there are no occurrences of 1 or 2). What is the minimum number of swaps necessary to achieve this goal?
  1. Graph problem
  2. Sorting?
  3. DP - Dynamic programming?
  4. Merge sort
  5. Counting
  6. Greed
  7. Flow
  8. Binary Search
Notice that the sum of all the integers does not change. Every unit we swap from one element will go into another element (similar to a "flow" constraint). Specifically, since we must end up with only 0's, 3's, and 4's, the sum of the elements in the end (and the sum of the elements in the beginning) must be written as $$3x + 4y$$ for some $$x,y \geq 0$$, or else it is impossible.
This type of problem sounds familiar: find all solutions to $$3x + 4y = S$$. If we didn't need to have $$x,y \geq 0$$, then we could find any and all possibilities using the Extended Euclidean Algorithm for finding Greatest Common Denominators; however, I recall that, with the non-negativity constraint, this can be solved much differently. Specifically, to solve problems of this form, we can model this problem using a "graph theory" approach. Since we want $$3x + 4y = S$$, that is the same as $$4y = S - 3x$$, or that $$S - 3x = 0 (mod 4)$$. So we have nodes for each remainder modulo 4. There is an edge from some remainder $$r$$ to $$r-3$$ with length $$1$$. And our goal is to find the shortest path from $$S (mod 4)$$ to $$0 (mod 4)$$. The length of this shortest path can be set as $$x$$, because then $$S - 3x = 0 (mod 4)$$ and we will be done. Alternatively, we can simply perform this procedure directly (without the graph), by starting at $$S$$ and subtracting multiples of three until it is congruent to 0, modulo 4.
The above "experiences" explain my Initial Observations. I immediately recalled this kind of problem, hence why I thought about things such as "graphs" and "flows".
The problem is solvable if and only if $$S = 3x + 4y$$ for some $$x,y \geq 0$$.
(Based on Key Observation. The converse direction is almost immediate as well.)
Can I solve this problem using a greedy algorithm?
Every time I send a "4" to a "2", we gain two "3's". Every time I send a "1" to a "3", we gain a "0" and a "4". Every time I send a "1" to a "2", we gain a "0" and a "3". Maybe we should characterize the set of "moves" that can be done, and greedily choose the "best" one?
Take an index $$i$$. If, in an optimal solution, there is at least one unit sent (swapped) into $$A[i]$$ (i.e.: $$A[i]$$ increases at least once during the algorithm), then $$A[i]$$ will never send units to any other index.
Suppose $$A[k]$$ sends $$p$$ units to $$A[i]$$ over the course of the algorithm, and that $$A[i]$$ sends $$q$$ units to $$A[j]$$ throughout the algorithm. The total decrease to $$A[k]$$ here was $$p$$, the total increase to $$A[i]$$ was $$p-q$$ and the total increase to $$A[j]$$ was $$q$$. If $$p>q$$ we can send $$p-q$$ units directly from $$A[k]$$ to $$A[i]$$, and $$q$$ units directly from $$A[k]$$ to $$A[j]$$. This will require less swaps in total, but will have the same resulting increase/decrease for each element. A similar strategy holds if $$p < q$$.
At this point, I could not fully characterize these moves. I couldn't wrap my head around which types of moves would be good or bad. I could not decide on a greedy strategy. The only thing the above lemma tells me is that, "some elements should be 'givers' and some should be 'takers'", but no element can be both a giver and a taker. After attempting to think about this further, I eventually abandoned this approach.
As noted above, there is a "conservation" property: each unit that leaves one cell must go into another cell. This reminds me of "flow" constraints and the Maximum Flow problem (I'm also always on the look-out for flow problems). Is there a way to model this problem as a Maximum Flow, Minimum Cut, or Min-Cost Max-Flow?
At this point I tried a few different flow constructions. For example, I constructed a graph with 5 nodes (one for 0,1,2,3, and 4), along with super-source with $$S$$ and super-sink $$T$$. There was flow from $$S$$ to a node $$v$$ equal to the number of cells $$i$$ so that $$A[i] = v$$. Flow from an inner node to another inner node corresponded to a swap. I also tried assigning "costs" for more costly swaps, somehow. I actually spent much of the remainder of the contest (Codeforces Round #207, where I first encountered this problem) attempting to solve the problem using flows. In the end, I came to a solution with minutes to spare, so I didn't end up solving it in the contest.
I drew out a simple example: $$ A = \left[2,2,2,4,2\right] $$. I didn't really expect this to help, but I was desperate for ideas, and "seeing examples" is good for the problem-solving process in general, especially when one is stuck. Actually, this example did lead me to a key realization! Here is how: First, I attempted to solve it by hand and came up with the solution: "send all units from 4 to each of the other 2's". This would yield a resulting array $$ A = \left[3,3,3,0,3\right]$$ in 4 swaps. Another solution would be to send two units from two of the 2's to the other two 2's (tongue-twister) in 4 swaps, yielding: $$A = \left[0,4,4,4,0\right]$$. We either end up with 4 threes or 3 fours. This can be explained by the earlier characterization lemma. The sum will stay the same, and we will have $$x$$ threes and $$y$$ fours in the end. In this case, the sum of the elements is 12. So we will either have $$(3,4)$$ or $$(4,3)$$ as $$(x,y)$$.
How many different possible choices are there for $$x$$ and $$y$$, that satisfy $$3x + 4y = S$$, where $$S$$ is the sum of all the elements?
By inspection, the answer (to the above question) is at most $$\frac{S}{4} + 1$$, since $$0 \leq y \leq \frac{S}{4}$$ in particular (and $$x$$ is uniquely determined by $$y$$ once decided). So, there are $$O(S)$$ solutions. By the constraints in the problem statement, this implies that there are no more than $$1,000,000$$ choices.
Okay, what if we were to pick a particular $$x$$ and $$y$$? That is, if we knew that we were going to have $$x$$ threes and $$y$$ fours in the end, is it possible to solve this problem quickly? Since there are only $$O(S)$$ choices, this may yield a nice solution.
Notice, if we know that we are going to have $$x$$ threes and $$y$$ fours in the end, then the remaining cells must be zero. So if $$N$$ is the number of elements in the array, then we will have $$z := N - x - y $$ elements with a zero in them. This composes all the elements. So, in the beginning, we have a certain number of zeroes, a certain number of ones, twos, threes, and fours. In the end, we will have $$x$$ threes, $$y$$ fours, $$z$$ zeroes, and zero ones or twos. This can be solved by a flow, I think. Consider the algorithm that follows.
For $$0 \leq v \leq 4$$ let $$amt[v]$$ be the number of times the value $$v$$ occurs in the array $$A$$ at the beginning. Construct a "bipartite" graph with five nodes labelled 0,1,2,3,4 on the left, and nodes labelled 0,3, and 4 on the right. Add a super-source $$s$$ with edges/arcs to the left-nodes and a super-sink $$t$$ with edges from the right nodes. An edge from $$s$$ to a node $$v$$ on the left will have capacity $$amt[v]$$. An edge from a node on the right to the sink $$t$$ will have capacity corresponding to $$z, x,$$ or $$y$$ (for node 0, node 3, or node 4, respectively). Altogether, the left side nodes represent the "beginning", and the right-side nodes represent the "end". Every cell will start with some number in it, and will end with some number in it. A unit of flow from $$v$$ on the left to $$w$$ on the right will indicate that 1 cell that originally had $$v$$ (i.e.: $$A[i] = v$$) will become $$w$$ in the end, by some sequence of swaps. For this purpose, we add an edge from each $$v$$ on the left to each $$w$$ on the right (the capacity doesn't matter; it can be infinite for all intents and purposes). Notice, every flow in this network will describe some sequence of valid swaps. In particular, in a maximum flow, if all $$sv$$-edges are filled to capacity, must result in all the left-side nodes sending flow to the right-side nodes; in particular, all of the "1" and "2" units will have been transformed into 0,3, or 4. Lastly, we setup costs on the edges so that a "minimum cost" maximum-flow will yield the minimum cost solution. The $$sv$$-edges and the $$wt$$-edges (i.e.: the outer edges touching the source or the sink) can have cost 0 (i.e.: they do not affect anything). The inner edges will have cost associated with the number of swaps necessary. One note, however: If we assign cost for every change, then we will be "double counting" some swaps. For example, if some 2 cell becomes a 0 cell, it must do so by performing 2 swaps (so we will assign it a cost of 2). But for each of these two swaps, they must have increased the value of some other cells (for example, we might have sent these two units to two other 3's, transforming them to 4's). So we will only assign cost to those transitions that decrease a cell; that way, we will avoid double counting. Hence, from a node $$v$$ on the left side, to a node $$w$$ on the right side, the arc $$vw$$ will have cost as follows:
$$\text{cost}(vw) = \left\{\begin{array}{rl}
0 & : v < w \\
v-w & : v \geq w
\end{array}\right. $$
So the cost of the edge from $$v$$ to $$w$$ is exactly the amount of swaps it would take to turn a cell with $$A[i] = v$$ into a cell with $$A[i] = w$$. Notice we do not specify "which" cells it swaps with, only the amount of swaps it does. To the other cells (the ones that increase in value), we assign a cost of 0, so that the total cost is amortized and counted correctly. A min-cost max-flow from $$s$$ to $$t$$ in this graph defines the minimum number of swaps needed if we have fixed $$x$$ and $$y$$.
The above algorithm explains how to solve using max-flows if we have a fixed $$x$$ and $$y$$. Here is the complete algorithm, given $$A$$ and $$N$$.
  int S = sum{A[i]} for 0 <= i <= N;
  int x = 0;
  while(S>=0 && S%4 != 0):
    S -= 3; x++;
  if (S%4 != 0):
  int y = S/4;

  // Try all possibilities
  answer = INFINITY
  while(y >= 0):
    z = N - x - y;
    fixed_answer = min_cost_max_flow(A,x,y,z) as described above
    answer = min(answer, fixed_answer)

  return answer;
The above algorithm works very nicely. It passes many test cases, however, it is slow! A min-cost max-flow on about 10 nodes shouldn't take too long on average, but Dinic's algorithm runs on $$O(V^3)$$, so this could be really slow in the worst case. We have to do $$O(100N)$$ work with this algorithm, which timed-out.
This idea builds off the concepts from the last approach. Specifically, the key observation is that, if we have chosen $$x$$ and $$y$$ (that solve the equation $$3x + 4y = S$$, where $$S$$ is the sum of the elements in the array), then the problem becomes much simpler, and can be solved using optimization methods. In the above approach, we chose to use min-cost max-flow to solve the simplified problem (see the major algorithm of that section). Instead of using that, we can try to solve the sub-problem directly. Specifically, we still set-up the same "costs", since these costs still directly model our major problem. But instead of doing a slow min-cost max-flow algorithm, we recognize that this is a really special case, and we determine what key operations the min-cost max-flow algorithm is really doing in this special case. Then we design a special purpose algorithm that solves the problem directly, which has nothing to do with flows.
Would greedily assigning the units work?
By "greedily", I mean, in accordance with the following algorithm.
Assume a fixed x, y, z as described in the previous section.
Assume the array A is given with N elements.
Assume amt[v] is the number of times the value v occurs in the array A (0 <= v <= 4)

  int cost = 0;
  int ending[] = {z,0,0,x,y};
  for v = 0..4 do:
    for w = 0..4 do:
      int space = min(amt[v], ending[w]);
      cost += space*max(v-w,0);
      amt[v] -= space;
      ending[w] -= space;
  return cost;
That is, we simply start with 0, and transform as many cells into 0 as possible (i.e.: unchanged). We then transform as many cells into 3 as possible, then to 4. We then repeat the same for 1, 2, 3, and 4.
Does the above algorithm always terminate correctly (i.e.: all amt[] are 0, and all ending[] are 0)? Also, does the algorithm actually terminate with the minimum possible cost?
After each iteration of the $$v$$-loop in the algorithm above , amt[v] will be zero.
By construction, we know that, at the beginning of the algorithm, the sum of all ending[w] = x + y + z = N, and the sum of all amt[v] = amt[0] + amt[1] + ... + amt[4] = N. And at each iteration of the inner-most loop, we decrease some amt[v] and ending[w] by the same amount (space...). And all amt[v] and ending[w] stay non-negative. So the sum of all amt[v] must always stay equal to the sum of all ending[w]. So, if some amt[v] is not zero, then some ending[w] must be non-zero as well (so that both sums are positive), so the inner loop will see that and decrease amt[v]. Hence, by the end of the inner loop, amt[v] must go to zero.
Therefore the algorithm terminates correctly, with some valid cost. Now, how do we prove that the algorithm terminates with a minimum cost.
The algorithm above terminates with a minimum cost.
I will prove the correctness of this algorithm based on a reduction to min-cost max-flow. Specifically, each iteration of the inner loop increases the number of cells which go from $$v$$ to $$w$$ for some $$v$$ and $$w$$. This is equivalent to increasing flow on an augmenting path in the max-flow graph. Notice that, the order in which we do this guarantees that we are finding minimum cost transfers each time. This is equivalent to finding minimum cost augmenting paths in the flow network. Which means that this algorithm is equivalent to the min-cost max flow algorithm which simply finds minimum cost augmenting paths. Since these two algorithms are equivalent, and min-cost-max-flow is correct, the greedy algorithm must be correct.
In general, this problem is a nice little mix between greediness and iterative optimization. The key realization needed to solve this problem was that an instance was solvable if and only if the sum $$S$$ of the elements can be written as a linear combination $$S = 3x + 4y$$ with non-negative coefficients $$x,y$$. More specifically, this meant that the range of possible $$(x,y)$$ combinations was limited (i.e.: linear in $$S$$, which is linear in $$N$$). And, once we have chosen a particular $$(x,y)$$, we can actually solve this problem greedily. In truth I don't entirely understand exactly how and why this problem can be solved greedily, and why knowing $$x$$ and $$y$$ allows for this. My reasoning was indirect based on the fact that, with a fixed $$x$$ and $$y$$, we could reduce the problem to minimum-cost max-flow; and I used the theory surrounding that problem to (intuitively) prove that the greedy solution works (i.e.: by showing the greedy algorithm to be equivalent, in this case, to min-cost max-flow). But a more direct proof would be better. Anyway, with that knowledge, we try all possible $$(x,y)$$ in linear time, and then, for a given $$x,y$$ we know how many 3's there must be, how many 4's, and how many 0's (all the rest). We solve this greedily as described, which takes some constant amount of time (proportional to about 16 iterations), and we take the minimum over all such choices as our final answer.
  1. The problem is solvable if and only if $$S = 3x + 4y$$ for some $$x,y \geq 0$$. This was quickly found. However, this lemma became even more crucial when we realized that it yielded more information. It didn't just tell us "when" there was a feasible solution, but it also characterized the set of feasible solutions into a small ($$O(N)$$ sized) set, which could be used to simplify the problem
  2. With a fixed $$x$$ and $$y$$, this problem can be solved by minimum-cost max flow.Even since my initial observations, I was "looking" for a flow algorithm to solve this problem (probably based on experience or just a false start). With a fixed $$x$$ and $$y$$, I finally found what my intuition was looking for. In the end, this flow solution was (admittedly) over-kill. That is, this problem did not REQUIRE mincost max flow, although such an algorithm was correct. In fact, it actually turned out to be too slow (Time Limit Exceeded). However, it gave me another way of thinking about the problem, and helped me to come to the final "greedy" solution by characterizing the set of augmenting paths that a max-flow algorithm would do anyway, and just applying them directly to this special problem. It also helped me prove (or at least convince myself of) the correctness of the greedy approach, by understanding how it reduced to mincost-maxflow anyway, since I was already convinced of the correctness of min-cost max-flow. In the end, this flow reduction was a neat little tool for problem solving, but it was surely not the only way to come to the solution.
  3. There is only a linear number of choices for $$x$$ and $$y$$. This was key as mentioned above.
  4. With a fixed $$x$$ and $$y$$, a greedy algorithm can be used to find the cost. This is the final piece of the puzzle. (Note, I would still like to find a proof of this that does not use min-cost max flow)
  1. The "conservation" property implies that this problem has some "flow" characteristics. With this in mind, I pursued a flow based solution. The bad part was that I took a very long time trying to come up with these solutions, instead of just keeping it in mind. It was actually a false start
  2. After being stuck for over an hour, looking at an example by hand, without any auxillary constructions (i.e.: looking at an array, rather than the graph that I have constructed), jogged my memory and mental capacities enough to realize the key idea behind the solution. In the future, whenever I am stuck, I should look at examples more often in order to pick up properties of the problem that I am missing.
  3. Given a fixed $$x,y$$ can we solve the problem? This was crucial to find the solution. I should have attempted to simplify the problem earlier. This concept, I recall, occurred to me early on; I just didn't really think too much of it.
  4. I've recently learned how to solve the problem: Find a linear combination with non-negative coefficients using graphs. This also implied that I could solve this simpler version as well. Hence, I could immediately come up with a characterization for when an instance was solvable; I should have pursued this concept further instead of immediately switching over to the flows. In specific, instead of saying: "Ok, the problem is solvable if and only if it is written as $$3x + 4y$$", I should have went deeper and asked: "What about a specific $$x$$ and $$y$$; does this tell us anything?"
  • Optimization / Greedy Optimization
  • Flows / Min-Cost Max-Flow
  • Linear Diophantine Equations / Linear Integer Combinations with Non-Negative Coefficients
  • Invariants / Flow Conservation / Invariant Sum
  • Don't just look for "yes-no" characterizations. Sometimes it is beneficial to explore characterizations more deeply. Specifically, I used the $$3x + 4y$$ characterization only for the purpose of determining whether a solution existed or not. Beyond that, I discarded it. Instead, once I came up with the characterization, I should have used it to more deeply explore the structure of the problem. Specifically, I could have observed / written down that there are only a finite (and linear) number of choices for $$x$$ and $$y$$, and I could have possibly, more quickly came to the problem simplification: "what if we fix $$x$$ and $$y$$?" Then I would have spent my time more wisely.
  • Sometimes it is beneficial to use reductions as proof methods rather than as final solutions. In this case, the min-cost max flow reduction was beneficial in proving the correctness of another simpler more specialized algorithm. Applying min-cost max-flow directly was correct, but it was much slower (although this could be because I used a slow version of min-cost max-flow algorithm). As another example, we notice that many optimization problems are linear programs (including min-cost max-flow, and hence, this problem). In particular, this means that every such linear optimization problem can be solved by the Simplex Algorithm. However, the Simplex Algorithm is relatively slow, so it wouldn't be feasible to use it in an actual contest on many constraints. However, knowing the reduction to the Simplex Algorithm, one can prove powerful results about the characteristics of the given problem at hand (for example, that it has a dual, etc.)
  • Use problem-solving techniques when you get stuck. Specifically, the observation I needed to solve the problem was attained rather randomly, not by thinking deeply about observations needed, but by looking at a rather simple example. Looking at an example allowed me to divert my attention from the false approach I was focused on (i.e.: min-cost max-flow reductions), and it also provided me with more questions about the structure of the problem. By both of these facts, I came to the crucial observation.
  • Test your program on large test cases. Especially for problems where it is easy to construct sample input, I should test it on large test cases. This is useful for problems where I am "unsure" whether it will pass in time. For this problem, I was unsure whether a min-cost max flow algorithm was fast enough, so I should have run this algorithm on a large random test case with $$N=1,000,000$$ entries. This would have revealed that the min-cost max flow would get TLE and would imply that I should try something else (or at least try to optimize my approach; i.e.: maybe it would have suggested starting with the greedy algorithm, and resorting to min-cost max-flow once we had a "good enough" solution from the greedy algorithm; this would actually also demonstrate that the greedy algorithm always works; and maybe I would have realized that I didn't even need the min-cost max-flow at all).