PaperAndPaint
You are given a blank $$W \times H$$ sheet of grid-paper. You are also given a sequence of $$K$$ "operations". Each operation specifies a set of instructions for folding the paper and applying paint to the folded paper in the form of an axis-parallel, lattice-aligned rectangle. When applying paint to a rectangular portion of the folded paper, this paint "seeps through" and will also be applied to any corresponding portion of the paper that lies directly "under" the rectangle.
After applying an operation, the paper is unfolded and flattened. After all of the operations have been applied, the final paper will have some set of "painted" cells, and some set of unpainted cells. Return the total area of painted cells
(See the Problem Statement for further clarification)
- Geometry
- $$K$$ is small, coordinates are huge
- Reduce to Rectangles
- Line Sweep - Union of Rectangles
- Folding, Reflections, Scaling
- Coordinate Compression
Let's unfold the paper, and see what pattern of rectangles we would get.
There will be a painted rectangle in the bottom-most section $$(0..h)$$ described by $$R_0 := (x1[i],y1[i],x2[i],y2[i])$$ (we can assume that $$0 \leq y1[i] \leq y2[i] \leq h$$). If we reflect $$R_0$$ about the line $$y = h$$, we get another painted rectangle, $$R_1$$. If we reflect this new rectangle $$R_1$$ about the line $$y = 2h$$, then we get the next rectangle $$R_2$$. Reflecting $$R_2$$ about $$y = 3h$$ gives us $$R_3$$, and so on. In general: rectangle $$R_{k}$$ is the vertical reflection of rectangle $$R_{k-1}$$ about the line $$y = k \cdot h$$, for $$k \in \{1,..., cnt[i]\}$$
- Suppose we are given an operation of the form $$(xfold[i], cnt[i], x1[i], y1[i], x2[i], y2[i])$$.
- First "draw" the rectangle $$R_0 := (x1[i], y1[i], x2[i], y2[i])$$
- Let $$h := \frac{H}{cnt[i]+1}$$ be the height of a section when folded vertically.
- For $$k = 1..cnt[i]$$:
- Let $$R_k$$ be the reflection of rectangle $$R_{k-1}$$ about the line $$y = k\cdot h$$
- "Draw" rectangle $$R_k$$
- For all previously drawn rectangles $$R_k$$ with $$k = 0..cnt[i]$$ (including $$R_0$$):
- Shift rectangle $$R_k$$ by $$xfold[i]$$ units to the right (this is the real $$R_k^{'}$$).
- Make a copy of this new rectangle, and reflect the copy about the line $$x = xfold[i]$$ (this is the real $$R_k^"$$).
Once we have these implemented, and once we implement Algorithm 1, we reduce the entire problem to something simpler! We can now assume that our input is just a set of rectangles (represented possibly as four-tuples $$(x_1,y_1,x_2,y_2)$$. Each operation paints a set of the rectangles (according to Algorithm 1), and so we just take the "union" of all the rectangles painted according to all operations (they can possibly overlap, but that's okay!).
Our final problem is this: Given a set of rectangles, find the total area of their union.
The next idea section describes how to solve this reduced problem.
We need something smarter than a naive "brute force" algorithm. Notice that the number of operations $$K$$ is at most $$50$$. Moreover, $$cnt[i]$$ is at most $$1,000$$. From Algorithm 1 in Idea Section 1 above, this implies that the total number of final rectangles that we generate is at most $$N := 2\cdot K \cdot (cnt[i] + 1) \leq 100,100$$. Even an $$O(N^2)$$ algorithm would be too slow in the worst case.
Based on the constraints, we expect that we will need something like an $$O(N \lg N)$$ algorithm to solve this problem, if at all possible.
So how does this "Line-Sweep" method work?
In general, you are given a large set of $$N$$ geometric objects (lines, polygons, rectangles, sometimes circles, etc.), and you would like to compute some property that depends on their geometrical arrangement (for example: intersections of lines or union of area of rectangles) in $$O(N \log N)$$ or $$O(N \log^2 N)$$ time (etc).
Start by sorting these objects, usually according to their left x-coordinate. Now imagine a vertical line placed at the left-most "relevant" x-coordinate. You then "sweep" (move) this line steadily to the right. Each time it hits a "relevant" x-coordinate (in our case, the start or end of some rectangle), you process some information about the region between this current coordinate and the previous relevant coordinate. (For calculus students, this is like piecewise "integration" over all x-coordinates.) We hope that processing information about the region between two relevant coordinates can be done quickly and efficiently (e.g.: $$O(\log N)$$ time) - usually with the help of some data structure.
Combining the results across all x-coordinates will yield our overall answer.
(Note: If this is your first time ever seeing or hearing about the Line Sweep method, do not be disheartened. I would call it an intermediate or advanced technique in Computational Geometry, and it can be tricky to code. It is also a "general" method, so the details of the particular problem can be hard to determine as well. In fact, for me, PaperAndPaint was my first time ever using Line Sweep to solve the union of rectangles problem!)
Suppose we have a set of x-coordinates $$x_1,x_2,x_3, ..., x_n$$ denoting the "relevant" x-coordinates, where each such $$x_i$$ coincides with the left boundary or the right boundary of at least one of the input rectangles. Also assume that every rectangle has its left and right boundaries as some $$x_i$$ and $$x_j$$ in this set (for some $$1 \leq i,j \leq n$$). Now, consider the "region" between any two relevant points $$x_i$$ and $$x_{i+1}$$. Then: the total area covered (painted) in this region is just the sum of areas of simple rectangles, each with width $$x_{i+1} - x_i$$.
By definition of the $$x_i$$ coordinates, $$x_i$$ must either coincide with the start (left-boundary) or the end (right-boundary) of some rectangle(s). Let's consider both cases.
Suppose some rectangle starts at coordinate $$x_i$$, with top and bottom $$y$$ coordinates $$y_1$$ and $$y_2$$. Then the $$y$$-coordinates $$y_1 \rightarrow y_2$$ will be "covered" (painted) for the region between $$x_{i}$$ and $$x_{i+1}$$ if they were not already covered before. All other $$y$$-coordinates would stay the same as before.
Suppose some rectangle ends at coordinate $$x_i$$, with top and bottom $$y$$ coordinates $$y_1$$ and $$y_2$$. Then the $$y$$-coordinates between $$y_1$$ and $$y_2$$ will no longer be covered by that rectangle for the region between $$x_{i}$$ and $$x_{i+1}$$. They may still be covered by some other rectangle.
And if there are multiple such "events" happening (starting and ending of rectangles) at the same $$x_i$$ coordinate, we would try to process all of them incrementally (somehow).
Then, the key idea is that, between the region $$x_i$$ and $$x_{i+1}$$, if we could find the total length of $$y$$ coordinates that have been "covered", we take this length, and multiply by $$x_{i+1} - x_i$$ to get the total area covered in this region. This is because, as we noted earlier, the area in this region comes in the form of a set of sub-rectangles, each starting at $$x_i$$ and ending at $$x_{i+1}$$, and thus having width $$x_{i+1} - x_i$$. So the total area in this region is equal to this width ($$x_{i+1} - x_i$$) multiplied by the total length of $$y$$ coordinates (i.e.: total sum of heights of these sub-rectangles).
- Maintain a list of covered $$y$$ coordinates / intervals.
- $$update(y_1,y_2)$$ which means to "update" the interval between $$y_1$$ and $$y_2$$ to be covered.
- $$undo(y_1,y_2)$$ which means to "undo" a previous "update" command for the interval between $$y_1$$ and $$y_2$$. Any $$y$$ coordinate in this range becomes "uncovered" if all previous updates covering this $$y$$ coordinate have been un-done.
- $$getSum()$$: Get the total length of covered $$y$$ coordinates (as a number)
Once we have a description or an idea of what we need, the last part of the solution usually comes from experience. It is good to have a library of known "generic data structures" that one can use. We can often (usually, always) modify/augment one of these known data structures to be able to implement the operations we desire!
(See Wikipedia or Introduction to Algorithms (3rd Ed.) by CLRS, Section 14.3 - Interval Trees for some of its variants)
Again, if you have never seen this before, it can be difficult. But, as mentioned, it is very useful to have a library of data structures that you have either memorized or saved for contests such as TopCoder. In C++, even the builtin Data Structures (set,queue,vector,map,list,deque,bitset) are super useful. In addition, you should probably have heard of the Fenwick Tree (or Binary-Indexed Tree), Segment Tree, Balanced Binary Search Tree (such as Red-Black Tree), and/or the Interval Tree.
In our present case, the four operations can each be implemented by a modified Interval Tree (or possibly a Segment Tree, although I sometimes cannot remember the difference). I leave the implementation of the Interval Tree as an "exercise" (sorry!). But it can be shown that each of the operations runs in $$O(\lg N)$$ time or $$O(\lg^2 N)$$ time, where $$N$$ is the number of rectangles.
- Compute the set of relevant $$x$$ coordinates $$x_1, x_2, ..., x_n$$.
- Construct an empty Interval Tree: $$T$$ (with the operations as described above).
- Set $$ans = 0$$.
- For each $$i = 1..n$$:
- If $$i>1$$: increment $$ans$$ by the value: $$(T.getSum() \times (x_{i} - x_{i-1}))$$
- For any rectangle with left boundary at $$x_i$$, with bottom/top boundaries $$y_1$$ and $$y_2$$: call $$T.update(y_1,y_2)$$
- For any rectangle with right boundary at $$x_i$$, with bottom/top $$y_1$$ and $$y_2$$: call $$T.undo(y_1,y_2)$$
- Return $$ans$$
Upon first reading this problem, it's nice to get some intuition about how the operations actually work (See the real Problem Statement for further clarification). For me, that meant looking at examples. I even took some physical paper and tried to fold it as described in the problem statement. Using a pencil, I tried to understand what patterns would occur when the paint "seeps through" the paper.
Eventually, I broke down the entire problem to a "union of rectangles". That is, since the paint is applied in a rectangular fashion, it will seep through the folded paper making rectangular blotches at various points in the paper. After playing around with examples, it is not too hard to come up with and write out an algorithm for describing the set of rectangles. There is a lot of symmetry with the operations: most of the rectangles can be described by horizontal and/or vertical reflections of the original rectangle around some simple lines. Hopefully the reader is comfortable with geometry. (If not, please refer to the section Idea #1 above, where we show how to do this.)
Once we know the set of (possibly overlapping) rectangles that would describe the paint pattern on the unfolded paper, the next step actually reduces to a well-known problem in computational geometry: to find the union of a set of rectangles. Many computational geometry problems of this form can be solved by a technique called the "Line-Sweep" (or "Sweep-Line") Method. A good introduction to this method is in Introduction to Algorithms by CLRS (Chapter 33.2 - Determining whether any pair of segments intersect), we also describe some more examples in the related sections below.
Line Sweep is a general method (not a specific algorithm). For this particular problem, we use it as follows: Given a set of $$N$$ possibly-overlapping rectangles, we extract the set of all "relevant" $$x$$-coordinates -- those which correspond to the left or right boundary of at least one of the rectangles. We then "draw" a vertical line at the left-most relevant coordinate. As we sweep (shift) the line from left-to-right, we consider the regions that occur between neighbouring "relevant" coordinates. Using a data structure (in this case, the Interval Tree), we can efficiently maintain, update, and compute the total painted area within one of these given regions. By summing over all regions, we thereby get the answer for the entire problem.
Again, CLRS (3rd. Ed.) Chapter 33 (Computational Geometry) gives a walk-through of the Line-Sweep method, using it to solve a classic Computational Geometry problem: to find the intersections of many line segments. For our present problem of finding the area of the union of a set of rectangles, we go into much more detail (and follow the problem-solving process) in Idea Section #2 above.
All-in-all, PaperAndPaint simplifies to two independent sub-problems: a) Find the set of rectangles that describe all the paint, and b) compute the area of the union of these rectangles by a Line-Sweep method. Independently, these problems are non-trivial, but with careful coding, they can be solved!
- Suppose we ignore $$xfold[i]$$ (folding horizontally) and suppose we were given only $$(cnt[i],x1[i],y1[i],x2[i],y2[i])$$ for the $$i^{th}$$ operation. Ley $$H$$ be the total height of the paper, and let $$h := \frac{H}{cnt[i]+1}$$ be the height of each vertically-folded section. If $$R_0 = (x1[i],y1[i],x2[i],y2[i])$$ is the initial rectangle, and $$R_{k}$$ is the reflection of rectangle $$R_{k-1}$$ about the line $$y = kh$$, then the set of painted regions in the unfolded paper after performing this operation is exactly the union of $$R_0, R_1, R_2, ..., R_{cnt[i]}$$. This is rather a handful to write. But it is important. It provides a very convenient method for representing the vertical-folding in terms of a single rectangle $$R_0$$ and reflections about simple lines.
- Suppose we fold about the line $$x = xfold[i]$$ first, and draw some rectangle $$R = (x1[i],y1[i],x2[i],y2[i])$$ in the coordinate system of the folded paper. Then this will yield two rectangles in the unfolded paper: a) $$R$$ shifted by $$xfold[i]$$ units to the right, and b) $$R$$ shifted by $$xfold[i]$$ units to the right and then reflected about the line $$x = xfold[i]$$. This shows that we can easily represent folding horizontally by simple operations as well. All-in-all this tells us that we can start with a single rectangle $$(x1[i],y1[i],x2[i],y2[i])$$ and make copies with simple transformations to get the total set of painted rectangles in the unfolded paper.
- Let $$x_1,x_2,x_3, ..., x_n$$ be the set of "relevant" x-coordinates, where each such $$x_i$$ coincides with the left boundary or the right boundary of at least one of the input rectangles. Also assume that every rectangle has its left and right boundaries as some $$x_i$$ and $$x_j$$ in this set (for some $$1 \leq i,j \leq n$$). Then the region between $$x_{i}$$ and $$x_{i+1}$$ for any $$i$$ is composed of a set of rectangles of width $$(x_{i+1} - x_{i})$$. By construction, no rectangle can end inside one of the regions; they must end on the boundaries. So the region between a particular $$x_{i+1}$$ and $$x_{i}$$ is completely "homogenous". This makes it much easier to describe the area.
- After simplifying the operations to a set of rectangles, the total area of the rectangles can be found by Line-Sweep method. We maintain a sweep line from left-to-right, along with an Interval Tree data structure. As we process relevant points, we use the Interval Tree to query the total area. This is the final algorithm.
- Geometry is a very visual subject. It's easier to understand geometry problems and related algorithms / formulas when you have a visual picture. In this case, looking at examples of how the folding works made it much easier to characterize it. I even physically got some paper out and drew on it with pencil to get a better understanding. In addition, when we were coming up with our Line Sweep algorithm, looking at pictures and examples also helped.
- This problem was really the sum of many simpler problems. For example, the folding process was really two separate folding processes (horizontal and vertical) combined into one. By deriving methods to handle each of these separately, it was really easy to combine them to understand the folding process as a whole. This kind of problem simplification generalizes to other kinds of problems as well. Also, we completely simplified the folding process (in fact, eliminated it altogether) by representing the painting operations as a set of rectangles. In this way we reduced the whole problem to "finding the union of a set of rectangles", which we could solve using other known techniques. Lastly, we used this problem-solving technique during the construction of our line-sweep method. We simplified the process by considering the "relevant" regions, and by focusing on how the data changes from one region to the next adjacent one. This allowed us to use a data structure.
- Looking at the constraints on the number of rectangles given, and how many we would generate in our "unfolding" process, we were able to recognize that an $$O(N \log N)$$ (or something similar) would be an ideal solution. This helps to guide our search for a solution.
- The latter part of this problem -- finding the total area of a bunch of rectangles -- would have been extremely difficult if it were not for some experience. I thought about using Line-Sweep because I have seen Line-Sweep being used to solve similar problems before. I also recalled that this entire problem was probably a well-known one. In addition, when trying to implement the line-sweep algorithm, knowledge (experience) with data structures such as Interval Trees made this problem much easier than it otherwise would have been. In fact, without some degree of experience with these intermediate and advanced algorithms (at least having "heard of them before"), it might have been impossible to solve this TopCoder 1000 point problem.
- Working backwards, as we learned, is sometimes helpful. Instead of asking "how can I solve this problem", we might consider an approach and ask: "Here is an idea or approach or method? Does this solve the problem?" It turns a "discovery" task into a "verification" task. And sometimes, that "idea or approach or method" is actually easier to find (usually from intuition or experience). In our case, this came in the form of a data structure.
- Line Sweep is a generic method to solve some computational geometry problems involving intersections and unions. We can use line sweep to find the union of areas of rectangles, to find the intersections of line segments, to find the point that is most overlapped by some polygons, etc. Sometimes additional power is needed; this usually comes in the form of a data structure, or other techniques such as compressing the coordinates (i.e.: storing only relevant points)
- Work backwards! This is useful for data structures problems. It is kind of an art trying to find the right data structure (and/or the right way to augment a data structure) to support operations for a specific problem. By asking "what do I want" as well as "what do I have", in various combinations, you can eventually come to a description that is easy to formulate in terms of known data structures.
- Having a library of known data structures (either memorized, on paper, or electronically) is useful. To solve many advanced problems its useful to have knowledge of the classic / builtin data structures of your language, as well as knowledge of the most common intermediate data structures: Fenwick Trees, Segment Trees, Interval Trees, and Binary Search Trees.
- I'm keeping this! As mentioned, it's sometimes good to have a library of code and data structures. In particular, since I have implemented an Interval Tree for this problem, I am saving it in my own personal library for later. This helps you to become more efficient later (because TopCoder and other websites allow you to use pre-written code). But beware, you may forget completely how to code it yourself, if you always can copy-and-paste it from a library. This may make life harder later!
- Geometry / Computational Geometry / Rectangles
- Geometry / Coordinate Compression
- Geometry / Line Sweep / Union of Area of Rectangles
- Math / Formulas
- Data Structures / Interval Tree