
# Set Multiples

Let $$S$$ and $$T$$ be two sets of integers. We say that $$T$$ *is a multiple of* $$S$$ if, for every integer $$x \in S$$, there is an integer $$y \in T$$ such that $$y$$ is a multiple of $$x$$ (that is, $$y = kx$$ for some integer $$k$$).

We are given four integers $$A,B,C,D$$ such that $$1 \leq A \leq B \lt C \leq D \leq 10^{10}$$. Let $$S$$ be the set of integers $$\{ x : A \leq x \leq B \text { or } C \leq x \leq D\}$$. We would like to find the smallest subset $$T$$ of $$S$$ that is also a multiple of $$S$$. Output the size of this set $$T$$.

*Note: $$S$$ is a subset of $$S$$, and $$S$$ is a multiple of $$S$$, so a solution always exists.*

- Set union
- Bitset
- Greedy
- DP
- Graph (edges from divisors to multiples)
- Closure under an operation
- Intervals
- Matrix
- Primes, Prime Factorization, Sieve
- SQRT factorization and SQRT tricks
- Tree
- Want logarithmic time solution

**Notation:** We will use $$S$$ to denote the given set, and $$T$$ to denote the "goal" or optimal set that satisfies the constraints. We say we "choose" (or "select") an integer $$x$$ in an algorithm if it results in $$x \in T$$ when the algorithm is completed.

```
Let T = {}
for x = D downto 1:
    if (A <= x <= B or C <= x <= D):
        if (no multiples of x exist in T):
            add x to T
```

That is, we consider the $$x$$ in descending order, and we greedily "select" a number $$x$$ if there is no other multiple of $$x$$ already taken in $$T$$. Obviously, since $$D \approx 10^{10}$$, this algorithm is completely inefficient. But it does yield a good characterization of the optimal solution $$T$$. We can now use this to prove the correctness of other ideas/algorithms, by showing that they return a set $$T$$ that is equivalent to the one returned by this solution.
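For tiny inputs, this greedy can be run directly. Below is a minimal Python sketch of it (the name `slow_greedy` is mine); it is only useful as a reference against which to test faster ideas:

```python
def slow_greedy(A, B, C, D):
    """The slow greedy from above: scan x = D down to 1 and select x
    if no multiple of x has been selected yet. Only feasible for tiny D."""
    T = set()
    for x in range(D, 0, -1):
        if (A <= x <= B) or (C <= x <= D):
            if not any(m % x == 0 for m in T):  # no multiple of x chosen yet
                T.add(x)
    return T

# S = {2, 3} ∪ {5, 6, 7, 8}: 2 divides 6 and 3 divides 6, so only 5..8 survive
print(sorted(slow_greedy(2, 3, 5, 8)))
```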

In Idea 1, we were able to characterize how the optimal $$T$$ would look, and we came up with a greedy (but slow) way to construct $$T$$. Next, we inspect this algorithm and try to learn more about the structure of the problem. By inspection (especially by looking at examples), we come up with another neat observation.

Moving on, we can make similar observations to this one. We can ask the questions: "What if $$C \leq \frac{D}{2}$$? What if $$C > \frac{D}{2}$$?".

This really is a corollary to the first Key Observation. If $$C \leq \frac{D}{2}$$, then we already know that all numbers between $$\left\lfloor\frac{D}{2}\right\rfloor + 1$$ and $$D$$ *must* be chosen (this is precisely the Key Observation).

We now show that the greedy (optimal) algorithm (from Idea 1) would not choose any other elements. In particular, we show that for all $$x \leq \frac{D}{2}$$, there is an integer $$y$$ between $$\left\lfloor\frac{D}{2}\right\rfloor + 1$$ and $$D$$ that is a multiple of $$x$$. (Note: this is not very hard to prove, and the reader is encouraged to do so independently.)

Take any integer $$x \leq \frac{D}{2}$$. Then consider the largest $$k \geq 1$$ such that $$kx \leq \frac{D}{2}$$. Then, by construction, we have that $$(k+1)x \geq \left\lfloor\frac{D}{2}\right\rfloor + 1$$ (since $$kx$$ was the largest multiple smaller than this). And since $$x \leq \frac{D}{2}$$ and $$kx \leq \frac{D}{2}$$, we therefore have that $$(k+1)x = (kx + x) \leq \frac{D}{2} + \frac{D}{2} = D$$. So this multiple $$(k+1)x$$ is precisely in the range of already chosen numbers. Hence $$x$$ always has a multiple in this range.

Thus, the Optimal Greedy Algorithm would never select another number. So this is optimal.

We have just shown that if $$C \leq \frac{D}{2}$$ then the optimal solution (i.e.: the algorithm above) will select exactly those numbers between $$\left\lfloor\frac{D}{2}\right\rfloor + 1$$ and $$D$$, which will suffice.
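The claim is also easy to sanity-check by brute force; a quick throwaway sketch (the helper name is mine):

```python
def has_multiple_in_upper_half(x, D):
    # True if some multiple of x lies in [floor(D/2) + 1, D]
    return any(m % x == 0 for m in range(D // 2 + 1, D + 1))

# Every x <= D/2 has a multiple in the upper half, for a few sample D
for D in (10, 100, 257):
    assert all(has_multiple_in_upper_half(x, D) for x in range(1, D // 2 + 1))
print("verified")
```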

In these cases we must select all of $$C..D$$ (since none of these are multiples of each other), but we may have to select some additional numbers between $$A$$ and $$B$$. Again, we argue that we would only need to select numbers greater than $$\frac{B}{2}$$ but not greater than $$B$$ (since all numbers $$\leq \frac{B}{2}$$ have some multiple in that range). But even so, we don't select them all, because many of these will have some multiple in the already chosen set between $$C$$ and $$D$$.

Since these are contiguous intervals ($$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$ and $$[C..D]$$), here is one way to proceed.

For a fixed integer $$k>1$$, we consider the interval $$\left(\left\lceil \frac{C}{k} \right \rceil .. \left\lfloor \frac{D}{k} \right \rfloor\right)$$. It turns out that every integer $$x$$ in this range has a multiple $$kx$$ in the original range ($$C..D$$) (Proof omitted). Using a form of "Brute-force", we therefore "try all possible $$k>1$$". Consider $$\left\lceil \frac{C}{2} \right \rceil .. \left\lfloor \frac{D}{2} \right \rfloor$$, then $$\left\lceil \frac{C}{3} \right \rceil .. \left\lfloor \frac{D}{3} \right \rfloor$$, ..., and so on, keeping track of which of these intervals overlap with $$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$, and marking all integers in these intervals as "factors" / "not-chosen" (Implementation details omitted).

Assuming we figure out the implementation details, this will yield an algorithm that correctly marks all numbers in $$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$ that have a multiple in $$[C..D]$$. By taking the union of $$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$ and $$[C..D]$$, minus the marked numbers, we will have all the numbers that would be chosen in an optimal solution.

In our *Problem Simplification*, we said we were looking for all the numbers in $$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$ that have some multiple in $$[C..D]$$. The above algorithm, however, solves this problem in reverse: we consider all numbers in the range $$[C..D]$$ and find all factors of them within the ranges of $$[\left\lfloor\frac{B}{2}\right\rfloor + 1 .. B]$$.

This is an example of "working backwards". We are looking at the inverse problem, which can sometimes be easier to solve.

At this point, I tried some examples. A *very* nice example turned out to be $$C = 200, D = 256$$ ($$A$$ and $$B$$ can be anything reasonable); I encourage the reader to try this with increasing values of $$k$$. We realize we don't have a bound on $$k$$, except that $$k \leq D$$. But we do notice after a while that these "intervals" stop being disjoint.

For example, take $$k=4$$ and then $$k=5$$. For $$k=4$$, the interval is $$\frac{200}{4} .. \frac{256}{4}$$ which is $$[50 .. 64]$$. But for $$k=5$$ (omitting the calculations), we get the interval $$[40..51]$$. Notice, the upper-bound ($$51$$) of the second interval overlaps with the lower-bound ($$50$$) of the first interval. The reader may verify that this continues to be the case for all $$k \geq 5$$ for this example (I only tried until $$k=10$$). This yields a very nice *pattern*: for large enough $$K$$, there is some $$v$$ such that all numbers $$\leq v$$ are covered by the numbers in the intervals $$\left\lceil \frac{C}{k} \right \rceil .. \left\lfloor \frac{D}{k} \right \rfloor$$, over all $$k \geq K$$.
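The overlap in this example can be checked with a few lines of Python (a throwaway sketch):

```python
from math import ceil

C, D = 200, 256
# intervals [ceil(C/k), floor(D/k)] for k = 2..10
intervals = [(ceil(C / k), D // k) for k in range(2, 11)]
print(intervals)
```

For $$k=4$$ this gives $$(50, 64)$$ and for $$k=5$$ it gives $$(40, 51)$$; from that point on, each interval reaches past the lower bound of the previous one.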

```
SetMultiples1(A, B, C, D):
    if C <= D/2:
        # (Implicitly) Set T = {D, D-1, D-2, ..., floor(D/2) + 1}
        return ceil(D/2)  # We only want the size of T
    else:
        Let marked = {} be an empty set
        # The original ranges that define S
        Let upperRange = (C, D)
        Let lowerRange = (max(floor(B/2) + 1, A), B)
        for k = 2, 3, ..., INFINITY:
            currentRange = (ceil(C/k), floor(D/k))
            prevRange = (ceil(C/(k-1)), floor(D/(k-1)))
            if intersection(currentRange, prevRange) isnt EMPTY:  # They overlap
                # From here on, every number from 1 to floor(D/k) is covered
                marked.insert( intersection((1, floor(D/k)), lowerRange) )
                break
            else:
                marked.insert( intersection(currentRange, lowerRange) )
        return upperRange.size() + lowerRange.size() - marked.size()
```

In the above pseudo-code we represent intervals as (lower bound, upper bound) pairs. For example, the variables `upperRange`, `lowerRange`, and `currentRange` all represent intervals.

As proven earlier, this algorithm works well when $$D-C$$ is large: the intervals $$(\frac{C}{k} .. \frac{D}{k})$$ begin to overlap once $$k$$ gets large enough, and we can break out of the loop early. In the next Idea Section, we show how to handle the case when $$D-C$$ is small.
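Here is one possible Python rendering of this idea (a sketch; the function name and the exact bookkeeping are mine — in particular, I count marked numbers with arithmetic over disjoint intervals instead of an explicit set, stopping at the previous interval's lower bound when the break is reached so that nothing is counted twice):

```python
def set_multiples_large_gap(A, B, C, D):
    """Size of the optimal T; the loop runs O(D / (D - C)) times,
    so this is fast when D - C is large."""
    def ceil_div(a, b):
        return -(-a // b)
    if C <= D // 2:
        return D - D // 2                      # only floor(D/2)+1 .. D are chosen
    lo, hi = max(B // 2 + 1, A), B             # surviving part of A..B
    marked = 0                                 # numbers in [lo, hi] with a multiple in C..D
    k = 2
    while True:
        cur_lo, cur_hi = ceil_div(C, k), D // k
        prev_lo = ceil_div(C, k - 1)
        if cur_hi >= prev_lo:                  # overlap: all of 1..floor(D/k) is covered
            # interval k-1 was already counted, so only count below prev_lo
            marked += max(0, min(hi, prev_lo - 1) - lo + 1)
            break
        marked += max(0, min(hi, cur_hi) - max(lo, cur_lo) + 1)
        k += 1
    return (D - C + 1) + (hi - lo + 1) - marked

print(set_multiples_large_gap(2, 3, 5, 8))  # S = {2,3} ∪ {5..8}: T = {5,6,7,8}
```

On $$S = \{2,3\} \cup \{5..8\}$$ this returns 4, matching the brute-force greedy.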

In the idea section above, we described an algorithm that is guaranteed to terminate quickly whenever $$C$$ is much smaller than $$D$$ (that is, whenever $$D-C$$ is large). We won't review that algorithm now, but we recall that it runs in $$\Theta(\frac{D}{D-C})$$ time-complexity. We would now like to answer the following question:

In the previous Idea, we found an algorithm that is efficient whenever $$D-C$$ is large. We have also shown that the Algorithm from this current Idea is efficient whenever $$D-C$$ is small. From experience (e.g.: from Calculus) or by "Symmetry" we know that it's best to find a "balance" between these two solutions (i.e.: get their efficiencies as close to each other as possible).

This is similar to the "Square Root Trick" for factorization: given two numbers that multiply to $$n$$, at least one of them must be at most $$\sqrt{n}$$, so we can find all factors of $$n$$ in $$O(\sqrt{n})$$ time. This principle applies very often when solving problems: we often have two conflicting functions we want to optimize, and we can prove that one of them will be small whenever the other is large. Usually, if these are functions of some number $$n$$, one of them will be $$O(\sqrt{n})$$ or something similar.
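As a concrete reminder, here is the standard $$O(\sqrt{n})$$ trial-division factorization the trick refers to:

```python
def factors(n):
    """All divisors of n in O(sqrt(n)) steps: every divisor d <= sqrt(n)
    pairs with the complementary divisor n // d."""
    result = []
    d = 1
    while d * d <= n:
        if n % d == 0:
            result.append(d)
            if d != n // d:          # avoid listing sqrt(n) twice
                result.append(n // d)
        d += 1
    return sorted(result)

print(factors(36))  # -> [1, 2, 3, 4, 6, 9, 12, 18, 36]
```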

Anyway, with this in mind, we have the following problem.

Recall that we have:

- an algorithm that runs well when $$D-C$$ is large (in $$\Theta(\frac{D}{D-C})$$ time)
- another algorithm that runs well when $$D-C$$ is small (in $$\Theta(\sqrt{D} (D-C))$$ time)

To "combine" these algorithms, we mean: given a $$C$$ and $$D$$, we choose whichever algorithm is faster for our particular input ($$D-C$$). If we do this combined algorithm, we get a time complexity $$F(C,D)$$ defined as follows:

\[ F(C,D) = \min\left\{ \frac{D}{D-C}, \sqrt{D}\,(D-C) \right\} \]

For simplicity, let's assume that $$D$$ is a constant and we vary $$D-C$$. So we let $$x := D-C$$ be a variable, and we can write all of these functions in terms of $$x$$. So we really have:

\[ F(x) = \min\left\{ \frac{D}{x}, x\sqrt{D} \right\} \]

Now we are taking the minimum of two functions in $$x$$, one decreasing ($$\frac{D}{x}$$) and one increasing ($$x\sqrt{D}$$), and we would like to find out "how bad can this get". In particular, we want to find the maximum point of this function. Using techniques from calculus, or just by intuitive reasoning, we know this maximum occurs where the two branches are equal. Hence, we can solve for $$x$$ by setting:

\[ \frac{D}{x} = x\sqrt{D} \]

Solving for $$x$$ gives $$x^2 = \frac{D}{\sqrt{D}} = \sqrt{D}$$, so $$x = D^{\frac{1}{4}}$$. This means: whenever the difference $$x = D-C$$ is more than $$D^{\frac{1}{4}}$$, we choose the first algorithm; and whenever it is less than $$D^{\frac{1}{4}}$$, we choose the second. In the worst case we will have:

\[ F(D^{\frac{1}{4}}) = D^{\frac{3}{4}} \]

So, altogether we have a $$\Theta(D^{\frac{3}{4}})$$ algorithm, as desired.

The lemma above actually gives us an algorithm for solving the problem, based on the two different concepts we have developed. See the Solution Summary (in the Review Section) below for a concise overview of the solution.
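The balance point is easy to sanity-check numerically (a throwaway sketch; floating point is fine at this scale):

```python
D = 10 ** 10

def F(x):
    # combined running time: the faster of the two algorithms for gap x = D - C
    return min(D / x, x * D ** 0.5)

x_star = D ** 0.25   # predicted worst case, about 316
# both branches meet at D^(3/4), and no sampled gap does worse
print(x_star, F(x_star), D ** 0.75)
```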

All-in-all, this problem required some interesting greedy observations to solve it efficiently.

We first observe that, for a given integer $$x$$, we "choose" $$x$$ to be in our final set if and only if $$x$$ has no larger multiples in $$S$$ (for simplicity, we will call all of these numbers "maximal" numbers). This always yields an optimal solution (proven above). Hence, we simply want to find all "maximal" numbers $$x$$ in the ranges $$A \leq x \leq B$$ or $$C \leq x \leq D$$.

To find an efficient algorithm, we can exploit the fact that $$S$$ consists of two disjoint intervals. This actually makes the problem much simpler and more "regular". For example, we can make the following observation: between $$C$$ and $$D$$ (inclusive), we always need to select the numbers above $$\frac{D}{2}$$ (all of these are "maximal", since doubling any of them exceeds $$D$$). In fact, with a bit more work, we can show that, if $$C \leq \frac{D}{2}$$, then these are the *only* numbers we need to select (we can prove that any smaller number has a multiple in this set). So we can assume that $$C > \frac{D}{2}$$. Similarly, we can assume that $$A > \frac{B}{2}$$, since an analogous truth holds for $$A$$ and $$B$$.

At this point, we can assume $$C$$ is somewhat "close" to $$D$$ (in particular, $$C > \frac{D}{2}$$). To solve this, I tried one fairly intuitive algorithm: Look at the end-points of the interval $$(C,D)$$ and divide them by 2. Then every integer $$x$$ in this new interval $$(\frac{C}{2} .. \frac{D}{2})$$ has a corresponding multiple $$2x$$ in the original interval $$(C,D)$$. So all such numbers are clearly *not* maximal. A similar fact holds for $$(\frac{C}{3}..\frac{D}{3})$$, $$(\frac{C}{4}..\frac{D}{4})$$, ..., and so on. So, the algorithm is to try all these intervals, and "mark" the numbers $$x$$ that we find. One convenient fact about this is that most of these intervals are disjoint, so we don't need to explicitly mark the $$x$$, but we just count *how many* $$x$$ are in the range and also in the range $$\frac{B}{2} .. B$$. When this algorithm terminates, the size of our final set will be $$(D-C+1) + (B - \max(A-1,\frac{B}{2})) - |\{\text{marked numbers}\}|$$ (the numbers in our given ranges, minus the number of "marked" factors).

By inspection, the intervals $$(\frac{C}{k}..\frac{D}{k})$$ will eventually intersect with each other. Actually, you can prove that, after a certain $$k$$, the intervals will cover all numbers from $$\frac{D}{k}$$ down to 1. So we can actually "exit" early. This happens whenever $$k > \frac{D}{D-C}$$ (we proved this in Idea 2), so it happens more quickly whenever $$D-C$$ is large (or when $$D$$ is small, but we can't assume this).

By noting that this algorithm works when $$D-C$$ is large, we can ask the next intuitive question: "What about when $$D-C$$ is small?". To answer this, we realize that we can simply find, explicitly, all the factors of the numbers between $$C$$ and $$D$$. We mark those factors that overlap with the chosen part of the $$A..B$$ interval. This runs in $$O((D-C)\sqrt{D})$$ time, since there are about $$D-C$$ numbers in the range $$C..D$$ and each number requires $$O(\sqrt{D})$$ time to find its factors. So, this algorithm is clearly better when $$D-C$$ is small.
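A sketch of this second algorithm in Python (again, the name and bookkeeping are mine; it assumes the $$A > \frac{B}{2}$$-style reductions already discussed):

```python
def set_multiples_small_gap(A, B, C, D):
    """Size of the optimal T, assuming C > D/2.  For every y in C..D, find all
    factors of y in O(sqrt(y)) time and mark the ones falling in the surviving
    part of A..B.  Total: O((D - C) * sqrt(D)) -- fast when D - C is small."""
    lo, hi = max(B // 2 + 1, A), B     # surviving part of A..B
    marked = set()
    for y in range(C, D + 1):
        d = 1
        while d * d <= y:
            if y % d == 0:
                for f in (d, y // d):
                    if lo <= f <= hi:
                        marked.add(f)
            d += 1
    return (D - C + 1) + (hi - lo + 1) - len(marked)

print(set_multiples_small_gap(2, 3, 5, 8))  # same answer as before: 4
```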

We now have two "complementary" algorithms, the first having $$O(\frac{D}{D-C})$$ time-complexity, and the second having $$O((D-C)\sqrt{D})$$ time-complexity. We can combine these algorithms by always choosing the "minimum" one; that is, we choose whichever algorithm runs faster for our given input. We then have an algorithm with time-complexity $$O( \min\{\frac{D}{D-C}, (D-C)\sqrt{D}\} )$$. Using some intuitive arguments (or by calculus), this function has its worst-case complexity of $$D^{\frac{3}{4}}$$ when $$(D-C) \approx D^{\frac{1}{4}}$$. Since $$D \leq 10^{10}$$, this just (barely) runs under the time-limit for TopCoder.

Accepted.

- We select an integer $$x$$ to be in our set if and only if it has no other multiples in the set $$S$$. This was pretty intuitive to come up with, but it was helpful when trying to find a fast algorithm.
- Given a single interval $$S = (C..D)$$ we MUST select the numbers greater than $$\frac{D}{2}$$ in an optimal solution $$T$$.
- If we take an interval of integers from $$C$$ to $$D$$, and we divide both end-points by an integer $$k$$, then every integer $$x$$ in this new interval ($$\frac{C}{k} \text{ to } \frac{D}{k}$$) will have a multiple $$kx$$ in the original range. This formed the basis of our first algorithm.
- Take an interval of integers from $$C$$ to $$D$$. Consider the intervals $$(\frac{C}{2} .. \frac{D}{2})$$, $$(\frac{C}{3} .. \frac{D}{3})$$, $$(\frac{C}{4} .. \frac{D}{4})$$, ..., $$(\frac{C}{k-1} .. \frac{D}{k-1})$$, $$(\frac{C}{k} .. \frac{D}{k})$$. Then $$(\frac{C}{k-1} .. \frac{D}{k-1})$$ will overlap with $$(\frac{C}{k} .. \frac{D}{k})$$ if and only if $$k \geq \frac{D}{D-C}$$. This showed that we could quit this algorithm early whenever $$k$$ gets large enough. In particular, we can end earlier if $$D-C$$ is larger.
- It takes $$O(\sqrt{D})$$ time to find all the factors of a number. Hence, all factors of all numbers between $$C$$ and $$D$$ can be checked in $$O((D-C)\sqrt{D})$$ time. This gave us the basis of our second algorithm. This works whenever $$D-C$$ is small.
- Consider the time-complexity function $$F(x) = \min(\frac{D}{x}, x\sqrt{D})$$. This function achieves its maximum (worst-case) value of $$F(x) = D^{\frac{3}{4}}$$ when $$x \approx D^{\frac{1}{4}}$$. Since $$\frac{D}{x}$$ is decreasing and $$x\sqrt{D}$$ is increasing, the worst-case point occurs when these two functions are equal. So we set $$\frac{D}{x} = x\sqrt{D}$$ and solve for $$x$$. This gives us the formula.
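The overlap condition $$k \geq \frac{D}{D-C}$$ from the list above can be verified on the earlier example $$C = 200, D = 256$$ (a quick sketch; the helper name is mine):

```python
from math import ceil

def overlaps(C, D, k):
    # does the interval for k reach the interval for k-1?
    return D // k >= ceil(C / (k - 1))

C, D = 200, 256
threshold = D / (D - C)              # = 256/56, about 4.57
ks = [k for k in range(2, 20) if overlaps(C, D, k)]
print(threshold, ks)                 # overlap starts exactly at k = 5
```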

- This is largely a greedy / ad-hoc problem. These kinds of problems usually require finding good observations about the structure of the problem. It often helps to look at examples to help find these patterns. Also, once we started to find more and more observations, it was good to go back and try them on examples to get a good understanding of how they work.
- Once we found a greedy algorithm, we realize that we can make it efficient by exploiting the structure of the problem. In particular, the input set $$S$$ will always consist of two disjoint intervals. (This might have been much harder if we were given an arbitrary set $$S$$.) Another way to exploit the conditions is to notice that $$D \leq 10^{10}$$. This implies that a $$\Theta(\sqrt{D})$$ algorithm should be possible, since $$\sqrt{D} \approx 10^5$$ is right around a "reasonable" input-range.
- If you've never seen "the Square Root Trick" for factorization, then this problem becomes much harder. In addition (see the Learning Points), I've learned from experience that this "Square Root Trick" applies (in principle) to other kinds of problems. This helped us to analyse the running time achieved when combining the two algorithms. You also need some experience and intuition in basic Number Theory in general to be able to solve this problem.
- A nice example was the case when $$C=D$$. This was an extreme case (i.e.: when $$D-C$$ was 0). By finding an algorithm that works here, we could easily generalize it to work for other small $$D-C$$.
- Most of the algorithms we came up with were pretty "intuitive". The way we solved this problem was by writing out these algorithms ("testing" them), and by finding out exactly where they fail. In particular, we learned that these algorithms largely depended on the parameter $$D-C$$, so then we could solve the problem by focusing on this variable.
- Instead of finding all multiples of a given number $$x$$, we focused on finding the factors of numbers $$y$$.

**The "Square-Root" Trick with Minimax Functions.** In general, whenever you have two symmetric, competing "resources", you can usually achieve optimality (either maximizing or minimizing some function of these resources) by setting them equal. When factorizing a number $$N$$ into factors $$x$$ and $$y$$ such that $$xy = N$$, this principle applies because $$\min(x,y)$$ is maximized whenever $$x = y = \sqrt{N}$$. This is also the worst-case number of candidates you need to check to find all the factors. Or, for example, the function $$F(x) = x + \frac{1}{x}$$ is minimized (over $$x > 0$$) when $$x = \frac{1}{x} = 1$$. This also forms the basis for "Square Root Decomposition" algorithms which arise in data structures (see the Related Problems section below).

- Number Theory / Divisibility / Factorization / Square Root Trick
- Greedy / Observations / Exploit Structure and Conditions
- Brute Force / Brute Force with Greedy Optimizations
- Math / Formulas and Proofs
- SQRT-Decomposition