Aladdin And The Return Journey

We are given a tree with $$N$$ nodes ($$1 \leq N \leq 30,000$$). On each node $$i$$ there is a number of "genies" $$v_i$$ ($$0 \leq v_i \leq 1000$$). For a pair of nodes $$i$$ and $$j$$, let $$sum(i,j)$$ denote the total number of genies on all nodes on the path from $$i$$ to $$j$$ (inclusive).

We would like to process a series of queries (at most $$100,000$$ queries will be given). There are two types of queries:

  1. "Update $$i,v$$": Set $$v_i$$ (the number of genies on node $$i$$) to the new value $$v$$
  2. "Count $$i,j$$": Print the total number of genies on all nodes on the path from node $$i$$ to node $$j$$ (inclusive). That is, print $$sum(i,j)$$
For each of the second type of query, print the result.
  1. Tree Problem
  2. Make into a rooted tree. Pick an arbitrary root.
  3. Queries ==> data structures
  4. Paths on a tree.
  5. Lowest Common Ancestor (LCA) Data Structure
  6. Heavy-Light Decomposition
  7. Fenwick Trees (Binary Indexed Trees)
  8. Fenwick Tree + Heavy-Light Decomposition
  9. DFS Ordering (Pre-Order Traversal) Array
This section describes a great deal of experience and intuition needed to correctly model the problem. If you have never heard of "Lowest Common Ancestor" before then you should read this section in detail first. A more advanced reader should at least view the Key Observations and Lemmas of this section.
The first rule of dealing with trees: Pick an Arbitrary Root. It turns out that dealing with rooted trees is much easier than dealing with un-rooted trees. This is true because rooted trees admit a recursive "optimal sub-structure". In simple words: Every rooted tree is made up of smaller rooted sub-trees, so solving problems on an entire tree can often be solved recursively or with dynamic programming. Knowing this makes many tree problems a lot easier to handle. We will apply this here. (NOTE: Be careful. There may be many problems where this "rule" doesn't apply.)
The above "rule" about trees is often a good one to keep in mind! Let's consider it a learning point for the reader!

Suppose we pick an arbitrary node to be the root of the tree (without loss of generality, let's choose node 0). Now, consider a pair of nodes $$i$$ and $$j$$. Then there is some node $$c$$ such that the path from $$i$$ to $$j$$ is the concatenation of the path from $$i$$ to $$c$$ and from $$c$$ to $$j$$, where $$c$$ is an "ancestor" of both $$i$$ and $$j$$ (according to the rooted tree).

Such a node $$c$$ is called the "lowest common ancestor" (or LCA) of $$i$$ and $$j$$.

We will use $$lca(i,j)$$ to denote the ancestor of $$i$$ and $$j$$ whose depth in the tree is maximum. This corresponds to the node $$c$$ as described in the previous Key Observation.

Many problems on trees -- such as the current problem -- relate to "paths" in trees. Whenever we have such a problem on a rooted-tree, it is often beneficial to break down paths into "simple" paths from nodes to ancestors only (i.e.: paths that only travel directly down the tree, or directly up the tree, but do not otherwise "bend".)

These kinds of simpler paths are useful mostly because there are many data structures and techniques that exploit the "parent-child" relationship or the "ancestor-descendant" relationship. For example, I know of at least two techniques that allows one to efficiently "update all descendants of a node" or "update all ancestors of a node" (I will actually describe them below because they can be used to solve this problem). Lastly, there is a very efficient data structure to compute the "Lowest Common Ancestor" of any two nodes of a rooted tree (Google "Lowest Common Ancestor" and you should find out about it; I believe this was invented by Robert Tarjan!) And it is not too hard to see that the Key Observation from above is true for all rooted trees: every path from a node $$i$$ to node $$j$$ can be broken down into a path from $$i$$ to $$c$$ and a path from $$c$$ to $$j$$, where $$c = lca(i,j)$$.

Altogether, the existence of these techniques and data structures makes it highly desirable to convert our original definition of a path into equivalent definitions using the LCA.

This is something you learn from experience. However, let's call it another learning point!

There is a classic data structure that enables you to efficiently compute $$c := lca(i,j)$$ for any pair of nodes $$i,j$$. And any path from $$i$$ to $$j$$ can be broken down into paths from $$i$$ to $$c$$ and $$c$$ to $$j$$. So, most query problems that deal with paths on a trees can be solved by reducing the problem in a way that exploits the LCA data structure.
For any pair of nodes $$i$$ and $$j$$, let $$c := lca(i,j)$$ be the lowest common ancestor of $$i$$ and $$j$$. Then: \[sum(i,j) = sum(i,c) + sum(c,j) - v_c\]
This basically follows from the key observation above. We break the path into a path from $$i$$ to $$c$$ and from $$c$$ to $$j$$. We have to subtract $$v_c$$ (the number of genies on node $$c$$) to avoid double-counting.

Let's take this idea one step further. But first, some intuition.

Suppose we were only given an array of genies (rather than a tree). How would we solve our original problem?

Well, what we could do is compute the "Prefix Sum" array. That is, if we have an array $$A[0],A[1],...,A[N-1]$$ and we want to be able to quickly compute the sum $$A[i] + A[i+1] + ... + A[j]$$ for some pair $$i,j$$, the we can create a new array $$C$$ defined so that: \[C[0] = A[0]\] \[C[1] = A[0] + A[1]\] \[C[2] = A[0] + A[1] + A[2]\] \[...\] \[C[N-1] = A[0] + A[1] + A[2] + ... + A[N-1]\] $$C$$ is called the "Prefix Sum" array. Then, to compute any sum $$A[i] + A[i+1] + ... + A[j]$$, we simply take: \[C[j] - C[i-1]\] Although there are a couple of corner cases to handle, this works in general, and gives us an $$O(1)$$-time method to quickly find the sum of any contiguous sub-array of the original array $$A$$. In particular, we pre-compute the answers for queries $$A[0],...,A[i]$$ and we answer the questions for $$A[i],...,A[j]$$ by subtracting particular entries of the pre-computed data.

In the case of our given problem, we are not given an array, but we are given a tree. And we are not concerned about summing contiguous sub-arrays, but we instead want to find the sum of numbers along a path. From our earlier arguments, we only have to worry about paths of the form $$i,c$$ where $$c$$ is an ancestor of $$i$$. Now, is there any way to "generalize" the "Prefix Sum" array idea from arrays to trees like this?

In this case, the answer is YES! We have the following Key Observation.

Let $$i$$ and $$c$$ be nodes where $$c$$ is an ancestor of $$i$$. Then: \[sum(i,c) = sum(i,0) - sum(c,0) + v_c\]
If we know the answer from $$i$$ to $$0$$ and from $$c$$ to $$0$$, then the total amount of genies from $$i$$ to $$c$$ ($$sum(i,c)$$) must be the total amount of genies from $$i$$ to $$0$$, minus the number of genies on the path from $$c$$ to $$0$$ (excluding the node $$c$$ itself). This yields the formula.

The above lemma basically says that, if we can quickly compute $$sum(i,0)$$ for all nodes $$i$$, then we can quickly compute $$sum(i,c)$$ for any node $$i$$ and ancestor $$c$$. This is a tree-like analogue to using Prefix Sums for arrays.

Is this helpful? We shall see in the next Idea Section how to construct a data structure that easily allows us to compute the total amount of genies from a node $$i$$ to the root $$0$$ (i.e.: $$sum(i,0)$$). By using this with the ideas above, we can therefore compute $$sum(i,c)$$ for any node $$i$$ and ancestor $$c$$. Lastly, by combining this with the LCA data structure, this will allow us to compute $$sum(i,j)$$ for any arbitrary pair of nodes $$i$$ and $$j$$.
This Idea Section describes an advanced technique needed to solve this problem. If you are Intermediate or Advanced reader, but you were unable to solve this problem, then this section is crucial. If you are a beginner, please read the previous section first.

In the last Idea Section, we saw how to model this problem using the Lowest Common Ancestor (LCA). In particular, to answer the query $$sum(i,j)$$ it suffices to answer the queries $$sum(i,c)$$ and $$sum(c,j)$$ where $$c$$ is the LCA of $$i$$ and $$j$$. We took this one step further and noted that, to answer $$sum(i,c)$$, it suffices to know $$sum(i,0)$$ and $$sum(c,0)$$.

In this section, we show how to efficiently answer the latter question: How do we quickly answer queries of the form $$sum(x,0)$$ for any node $$x$$.

First, notice that, if the problem were "static" (i.e.: the numbers were not changing), we could probably solve this by performing a depth first search (dfs) at the beginning, and pre-computing the answers. In particular, at any point in the dfs, if we are at node $$i$$, we would keep track of the sum of the number of genies along the path from the root (node $$0$$) to node $$i$$. When we traverse another edge, we add $$v_i$$ to the sum, and when we "backtrack" out of the edge, we subtract $$v_i$$. This traversal would maintain the correct number at each node.

Although this problem is not static, we can still draw a great deal of intuition from this special case. In particular, we can still (theoretically) perform this dfs at the beginning. Then, when an "Update" query comes, we notice that we only have to update the descendants of the changing node.

(See the Related Problems section at the bottom of this page.)

We formalize this with the following key observation.

Suppose we were to pre-compute the answer $$sum(i,0)$$ for each node $$i$$. When an "Update $$i,v$$" query is given (to change the number of genies on node $$i$$ to become value $$v$$), we notice that $$sum(x,0)$$ will only change for those nodes $$x$$ who are descendants of $$i$$. All other nodes will have their sum value stay the same. Moreover, the value of those $$x$$ that are children of $$i$$ changes uniformly (i.e.: by the same amount) -- exactly by the amount: $$v - v_i$$ (i.e.: the new value of node $$i$$, minus the old value). Note, the term "descendant" also includes $$i$$ itself (that is, $$i$$ is a descendant of itself, technically).

Based on the above observation, we have simplified the problem to the following question: "How can we efficiently increment a value for all children/descendants of a node $$i$$?"
If we can quickly answer this above question, we are pretty much done. Particularly, we store the values $$sum(x,0)$$ for all nodes $$x$$ in the tree. Then, whenever an "Update $$i,v$$" query comes in, we simply update (increment or decrement) the "stored values" for $$i$$ and all its descendants by the amount $$v_i - v$$ in order to maintain the correct answer. Whenever a "Count $$i,j$$" query comes in, we find $$c := lca(i,j)$$, and our answer will be $$sum(i,c) + sum(j,c) - v_c$$, which is altogether equal to: \[\left(sum(i,0) - sum(c,0) + v_c\right) + \left(sum(j,0) - sum(c,0) + v_c\right) - v_c\] since $$sum(i,c) = sum(i,0) - sum(c,0) + v_c$$, and $$sum(j,c)$$ is computed similarly. Notice that this latter formula only requires knowing $$sum(i,0), sum(j,0),$$ and $$sum(c,0)$$, which are all "stored" values.
So how do we answer this question?
There is a "trick" that can be used to quickly "update (increment/decrement) all descendants" of a node in a tree. First, the reader must be familiar with "Binary Indexed Trees" (also known as "Fenwick Trees"). The reader should also be fairly comfortable with tree-traversals: for example, you should know about Depth-First-Search (dfs) and Pre-Order-Traversal.
Let $$A$$ be the array defined by the "Pre-Order-Traversal" of the tree. In particular, $$A$$ is defined recursively. $$A[0] = 0$$ (the root node). Then, it is followed by the Pre-Order-Traversal arrays of the sub-trees of node 0. It can be constructed by running a particular depth-first-search on the tree. (Note: This is fairly hard to describe. If the reader does not understand the concept of "Pre-Order-Traversal" array, please Google it.)
Consider a particular node $$i$$. In the Pre-Order-Traversal array, the node $$i$$ and all of its descendants will appear consecutively, in a contiguous sub-array of $$A$$.
Omitted. But it can be easily proven by induction on the sub-trees, or by inspecting the algorithm used to construct the Pre-Order-Traversal array.
This lemma is very important, in particular, we notice that we can "update" the stored value for all descendants of $$i$$ if we know how to update the values in contiguous sub-arrays. This gives us the following key observation.

Suppose we have the array $$A$$ representing the Pre-Order-Traversal array. Let $$B$$ be another array so that $$B[k]$$ denotes the current known value of $$sum(A[k],0)$$. That is, let $$B[k]$$ denote the current sum of all genies from the $$k^{th}$$ node of the Pre-Order-Traversal, up to the root of the tree. Then, originally, we can compute $$B[k]$$ (by some kind of depth first search maybe?). And whenever an "Update $$i,v$$" query comes in, we only need to update a contiguous range of the $$B$$ array.

We can use Fenwick Trees to update a contiguous range of an array. By constructing the Fenwick Tree for the array $$B$$, and by knowing which node corresponds to which index (according to the Pre-Order-Traversal array $$A$$), this allows us to quickly answer the question we were interested in!!!

We now have enough information to solve the entire problem. The reader may be a bit confused because we made several simplifications! Here we present the final complete algorithm to solve this problem.

We are given a tree $$T$$, an initial value $$v_i$$ for each node $$i$$, and a series of queries of the forms: "Update $$i,v$$" and "Count $$i,v$$" (as described in the problem statement). Here is the final pseudocode:

# Pre-define some functions
	Treat T as a rooted tree, and let node 0 be the root
	Construct the LCA (lowest common ancestor) data structure on T

# PreOrderTraversal(i,A) populates the array A with the
#  pre-order-traversal for the sub-tree rooted in node i
def PreOrderTraversal(i,A):
	for each child j of i:
# lca(i,j) returns the lowest common ancestor of i and j
def lca(i,j):

# If B is an array, update(B,i,j,v) increments B[i..j] by value v (using a Fenwick Tree)
def update(B,i,j,v);

	let N = |T| 		#(the number of nodes in T)
	let A[0..N-1] be an array
	let B[0..N-1] be an array
	# initialization
	call Initialize(T)
	call PreOrderTraversal(0,A)
	for all i = 0..N-1
		Set B[i] = v[i]
	for each query:
		if query is type "Update i,new_v":
			Let L be the index (integer) so that A[L] = i (the index corresponding to node i in A)
			Let R be the largest index such that A[R] is a descendant of i
			# Note: A[L..R] now precisely contains the sub-tree rooted in i
			Let v_change := v[i] - new_v
			update(B,L,R, v_change)
			Set v[i] = new_v
		else (query is type "Count i,j"):
			Let c := lca(i,j)
			Let sum_ic = B[i] - B[c] + v[c]		# sum(i,c) = sum(i,0) - sum(i,c) + v[c]
			Let sum_jc = B[j] - B[c] + v[c]		# sum(j,c) = ... (similarly)
			Let ans = sum_ic + sum_jc - v[c]	# sum(i,j) = sum(i,c) + sum(j,c) - v[c]
			Print (ans)
Calling $$solve(T,v)$$ should solve the problem.
An implementation detail: the $$B$$ array will probably not be stored "explicitly". We can use data structures such as the Fenwick Tree to "simulate" such an array. (See the Related Problems section at the bottom of this page to see how to implement a "dynamic array")

Just a quick note on the time/space complexity: all of the data structures use $$O(N)$$ memory altogether (maybe $$O(N \lg N)$$ at worst, depending on how you implement the $$B$$ array / data structure). Each query takes $$O(\lg(N))$$ time in total: $$O(\lg N)$$ to find $$lca(i,j)$$ (if necessary), and $$O(\lg N)$$ to update or access the $$B$$ array (assuming $$B$$ is implemented with a Fenwick Tree).

Hence, the overall algorithm takes $$O((N+Q)\lg N)$$ where $$N$$ is the number of nodes and $$Q$$ is the number of queries.

Transforming the "tree" into an "array" was a neat trick. In particular, we used the "Pre-Order-Traversal" of the tree to define an ordering of the nodes such that all descendants of a node $$i$$ appear contiguously in the array. This made it really easy for us to process queries of the form: "Update all descendants of $$i$$". This is a trick that can often be used whenever we have queries of this form!
All in all, this gives us our final algorithm, and the solution to the problem!!

This is a beautiful illustration of many Data Structures on Trees. To solve this problem, you need knowledge of:

  • The Lowest Common Ancestor data structure
  • Prefix Sums and Range Queries
  • Pre-Order-Traversal
  • Fenwick Trees

These ideas allow you to (eventually) reduce the problem into a "range query" problem on a simple array. There are quite a few non-trivial "reductions" (problem transformations) needed to get there, but in the end, it is quite a neat solution!

First, we assume the tree is rooted, and that node $$0$$ is the root of the tree. This makes the problem easier to conceptualize and solve (because there are many data structures and techniques to solve problems on rooted trees).

The next "reduction" is to notice that, for rooted trees, the query $$sum(i,j)$$ (finding the number of genies on the path from $$i$$ to $$j$$) can be reduced to finding the answer to queries $$sum(i,c)$$ and $$sum(j,c)$$, where $$c$$ is an "ancestor" of both $$i$$ and $$j$$ (it is actually the "Lowest Common Ancestor" or $$lca(i,j)$$). These types of queries are easier to handle (because they are always concerning parents and children, or ancestors and descendants, rather than arbitrary nodes / paths). And the well-known LCA data structure makes it a routine task to find $$c$$ whenever we need to!

The precise formula is: \[sum(i,j) = sum(i,c) + sum(c,j) - v_c\] (where $$v_c$$ is the current number of genies on node $$c$$). The last subtraction of $$v_c$$ is to avoid double-counting it.

Next, we notice that queries of the form $$sum(i,c)$$ (where $$c$$ is an ancestor of $$i$$) can be reduced to finding the queries $$sum(i,0)$$ and $$sum(c,0)$$, where $$0$$ is the root of the tree. There are only $$O(N)$$ queries of the form $$sum(i,0)$$, so this is much more manageable.

The precise formula is:

\[sum(i,c) = sum(i,0) - sum(c,0) + v_c\]

So, putting together the above two formulas, we can see that all queries $$sum(i,j)$$ can be solved by: \[sum(i,j) = sum(i,0) + sum(j,0) - 2sum(c,0) + v_c\] (where this formula only depends on queries from nodes to the root.)

So we are only concerned with answering queries of the form $$sum(i,0)$$. Now, let's pretend we already know $$sum(i,0)$$ for all nodes $$i$$. If an "update" query comes on some node $$i$$ for some value $$v$$, which of the answers will change? Well, for a node $$x$$, if $$x$$ is a child or descendant of node $$i$$, then $$sum(x,0)$$ will change (because $$i$$ is on the path from $$x$$ to $$0$$). And if $$x$$ is NOT a descendant of $$i$$, then $$sum(x,0)$$ will NOT change. So, when an update query comes, we want to update precisely all of the nodes in the sub-tree of $$i$$. And when a "count" query comes, we just want to lookup the answer for the given node.

We can actually do this fairly efficiently. Consider the "Pre-Order-Traversal" array $$A$$ of the tree. That is, $$A$$ is a "permutation" of the numbers $$0$$ through $$N-1$$ defined recursively by a depth-first-search.

The key observation about the array $$A$$ is that, for a given node $$i$$, all of its descendants will appear consecutively (next to each other) in this array. So, when an "update" query comes in, since we would like to update all descendants of the node $$i$$, it is equivalent to updating all the nodes in a contiguous sub-array of $$A$$. So we do precisely that. We construct another array $$B[ ]$$ so that $$B[k] := sum(A[k],0)$$ (the number of genies on the path from node $$A[k]$$ to node 0). Whenever we want to update some node $$i$$, we find its index in $$A$$, and we find the indices of all its children in $$A$$. This defines a range $$L,R$$ of indices. We then increment/decrement the corresponding values in the $$B$$ array: $$B[L..R]$$ with a range update.

Lastly, in order to efficiently perform these range updates and range queries on the $$B$$ array, we must avoid "explicitly" storing the data. Instead, we construct a Fenwick Tree to represent $$B$$. A Fenwick Tree can support range updates and single-point queries (which is precisely what we need) in $$O(\lg N)$$ time. This is sufficient.

As long as we can maintain the $$B[ ]$$ array with a Fenwick Tree, we can answer a query $$sum(i,j)$$ using the formula above: \[sum(i,j) = B[i] + B[j] - 2*B[c] + v_c\] where $$c := lca(i,j)$$ and $$B[ ]$$ can be computed from the Fenwick Tree. This yields an overall $$O(N \lg N)$$ solution.

  1. Assume the tree is rooted at node $$0$$. Without loss of generality.
  2. For a pair of nodes $$i$$ and $$j$$, let $$c$$ denote $$lca(i,j)$$. Then $$sum(i,j) = sum(i,c) + sum(c,j) - v_c$$. This shows that we can reduce queries on arbitrary paths to queries on "ancestor-descendant" paths.
  3. For a pair of nodes $$i$$ and $$c$$ (where $$c$$ is an ancestor of $$i$$). Then $$sum(i,c) = sum(i,0) - sum(c,0) + v_c$$. This shows that we can reduce all queries into those dealing with the root node.
  4. Suppose we somehow "store" $$sum(i,0)$$ for all nodes $$i$$. When an "update" query comes for a specific node, we have to update exactly those nodes that are descendants of the changing node. Moreover, we increment/decrement all of these nodes by the same amount. This can be seen by inspection. If $$v_i$$ is changing, then $$sum(x,0)$$ changes for some node $$x$$ if and only if $$x$$ is in the sub-tree of $$i$$.
  5. Consider the Pre-Order-Traversal array of the tree. For any node $$i$$, all the descendants of $$i$$ (including $$i$$ itself) will appear consecutively in this array. This makes the problem of "updating a sub-tree" into the problem of "updating a contiguous sub-array". The latter problem is much easier to solve.
  6. Suppose we "relabel" the nodes according to their Pre-Order-Traversal, and we construct an array $$B$$ so that $$B[i] = sum(i,0)$$. Then to "update" a node, we simply have to update a contiguous range in $$B$$. This can be done with a Fenwick Tree. This was the last step in the algorithm. Above we do not "relabel" the nodes, but instead of use new labelling (according to the Pre-Order-Traversal) as indices into the $$B$$ array (these are equivalent definitions).
  7. The entire problem can be solved in $$O(\lg N)$$ time per query.
  1. This problem requires a lot of prior knowledge. We needed to recognize that we should transform the tree into a rooted tree. We need to know about the Lowest Common Ancestor, Fenwick Trees, Prefix Sums, and Pre-Order-Traversal Array -- all non-trivial data structures. This was not an easy problem to solve without knowing about all of these things.
  2. Turning the tree into a rooted tree is an example of a problem transformation.
  3. Arbitrary queries $$sum(i,j)$$ were too hard to solve. Instead we repeatedly simplified our queries. By using the LCA of $$i$$ and $$j$$ (denoted by $$c$$), we could simplify $$sum(i,j)$$ into queries of the form $$sum(i,c)$$ and $$sum(j,c)$$. Taking this one step further, we were able to simplify all the queries into forms $$sum(i,0)$$, $$sum(j,0)$$ and $$sum(c,0)$$. This made our task much easier to solve.
  4. Although not explicitly stated above, it is helpful to clearly write down the formulas that we get after each simplification. This helps to avoid bugs later on!
  • The First Rule of Tree Problems: Pick an arbitrary root! This appears very often because "rooted" trees have much more structure and symmetry than un-rooted trees. Often-times the root doesn't matter. Sometimes the root matters, or sometimes you have to "try" all nodes as the root.
  • Whenever you encounter problems that relate to "paths on a rooted tree", it is very likely that the LCA data structure will be useful. Why? Because every path from $$i$$ to $$j$$ can be partitioned into two paths, one from $$i$$ to $$c$$ and one from $$c$$ to $$j$$ where $$c$$ is the LCA of $$i$$ and $$j$$. And these kinds of paths are easier to deal with.
  • Whenever you encounter problems that relate to querying/updating "sub-trees of a rooted tree", it might be possible that the Pre-Order-Traversal Array will be useful! Why? Because the Pre-Order-Traversal Array has the property that all nodes in a given sub-tree occur consecutively in the array. This makes it easier to perform range-updates, range-queries, and even "splitting" and "merging" sub-trees (i.e.: if we treat them like linked lists)! Arrays are much easier to deal with than trees (see some of the Related Problems below).
  • Trees / Querying Paths on a Tree / Lowest Common Ancestor (LCA)
  • Trees / Querying Sub-Trees of a Tree / DFS Ordering / Pre-Order-Traversal
  • Data Structures / Range Queries / Fenwick Trees and Segment Trees
  • Prefix Sums / Prefix Sums on a Tree / Sums of Ancestor Paths
  • Range Queries on a Tree / Heavy-Light Decomposition (HLD)