This is my dynamic programming solution in Python to the problem of finding the cost of an optimal binary search tree:
    def optBST(freqs):
        """C(i, j) = min over k in [i, j) of (C(i, k) + C(k + 1, j)) + sum(freqs[i:j])"""
        # i indexes the rows of the 2D array: the starting element.
        # j indexes the columns: the concluding element (exclusive).
        # L is the length of the range considered, with L in [0, len(freqs)].
        # L = 0 (i.e., i = j) is already covered when the 2D array is zero-initialized.
        # For each length, consider every i that defines a distinct range [i, j) within [0, len(freqs)).
        n = len(freqs)
        arr = [[0] * (n + 1) for _ in range(n + 1)]
        for L in range(1, n + 1):
            for i in range(n):
                j = i + L
                if j > n:
                    break
                arr[i][j] = min(arr[i][k] + arr[k + 1][j] for k in range(i, j)) + sum(freqs[i:j])
        return arr[0][n]

    keys = [10, 12, 20]
    freqs = [34, 8, 50]
    print(optBST(freqs))
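As a quick sanity check on the DP (this helper is my own addition, not part of the original solution), a plain recursive version of the same recurrence can be run on the tiny example and compared against the table-based answer:

```python
# A naive recursive version of the same recurrence (a sanity-check
# helper of my own, not part of the original solution). It runs in
# exponential time, but for tiny inputs it confirms the DP table
# is filled correctly.
def opt_bst_rec(freqs, i, j):
    if i == j:                      # empty range costs nothing
        return 0
    return min(opt_bst_rec(freqs, i, k) + opt_bst_rec(freqs, k + 1, j)
               for k in range(i, j)) + sum(freqs[i:j])

freqs = [34, 8, 50]
print(opt_bst_rec(freqs, 0, len(freqs)))  # prints 142, same as optBST(freqs)
```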
For each $L$, there is a specific range of valid starting points $i$, and for each such $i$, $k$ iterates $L$ times between $i$ and $j$.
So the number of steps taken for each $L$ is:
$$(\text{number of possible } i) \cdot L = [n - (L - 1)] \cdot L.$$
There are $n$ such values of $L$, so summing over all of them and simplifying, I got the following total step count in the worst case: $$\sum_{L=1}^{n} [n - (L - 1)] \cdot L = \frac{n(n+1)(n+2)}{6},$$ which is $O(n^3)$.
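The hand count can also be checked mechanically. The following small script (my own verification aid, not part of the original argument) tallies one unit of work per $k$ iteration and compares the total against the closed form:

```python
# Verify the hand-derived step count: for each L there are n - L + 1
# valid starting points i, and k iterates L times for each of them.
def count_steps(n):
    steps = 0
    for L in range(1, n + 1):
        for i in range(n - L + 1):  # valid starting points for this L
            steps += L              # k runs L times in the min/sum
    return steps

# The total matches the closed form n(n+1)(n+2)/6 exactly.
for n in (1, 5, 10, 50):
    assert count_steps(n) == n * (n + 1) * (n + 2) // 6
print("closed form confirmed for all tested n")
```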
However, this seems to be a fairly slow way to figure out the time complexity of this function, and I imagine there must be a faster (yet still rigorous) way.
$L$ iterates about $n$ times, and $i$ also iterates about $n$ times. But $k$ only iterates $m$ times, where $m$ tends toward $n$ without always being exactly $n$.
How can I argue that this is still $O(n^3)$?
Thank you!
2 Answers
As you write, $L$ iterates over $O(n)$ many values, $i$ over $O(n)$ many values, and the minimum and sum in the inner loop also iterate over $O(n)$ many values. In total, we get $O(n) \cdot O(n) \cdot O(n) = O(n^3)$ many operations.
This calculation doesn't show that $O(n^3)$ is tight. For this, we notice that if $L$ and $i$ are both in the range $(n/4, n/3)$, then $j = i + L < 2n/3 \le n$, so we don't break out of the innermost loop, and furthermore $j - i = L = \Omega(n)$. Hence we have $\Omega(n)$ values of $L$ and $\Omega(n)$ values of $i$ for which the minimum and sum take time $\Omega(n)$, for a total of $\Omega(n) \cdot \Omega(n) \cdot \Omega(n) = \Omega(n^3)$ many operations.
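This lower-bound argument can be seen numerically with a small script of my own (an illustration, not part of the original answer): counting only the inner-loop work for $L$ and $i$ restricted to $(n/4, n/3)$, the restricted total alone grows roughly eightfold each time $n$ doubles, i.e., cubically:

```python
# My own numeric illustration of the lower-bound argument: count only
# the inner-loop work done when both L and i lie strictly between n/4
# and n/3. For such pairs j = i + L stays below 2n/3 <= n, so the break
# never fires and the min/sum performs L units of work each time.
def restricted_steps(n):
    steps = 0
    for L in range(n // 4 + 1, n // 3):
        for i in range(n // 4 + 1, n // 3):
            steps += L  # one unit per k iterated by the min/sum
    return steps

# Doubling n multiplies the restricted count by roughly 2^3 = 8,
# so this portion of the work alone already grows cubically.
for n in (60, 120, 240):
    print(n, restricted_steps(n))
```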
This kind of matching lower bound becomes "obvious" with practice, and need not be spelled out explicitly.
- Hi, thanks for your response! Could you please explain why "if $L$ and $i$ are in the range $(n/4, n/3)$, then we don't break out of the innermost loop"? I'm kind of lost on why you chose $n/4$ and $n/3$ specifically. — ten_to_tenth, Jan 2, 2024
- It's rather arbitrary; I could have chosen other values. What we need is for the lower bounds to be $\Omega(n)$, and for the upper bounds to sum to at most $n$. — Yuval Filmus, Jan 2, 2024
- For example, if you wanted to estimate $\log n!$: $n!$ is the product of the $n$ numbers from 1ドル$ to $n$. This product is obviously less than multiplying $n$ by itself $n$ times, giving the upper bound $\log n^n = n \log n$. You also have $n/3$ numbers greater than 2ドルn/3$, so a lower bound is $(2n/3)^{n/3}$. Or you have $n/2$ numbers greater than $n/2$, or $n - \sqrt{n}$ numbers greater than $\sqrt{n}$. — gnasher729, Jan 29, 2024
Big-O notation requires bounding the number of steps by $c \cdot f(n)$, so you can always replace a quantity with something larger. You can replace the sum of two big-O terms with the larger of the two. Often you can find a bound for $n = 2^k$, observe that any $n' < 2^k$ cannot do worse, and then take the bound for 2ドル^k$. For a nested loop, with the outer loop running $O(f(n))$ times and the inner loop running $O(g(n))$ times per iteration of the outer loop, you get $O(f(n) \cdot g(n))$.
That will answer 90% of the questions asked here.
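The nested-loop rule can be made concrete with a trivial sketch (my own, for illustration): an outer loop with $f(n) = n$ iterations and an inner loop with $g(n) = n$ iterations per pass does exactly $f(n) \cdot g(n)$ units of work.

```python
# A minimal illustration of the nested-loop rule: the outer loop runs
# f(n) = n times, the inner loop g(n) = n times per outer iteration,
# for exactly f(n) * g(n) = n^2 units of work in total.
def nested_work(n):
    steps = 0
    for _ in range(n):        # outer loop: f(n) iterations
        for _ in range(n):    # inner loop: g(n) per outer iteration
            steps += 1        # one unit of work
    return steps

print(nested_work(7))  # prints 49
```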