I ran into trouble while trying to solve the following problem.
We're given an array costs of N integers and an integer K, and we need to return the minimum sum of K pairwise non-adjacent elements of costs.
1 <= N <= 1 000 000
1 <= K <= floor(N/2)
1 <= costs[i] <= 100 000 000 000 for all 1 <= i <= N
The intuitive approach is simple enough: run a recursive function that gradually fills a 2D cache (memoized over the current index and the number of elements still to pick). Not every state needs to be visited, but the time complexity is still O(n*k), which is too slow.
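For concreteness, here is a minimal sketch of that O(n*k) dynamic program (the function name and the bottom-up formulation are mine; the recursive-with-cache version fills the same table): dp[i][j] is the minimum sum of j pairwise non-adjacent picks from costs[i:].

```python
def min_k_nonadjacent_dp(costs, k):
    """O(n*k) DP: dp[i][j] = min sum of j non-adjacent picks from costs[i:]."""
    n = len(costs)
    INF = float("inf")
    # two extra rows so dp[i + 2] is always defined
    dp = [[INF] * (k + 1) for _ in range(n + 2)]
    for row in dp:
        row[0] = 0  # picking zero elements costs nothing
    for i in range(n - 1, -1, -1):
        for j in range(1, k + 1):
            dp[i][j] = min(dp[i + 1][j],                   # skip element i
                           costs[i] + dp[i + 2][j - 1])    # take i, skip i+1
    return dp[0][k]

assert min_k_nonadjacent_dp([2, 1, 2, 4], 2) == 4
assert min_k_nonadjacent_dp([2, 1, 5, 4, 7], 3) == 14
```

Both time and space are Θ(n*k) here, which is indeed too slow (and too large) for N up to a million.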
Another approach, described in this post, proposes a solution in O(k log n) (after sorting), which seems to work for me in some cases, but not in all. An example is worth a thousand words, so I'll first give a situation in which the algorithm, as I understand it, works, then one in which it doesn't.
N = 4
K = 2
cost = [2, 1, 2, 4]
//Expected result: 4 (2+2, any other option won't work)
We start with a sum of 0 and pair each costs[i] with its index i, then sort by cost, giving:

sum = 0
priority_queue = [[1, 1], [2, 0], [2, 2], [4, 3]]

Next, we remove the first element x from the list (here [1, 1]), add x's cost (here 1) to the sum, remove its two neighbors from the list (the 2 at index 0 and the 2 at index 2), then change x's cost to the sum of its two neighbors' costs minus x's old cost (here 2 + 2 - 1 = 3). We then put the new x back into the queue, keeping it sorted.
sum = 1
priority_queue = [[3, 1], [4, 3]]

We repeat the process a total of K times: here the smallest element x is [3, 1], so we add 3 to the sum. x has no neighbors, so we delete it completely.

sum = 4 (1 + 3)
priority_queue = [[4, 3]]

So we get the right output!
Now let's look at an example that doesn't work.
N = 5
K = 3
cost = [2, 1, 5, 4, 7]
//Expected result: 14 (2 + 5 + 7, the only option here)
So, as before, initially:

sum = 0
priority_queue = [[1, 1], [2, 0], [4, 3], [5, 2], [7, 4]]

Then we select [1, 1] and add 1 to the sum. Its neighbors are [2, 0] and [5, 2]; we remove them from the queue and insert [5 + 2 - 1, 1] = [6, 1] back into the list, keeping it sorted.

sum = 1
priority_queue = [[4, 3], [6, 1], [7, 4]]
We start again with [4, 3] and its only neighbor [7, 4] (and this is where I think the mistake is).
sum = 5 (4+1)
priority_queue = [[3, 3], [6, 1]]
So on the last iteration:

sum = 8 (5 + 3)
priority_queue = [[6, 1]] (no neighbors)

So we get 8 instead of the expected 14!
What's wrong with my handling of the special case where the element has only one neighbor? Or do I just misunderstand the overall algorithm?
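(For reference, a tiny brute-force checker confirms the expected outputs above. Exhaustive search over index combinations, so it's only usable on small inputs; the function name is my own.)

```python
from itertools import combinations

def brute_force(k, costs):
    """Minimum sum of k pairwise non-adjacent elements, by exhaustive search."""
    best = None
    for idxs in combinations(range(len(costs)), k):
        # pairwise non-adjacent: consecutive chosen indices differ by >= 2
        if all(b - a >= 2 for a, b in zip(idxs, idxs[1:])):
            s = sum(costs[i] for i in idxs)
            if best is None or s < best:
                best = s
    return best

assert brute_force(2, [2, 1, 2, 4]) == 4
assert brute_force(3, [2, 1, 5, 4, 7]) == 14
```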
- Nathaniel (Oct 21, 2024): I fail to see how the algorithm (correct or not) could be in $\mathcal{O}(k\log n)$ when even the first step is in $\Omega(n\log n)$.
- user176008 (Oct 21, 2024): Yeah, of course it's $O(k \log n)$ if the array is already sorted, but $O(n \log n)$ otherwise.
- Nathaniel (Oct 21, 2024): At best it is $\Omega(n),ドル since you need to read each value of the array at least once. Also, the array being sorted makes no sense in the general case, since the positions of the different values matter.
- Neal Young (May 20, 2025): The run time is O(n + k log n)...
2 Answers
There is an O(n + k log n)-time algorithm for the problem
Before we present the algorithm, we give a lemma that captures the basic insight that the algorithm is based on. Fix a problem instance (k, C), where C is a collection of n elements with associated costs.
Assume k > 0 and let x be any min-cost element in C.
The solutions for (k, C) are the k-subsets of C that don't contain any two adjacent elements. Call a solution for (k, C) well behaved if either the solution contains x, or x has two neighbors and the solution contains them both.
Lemma 1. Some optimal solution is well behaved.
Proof. Fix any optimal solution S'. If S' contains x, or contains two neighbors of x, we are done, so assume otherwise.
Case 1. First consider the case that S' contains neither x nor any neighbor of x. Obtain S* from S' by replacing an arbitrary element of S' by x. Then S* is a valid solution (no neighbor of x is in S*), and by the choice of x it has cost at most the cost of S', so S* is also optimal.
Case 2. In the remaining case S' contains exactly one neighbor, say z, of x. Obtain S* from S' by replacing z by x. Again, S* is a valid solution, and has cost at most the cost of S', so is also optimal.
This proves Lemma 1. □しろいしかく
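As a sanity check (not part of the proof), Lemma 1 can be verified exhaustively on small random instances; the helper below is my own illustration, fixing x as the first min-cost element.

```python
from itertools import combinations
import random

def lemma1_holds(k, costs):
    """Check that some optimal solution is well behaved w.r.t. a fixed
    min-cost element x: it contains x, or x has two neighbors and it
    contains them both."""
    n = len(costs)
    x = min(range(n), key=lambda i: costs[i])
    nbrs = {i for i in (x - 1, x + 1) if 0 <= i < n}
    sols = [set(s) for s in combinations(range(n), k)
            if all(b - a >= 2 for a, b in zip(s, s[1:]))]
    if not sols:
        return True  # no solutions at all; the lemma is vacuous
    best = min(sum(costs[i] for i in s) for s in sols)
    return any(x in s or (len(nbrs) == 2 and nbrs <= s)
               for s in sols if sum(costs[i] for i in s) == best)

random.seed(0)
assert all(lemma1_holds(k, [random.randint(1, 9) for _ in range(n)])
           for n in range(2, 9) for k in range(1, (n + 1) // 2 + 1))
```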
Algorithm
Here is the algorithm. It returns the minimum cost of any solution.
alg(k, C):
1. if k = 0, return 0
2. let x be an element with minimum cost in C
3. if x has at most one neighbor, let C' = C \ ({x} U nbrs(x)) be obtained from C by removing x and its neighbor (if any)
4. else (x has two neighbors), let C' be obtained from C by replacing x and its neighbors by a single new element y with cost equal to the total cost of the neighbors, minus the cost of x
5. return cost(x) + alg(k-1, C')
(This algorithm follows a suggestion in a comment on the page linked to by OP.)
Correctness
First we prove that the algorithm is correct, then we discuss run time.
Let x, y, and C' be as computed in Steps 2-4 of alg(k, C).
By Lemma 1, to find an optimal solution, it suffices to find a minimum-cost solution among well-behaved solutions.
Claim. The well-behaved solutions for (k, C) correspond in a one-to-one fashion to the solutions for (k-1, C'), so that, for each well-behaved solution S for (k, C), the corresponding solution S' for (k-1, C') satisfies cost(S) = cost(x) + cost(S').
Assuming the claim is true, it follows that the minimum cost of any well-behaved solution for (k, C) equals cost(x) plus the minimum cost of any solution for (k-1, C'). Using this, an easy induction on k shows that the algorithm is correct. That is, it returns the minimum cost of any solution for (k, C).
To complete the proof that the algorithm is correct, we prove the claim by verifying that the claimed correspondence exists.
Partition the well-behaved solutions S* for (k, C) into two disjoint types:
- those that contain x, and
- those that contain two neighbors of x.
Likewise, partition the solutions S' for (k-1, C') into two disjoint types:
- those that don't contain y, and
- those that contain y.
Solutions S* for (k, C) that contain x will correspond to solutions S' for (k-1, C') that don't contain y, via the correspondence S* = S' U {x}. That is, one obtains S' from S* by removing x, and one obtains S* from S' by adding x back.
Solutions S* for (k, C) that contain two neighbors of x will correspond to solutions S' for (k-1, C') that contain y, via the correspondence S* = (S' \ {y}) U nbrs(x). That is, one obtains S' from S* by replacing the neighbors of x by y, and one obtains S* from S' by replacing y by the neighbors of x.
One can verify that this is indeed a one-to-one correspondence between the well-behaved solutions for (k, C) and the solutions for (k-1, C'). Finally, using cost(y) = cost(nbrs(x)) - cost(x), one can also verify that cost(S*) = cost(S') + cost(x) for any pair (S*, S') of corresponding solutions. This proves the claim, and the correctness of the algorithm.
Run time
The O(n + k log n)-time implementation works as follows. It maintains the collection C in a min-heap, keyed by cost. The top-level call builds this heap in O(n) time using the standard linear-time algorithm. Then, in each call with k > 0, the element x is obtained in O(log n) time by taking a minimum-cost element from the heap for C. The heap for C' is then obtained from the heap for C in O(log n) time by removing x and its neighbors (deletion is lazy: removed elements are only marked as deleted, and skipped when later popped from the heap) and, when x has two neighbors, inserting the new element y.
(The implementation also maintains a doubly-linked list of the elements in C, in order, so that neighbors of x can be found in constant time.)
EDIT (Python code):

from numbers import Number
from heapq import heapify, heappop, heappush
from typing import Self  # requires Python 3.11+


def min_k_ind_set(k: int, C: list[Number]) -> Number:
    assert k <= (len(C) + 1) / 2

    class Elt:
        def __init__(self, cost: Number):
            self.cost = cost
            self.deleted = False
            self.prev: Self | None = None
            self.succ: Self | None = None

        def __lt__(self, other: Self) -> bool:
            return self.cost < other.cost

        def delete(self, replace_by=None) -> None:
            """Remove self from the linked list (lazily, for the heap).
            With replace_by, also remove both neighbors and splice
            replace_by into the place of the triple (prev, self, succ)."""
            self.deleted = True
            if replace_by:
                left, right = self.prev, self.succ
                left.deleted = right.deleted = True
                replace_by.prev = left.prev
                replace_by.succ = right.succ
                if left.prev:
                    left.prev.succ = replace_by
                if right.succ:
                    right.succ.prev = replace_by
            else:
                if self.prev:
                    self.prev.succ = self.succ
                if self.succ:
                    self.succ.prev = self.prev

    heap = [Elt(c) for c in C]
    for e1, e2 in zip(heap, heap[1:]):
        e1.succ = e2
        e2.prev = e1
    heapify(heap)

    value = 0
    while k > 0:
        elt = heappop(heap)
        if elt.deleted:  # lazy deletion: skip stale heap entries
            continue
        k -= 1
        value += elt.cost
        if elt.prev and elt.succ:
            # x has two neighbors: replace the triple by y
            replacement = Elt(elt.prev.cost + elt.succ.cost - elt.cost)
            elt.delete(replace_by=replacement)
            heappush(heap, replacement)
        else:
            # x has at most one neighbor: remove x and that neighbor
            elt.delete()
            if elt.prev:
                elt.prev.delete()
            if elt.succ:
                elt.succ.delete()
    return value


def test(k, C):
    value = min_k_ind_set(k, C)
    print(f"{(k, C)} -> {value}")
    return value


assert test(3, [1, 2, 3, 4, 5]) == 9
assert test(2, [1, 2, 3, 4, 5]) == 4
assert test(1, [1, 2, 3, 4, 5]) == 1
assert test(3, [3, 1, 4, 5, 9, 2, 6, 5]) == 8
assert test(2, [3, 1, 4, 5, 9, 2, 6, 5]) == 3
The links must be maintained as you delete: after you remove [4, 3], its neighbors in the remaining list are [6, 1] and [7, 4], not just [7, 4]. You then add [6 + 7 - 4, 3], i.e. [9, 3], which ends up alone and is the last one picked.
Your sum is then 1 + 4 + 9 = 14, which is correct. That will make Joseph happy ;-)
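To illustrate, here is a compact sketch of the corrected procedure (an array-based doubly linked list plus a heap with lazy deletion via version counters; the function name and layout are mine). It returns 14 on the failing example.

```python
import heapq

def min_k_nonadjacent(k, costs):
    """Greedy-with-replacement over a doubly linked list; stale heap
    entries are skipped via alive flags and version counters."""
    n = len(costs)
    cost = list(costs)
    prev = [i - 1 for i in range(n)]                       # -1 = no neighbor
    succ = [i + 1 if i + 1 < n else -1 for i in range(n)]
    alive = [True] * n
    ver = [0] * n
    heap = [(c, 0, i) for i, c in enumerate(cost)]
    heapq.heapify(heap)
    total = 0
    while k > 0:
        c, v, i = heapq.heappop(heap)
        if not alive[i] or v != ver[i]:
            continue                                       # stale entry
        total += c
        k -= 1
        p, s = prev[i], succ[i]
        if p != -1 and s != -1:
            # replace the triple (p, i, s) by one node at slot i
            alive[p] = alive[s] = False
            cost[i] = cost[p] + cost[s] - c
            ver[i] += 1
            prev[i], succ[i] = prev[p], succ[s]
            if prev[i] != -1:
                succ[prev[i]] = i
            if succ[i] != -1:
                prev[succ[i]] = i
            heapq.heappush(heap, (cost[i], ver[i], i))
        else:
            # at most one neighbor: remove i and that neighbor
            alive[i] = False
            nb = p if p != -1 else s
            if nb != -1:
                alive[nb] = False
            left = prev[p] if p != -1 else -1
            right = succ[s] if s != -1 else -1
            if left != -1:
                succ[left] = right
            if right != -1:
                prev[right] = left
    return total

assert min_k_nonadjacent(3, [2, 1, 5, 4, 7]) == 14
assert min_k_nonadjacent(2, [2, 1, 2, 4]) == 4
```

Note that after the first step on [2, 1, 5, 4, 7], the node at slot 3 really does have two list neighbors (slots 1 and 4), which is exactly the point above.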