I am asked to implement code that checks whether there is a sub-vector whose sum \$v_l\$ satisfies \$\frac{1}{4}mn + 1 \leq v_l \leq v - \frac{1}{4}mn - 1\,ドル where \$n\$ is the length of the vector and \$v\$ is its total sum.
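For instance, with the sample sizes \$m = 10\$ and \$n = 30\$ that come up in the answers below, the bounds work out to \$\frac{1}{4} \cdot 10 \cdot 30 + 1 = 76 \leq v_l \leq v - 76\$. My current attempt: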
```python
def possible(vec, m):
    n = len(vec)  # Number of columns
    v = sum(vec)
    global R
    R = []
    num = {i: 0 for i in set(vec)}
    for i in vec:
        num[i] += 1
    solution = dfs_backtrack(num, m, n, v)
    return solution

def dfs_backtrack(vec, m, n, v):
    v_l = sum([i * vec[i] for i in vec.keys()])
    R.append(vec)
    if 0.25 * m * n + 1 <= v_l <= v - 0.25 * m * n - 1:
        return vec
    for i in vec.keys():
        temp = dict(vec)
        temp[i] -= 1
        if temp[i] is 0:
            temp.pop(i)
        if temp and temp not in R:
            temp = dfs_backtrack(temp, m, n, v)
        else:
            temp = None
        if temp:
            return temp
    return None
```
From my analysis, it has a space and time complexity of \$O(n^{m+1} m^{-m-1})\,ドル where \1ドル \leq m \leq n\$. What can I do to improve the space and time complexity?
- \$\begingroup\$ You have a lot of if statements. I think vectorizing using numpy might help speed things up. If you store the variables (like `vec`) as an array, you can use numpy where to find the indices at which the values are bounded within the desired range. Also, numpy unique (with kwarg `return_counts`) can be an alternative approach to your for loop iteration. But I think the best approach is determining the m and n bounds before computing all sums brute force. \$\endgroup\$ – user127168, Mar 30, 2018 at 11:40
- \$\begingroup\$ I am not able to use other libraries, unfortunately. If I determine the m and n bounds before computing all the sums, how can I do it without computing the sum of the sub-arrays? \$\endgroup\$ – albusSimba, Mar 30, 2018 at 13:03
- \$\begingroup\$ You would have to compute some sums but would not have to check all sums. As an example, `m * n <= 4 * (v_l - 1)` from your inequality; this won't work if m or n is too large, implying you can restrict the values you check. ...unless I am misunderstanding something \$\endgroup\$ – user127168, Apr 1, 2018 at 9:43
- \$\begingroup\$ That makes sense. So I exit the recursion earlier. That would help. But the worst-case complexity would remain the same and the average case is lower? \$\endgroup\$ – albusSimba, Apr 2, 2018 at 7:12
- \$\begingroup\$ Yes. Applying the condition to the upper bound would be trickier. \$\endgroup\$ – user127168, Apr 2, 2018 at 7:15
2 Answers
Since the sum `v_l` changes with each iteration through your dictionary values of `num`, I don't know of a way to pre-compute the right-side sums beforehand. This would be a little easier if you could use external modules. That said, you might see a slight speed-up if you use more comprehensions. For example, you can get `num` via `num = {x: vec.count(x) for x in vec}`. Also, use `== 0` (not `is 0`) to check zero equality, and use `is None` (not `== None`) to check for `None`.
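A minimal sketch of those two fixes applied to the question's code (iterating over `set(vec)` is my own small tweak so each distinct value is only counted once):

```python
# Comprehension instead of the manual counting loop in possible()
num = {x: vec.count(x) for x in set(vec)}

# Value equality instead of identity inside dfs_backtrack()
if temp[i] == 0:  # "is 0" relies on int identity, which is unreliable
    temp.pop(i)
```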
Also, you are iterating through all possible combinations. In the case of using `votes1` and `m1` (the sample inputs from the question), you know that `m = m1 = 10` and `n = len(votes1) = 30`, so your lower bound is cut off at 76. Those are 76 sums (0 through 75) that you do not need to check. You can include a `break` statement at the lower bound since you are decrementing (`temp[i] -= 1`), but I would instead use this lower bound as the starting point and iterate upward, as you may find some other way to restrict the sums at the upper limit.
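A hedged sketch of that pruning inside `dfs_backtrack` (assuming all vote values are positive, so removing elements can only lower the sum):

```python
lower_bound = 0.25 * m * n + 1
if v_l < lower_bound:
    # Every deeper state removes more positive values, so v_l only shrinks
    # further; abandon this branch immediately.
    return None
```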
Lastly, the use of globals and non-descriptive variable names makes it hard to edit the code. For example, `votes` instead of `vec`, `ndict` instead of `n`, etc. Also, you can pass `R` into `dfs_backtrack`.
I personally prefer iterating over lists instead of dictionaries since you can use zip. I don't have a full solution for your problem, but the following may help as a start.
```python
def get_lower_bound(m, n):
    return m * n / 4 + 1

def get_upper_bound(m, n, v):
    return v - m * n / 4 - 1

def initialize_constants(m, n, votes):
    """Pre-compute these values as they do not change."""
    size = len(votes)
    total = sum(votes)
    lower_bound = get_lower_bound(m, n)
    return size, total, lower_bound

def initialize_vl(votes, m):
    # set look-ups are quick; sets provide unique values just like dictionaries do
    unique_votes = list(set(votes))                   # keys for num
    vcounts = [votes.count(x) for x in unique_votes]  # vals for num
    vl = sum(a * b for a, b in zip(unique_votes, vcounts))
    return unique_votes, vcounts, vl
```
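A quick usage sketch (the vote values are made up for illustration):

```python
votes = [5, 5, 6, 7, 7, 7]  # hypothetical sample data
m = 10

size, total, lower_bound = initialize_constants(m, len(votes), votes)
unique_votes, vcounts, vl = initialize_vl(votes, m)
print(size, total, lower_bound)  # 6 37 16.0
print(sorted(unique_votes), vl)  # [5, 6, 7] 37
```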
This way, you don't need the block

```python
temp = dict(vec)
temp[i] -= 1
if temp[i] is 0:
    temp.pop(i)
if temp and temp not in R:
```

which builds `temp` as a fresh copy of `vec` on every iteration, a time and memory cost. Also, you can iterate through your unique values in `list(set(...))` instead of checking whether each candidate is already in `R`.
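A hedged sketch of that list-based recursion, building on the `unique_votes`/`vcounts`/`vl` values from `initialize_vl` above (the function name and structure are my own, and it deliberately omits the `R` memoization that the comment below discusses):

```python
def backtrack_lists(unique_votes, vcounts, vl, lower_bound, upper_bound):
    # Check the current sub-vector's sum against the bounds
    if lower_bound <= vl <= upper_bound:
        return [v for v, c in zip(unique_votes, vcounts) for _ in range(c)]
    for idx, (value, count) in enumerate(zip(unique_votes, vcounts)):
        if count == 0:
            continue           # this value is exhausted on this branch
        vcounts[idx] -= 1      # remove one occurrence in place: no dict copy
        found = backtrack_lists(unique_votes, vcounts, vl - value,
                                lower_bound, upper_bound)
        vcounts[idx] += 1      # restore before trying the next value
        if found:
            return found
    return None
```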
Sorry I could not be of more help. Let me know if anything is unclear.
- \$\begingroup\$ Erm.. actually, removing `R` would increase the complexity: some arrays that have already been computed would be computed again. E.g. `[5,5,5,6]` and `[5,5,6,6]` would both compute `[5,5,6]`, unless I'm not calling them recursively. \$\endgroup\$ – albusSimba, Apr 2, 2018 at 13:49
Taking a bit of a stab at this, I would suggest some of the following changes:
- Use `collections.Counter` to perform your counting. Minor tweak, but it's in the standard library and it's fast.
- Use a shared argument rather than a global variable to track which recursions you've already computed.
- Pre-compute the lower and upper bounds. They never change, and it cleans up the recursed function a bit.
- Since you know which item you're removing when you recurse, subtract it from a running sum rather than re-summing the vector at each recursion.
- Instead of removing one item from your vector for each recursion, since you've already bothered to group and count the unique values, go ahead and recurse through each unique value, and use a simple `for` loop to handle all possible quantities of that value that you include/remove.
- Converting from a standard `return`-based function to a generator allows you to use `yield from` to more cleanly handle the case where "I didn't find the answer, but one of my children might have" (see the small sketch after this list). In this case, I actually loop on the recursion rather than `yield from` it, but it has the same effect. It also means that, for free, the same function could give me either all valid answers or just the first valid answer it encounters.
- Consider using a heuristic when you recurse across your various options in order to favor branches that are more likely to lead you to the correct answer.
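A standalone toy illustration of the `return`-vs-generator point (the even-number search here is made up purely for demonstration):

```python
def find_even_return(items):
    # return-based: a temp variable and a None check at every level
    for x in items:
        if isinstance(x, list):
            result = find_even_return(x)
            if result is not None:
                return result
        elif x % 2 == 0:
            return x
    return None

def find_even_gen(items):
    # generator-based: "yield" on success, "yield from" bubbles up recursion
    for x in items:
        if isinstance(x, list):
            yield from find_even_gen(x)
        elif x % 2 == 0:
            yield x

data = [1, [3, 4], 5, 6]
print(find_even_return(data))           # 4
print(next(find_even_gen(data), None))  # 4; list(...) would give [4, 6]
```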
The final code I ended up with after these suggestions was as follows:
```python
from collections import Counter
import warnings


def possible(vec, m):
    n = len(vec)  # Number of columns
    v = sum(vec)
    lower_bound = 0.25 * m * n + 1
    upper_bound = v - 0.25 * m * n - 1

    # Check for simple failure cases
    if (sum(num for num in vec if num > 0) < lower_bound
            or sum(num for num in vec if num < 0) > upper_bound):
        return None

    # Use the high-performance, built-in counter class
    counts = Counter(vec)

    # Use a generator rather than a direct return. Some benefits:
    #   1. You avoid having to check if a value gets returned (temp variables);
    #      just "yield from" when recursing and "yield" when successful
    #   2. You automatically end up with a function that could produce _all_
    #      valid answers, if desired, for free
    #   3. You end up with predictable errors if no answer is found
    # Use a shared mutable variable for R within a call stack, not a global.
    # R is now a set to allow for O(1) "contains" checks.
    valid_subvectors = dfs_backtrack(set(), counts, v, lower_bound, upper_bound)
    try:
        return next(valid_subvectors)
    except StopIteration:
        return None
    except RecursionError:
        warnings.warn("Unable to find subvector, it's too long/diverse for recursive solver")
        return None


def dfs_backtrack(R, vec, cur_sum, lower_bound, upper_bound):
    # The key set alone does not identify a state (different removal counts
    # leave the same keys but different sums), so memoize on both
    checked_this_already = (cur_sum, tuple(sorted(vec.keys())))
    if checked_this_already not in R:
        # Since R is a shared set instance, it acts "global" but only exists
        # within the current call stack
        R.add(checked_this_already)
        # Summation is performed by subtracting recursively from the total,
        # rather than re-summing each time
        if lower_bound <= cur_sum <= upper_bound:
            # Convert counter back into vector form
            yield list(vec.elements())
        # Add a simple heuristic to try to find the right answer faster
        def heuristic(k):
            return abs((lower_bound + upper_bound) // 2 - (cur_sum - k))
        # Pre-limit the keys we'll consider so we don't even run the heuristic
        # on irrelevant keys (since 0's are no longer popped)
        potential_keys = [i for i in vec.keys() if i > 0]
        for i in sorted(potential_keys, key=heuristic):
            # Handle all possible numbers of that key at once to reduce recursion
            num_keys = vec.pop(i)
            for n in range(1, num_keys + 1):
                for solution in dfs_backtrack(R, vec, cur_sum - i * n,
                                              lower_bound, upper_bound):
                    yield [i] * (num_keys - n) + solution
            vec[i] = num_keys
```
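A quick usage sketch (the vector is made up; the question's `votes1`/`m1` test data is not shown in this excerpt):

```python
votes = [5, 5, 6, 7, 7, 7, 8, 9]
print(possible(votes, m=2))
# -> a sub-vector whose sum lies in [5, 49] for this data, or None
```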
With these changes, for several sample vectors, I'm getting about a 3-5x speedup. Also, the change to the recursion pattern makes it capable of handling much longer vectors, so long as there are a limited number of unique values in them (whereas the original approach hits a recursion limit based on the raw length of the vector). This could be made better by avoiding recursion altogether, but that seemed too drastic a change to the code for this setting.
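For reference, a rough sketch of that recursion-free direction, using an explicit stack of (remaining values, running sum) states. This is my own illustration (no heuristic ordering, and every value is considered for removal), not a tuned drop-in replacement:

```python
from collections import Counter

def possible_iterative(vec, m):
    """Explicit-stack variant of the same search: no recursion depth limit."""
    n, v = len(vec), sum(vec)
    lower_bound = 0.25 * m * n + 1
    upper_bound = v - 0.25 * m * n - 1
    # Each state: (remaining (value, count) pairs as a hashable tuple, their sum)
    start = (tuple(sorted(Counter(vec).items())), v)
    stack, seen = [start], {start}
    while stack:
        items, cur_sum = stack.pop()
        if lower_bound <= cur_sum <= upper_bound:
            # Expand the remaining (value, count) pairs back into a vector
            return [val for val, cnt in items for _ in range(cnt)]
        for idx, (val, cnt) in enumerate(items):
            # Remove one occurrence of `val`, dropping the pair if exhausted
            if cnt > 1:
                rest = items[:idx] + ((val, cnt - 1),) + items[idx + 1:]
            else:
                rest = items[:idx] + items[idx + 1:]
            state = (rest, cur_sum - val)
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return None
```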