I am asked to implement code that checks whether there is a sub-vector whose sum \$v_l\$ satisfies \$\frac{1}{4}mn + 1 \leq v_l \leq v - \frac{1}{4}mn - 1\,ドル where \$n\$ is the length of the vector and \$v\$ is its total sum.
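For instance, with the sample sizes \$m = 10\$ and \$n = 30\$ that come up in the answers below, the bounds work out to \$\frac{1}{4} \cdot 10 \cdot 30 + 1 = 76 \leq v_l \leq v - 76\$. My current attempt: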
```python
def possible(vec, m):
    n = len(vec)  # Number of columns
    v = sum(vec)
    global R
    R = []
    num = {i: 0 for i in set(vec)}
    for i in vec:
        num[i] += 1
    solution = dfs_backtrack(num, m, n, v)
    return solution

def dfs_backtrack(vec, m, n, v):
    v_l = sum([i * vec[i] for i in vec.keys()])
    R.append(vec)
    if 0.25 * m * n + 1 <= v_l <= v - 0.25 * m * n - 1:
        return vec
    for i in vec.keys():
        temp = dict(vec)
        temp[i] -= 1
        if temp[i] is 0:
            temp.pop(i)
        if temp and temp not in R:
            temp = dfs_backtrack(temp, m, n, v)
        else:
            temp = None
        if temp:
            return temp
    return None
```
From my analysis, it has a space and time complexity of \$O(n^{m+1} m^{-m-1})\,ドル where \1ドル \leq m \leq n\$. What can I do to improve the space and time complexity?
- \$\begingroup\$ You have a lot of if statements. I think vectorizing using numpy might help speed things up. If you store the variables (like `vec`) as an array, you can use numpy where to find the indices at which the values are bounded within the desired range. Also, numpy unique (with kwarg `return_counts`) can be an alternative approach to your for loop iteration. But I think the best approach is determining the m and n bounds before computing all sums brute force. \$\endgroup\$ – user127168, Mar 30, 2018 at 11:40
- \$\begingroup\$ I am not able to use other libraries, unfortunately. If I determine the m and n bounds before computing all the sums, how can I do it without computing the sum of the sub-arrays? \$\endgroup\$ – albusSimba, Mar 30, 2018 at 13:03
- \$\begingroup\$ You would have to compute some sums but would not have to check all sums. As an example, `m * n <= 4 * (v_l - 1)` from your inequality; this won't work if m or n is too large, implying you can restrict the values you check. ...unless I am misunderstanding something \$\endgroup\$ – user127168, Apr 1, 2018 at 9:43
- \$\begingroup\$ That makes sense. So I exit the recursion earlier. That would help. But the worst-case complexity would remain the same and the average case is lower? \$\endgroup\$ – albusSimba, Apr 2, 2018 at 7:12
- \$\begingroup\$ Yes. Applying the condition to the upper bound would be trickier. \$\endgroup\$ – user127168, Apr 2, 2018 at 7:15
2 Answers
Since the sum `v_l` changes with each iteration through your dictionary values of `num`, I don't know of a way to pre-compute the right-side sums beforehand. This would be a little easier if you could use external modules. That said, you might see a slight speed-up if you use more comprehensions. For example, you can get `num` via `num = {x: vec.count(x) for x in vec}`. Also, use `== 0` (not `is 0`) to check zero equality, and use `is None` (not `== None`) to check for `None`.
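A minimal sketch of those two fixes applied to the question's code (iterating over `set(vec)` is my own small tweak so each distinct value is only counted once):

```python
# Comprehension instead of the manual counting loop in possible()
num = {x: vec.count(x) for x in set(vec)}

# Value equality instead of identity inside dfs_backtrack()
if temp[i] == 0:  # "is 0" relies on int identity, which is unreliable
    temp.pop(i)
```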
Also, you are iterating through all possible combinations. In the case of using `votes1` and `m1` (the sample inputs from the question), you know that `m = m1 = 10` and `n = len(votes1) = 30`, so your lower bound is cut off at 76. Those are 76 sums (0 through 75) that you do not need to check. You can include a `break` statement at the lower bound since you are decrementing (`temp[i] -= 1`), but I would instead use this lower bound as the starting point and iterate upward, as you may find some other way to restrict the sums at the upper limit.
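A hedged sketch of that pruning inside `dfs_backtrack` (assuming all vote values are positive, so removing elements can only lower the sum):

```python
lower_bound = 0.25 * m * n + 1
if v_l < lower_bound:
    # Every deeper state removes more positive values, so v_l only shrinks
    # further; abandon this branch immediately.
    return None
```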
Lastly, the use of globals and non-descriptive variable names makes it hard to edit the code. For example, `votes` instead of `vec`, `ndict` instead of `n`, etc. Also, you can pass `R` into `dfs_backtrack`.
I personally prefer iterating over lists instead of dictionaries since you can use zip. I don't have a full solution for your problem, but the following may help as a start.
```python
def get_lower_bound(m, n):
    return m * n / 4 + 1

def get_upper_bound(m, n, v):
    return v - m * n / 4 - 1

def initialize_constants(m, n, votes):
    """Pre-compute these values as they do not change."""
    size = len(votes)
    total = sum(votes)
    lower_bound = get_lower_bound(m, n)
    return size, total, lower_bound

def initialize_vl(votes, m):
    # set look-ups are quick; sets provide unique values just like dictionaries do
    unique_votes = list(set(votes))                   # keys for num
    vcounts = [votes.count(x) for x in unique_votes]  # vals for num
    vl = sum(a * b for a, b in zip(unique_votes, vcounts))
    return unique_votes, vcounts, vl
```
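A quick usage sketch (the vote values are made up for illustration):

```python
votes = [5, 5, 6, 7, 7, 7]  # hypothetical sample data
m = 10

size, total, lower_bound = initialize_constants(m, len(votes), votes)
unique_votes, vcounts, vl = initialize_vl(votes, m)
print(size, total, lower_bound)  # 6 37 16.0
print(sorted(unique_votes), vl)  # [5, 6, 7] 37
```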
This way, you don't need the block

```python
temp = dict(vec)
temp[i] -= 1
if temp[i] is 0:
    temp.pop(i)
if temp and temp not in R:
```

which builds `temp` as a fresh copy of `vec` on every iteration, a time and memory cost. Also, you can iterate through your unique values in `list(set(...))` instead of checking whether each candidate is already in `R`.
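A hedged sketch of that list-based recursion, building on the `unique_votes`/`vcounts`/`vl` values from `initialize_vl` above (the function name and structure are my own, and it deliberately omits the `R` memoization that the comment below discusses):

```python
def backtrack_lists(unique_votes, vcounts, vl, lower_bound, upper_bound):
    # Check the current sub-vector's sum against the bounds
    if lower_bound <= vl <= upper_bound:
        return [v for v, c in zip(unique_votes, vcounts) for _ in range(c)]
    for idx, (value, count) in enumerate(zip(unique_votes, vcounts)):
        if count == 0:
            continue           # this value is exhausted on this branch
        vcounts[idx] -= 1      # remove one occurrence in place: no dict copy
        found = backtrack_lists(unique_votes, vcounts, vl - value,
                                lower_bound, upper_bound)
        vcounts[idx] += 1      # restore before trying the next value
        if found:
            return found
    return None
```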
Sorry I could not be of more help. Let me know if anything is unclear.
- \$\begingroup\$ Erm.. actually, removing `R` would increase the complexity: some arrays that have already been computed would be computed again. E.g. `[5,5,5,6]` and `[5,5,6,6]` would both compute `[5,5,6]`, unless I'm not calling them recursively. \$\endgroup\$ – albusSimba, Apr 2, 2018 at 13:49
Taking a bit of a stab at this, I would suggest some of the following changes:
- Use `collections.Counter` to perform your counting. Minor tweak, but it's in the standard library and it's fast.
- Use a shared argument rather than a global variable to track which recursions you've already computed.
- Pre-compute the lower and upper bounds. They never change, and it cleans up the recursed function a bit.
- Since you know which item you're removing when you recurse, subtract it from a running sum rather than re-summing the vector at each recursion.
- Instead of removing one item from your vector for each recursion, since you've already bothered to group and count the unique values, go ahead and recurse through each unique value, and use a simple `for` loop to handle all possible quantities of that value that you include/remove.
- Converting from a standard `return`-based function to a generator allows you to use `yield from` to more cleanly handle the case where "I didn't find the answer, but one of my children might have" (see the small sketch after this list). In this case, I actually loop on the recursion rather than `yield from` it, but it has the same effect. It also means that, for free, the same function could give me either all valid answers or just the first valid answer it encounters.
- Consider using a heuristic when you recurse across your various options in order to favor branches that are more likely to lead you to the correct answer.
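A standalone toy illustration of the `return`-vs-generator point (the even-number search here is made up purely for demonstration):

```python
def find_even_return(items):
    # return-based: a temp variable and a None check at every level
    for x in items:
        if isinstance(x, list):
            result = find_even_return(x)
            if result is not None:
                return result
        elif x % 2 == 0:
            return x
    return None

def find_even_gen(items):
    # generator-based: "yield" on success, "yield from" bubbles up recursion
    for x in items:
        if isinstance(x, list):
            yield from find_even_gen(x)
        elif x % 2 == 0:
            yield x

data = [1, [3, 4], 5, 6]
print(find_even_return(data))           # 4
print(next(find_even_gen(data), None))  # 4; list(...) would give [4, 6]
```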
The final code I ended up with after these suggestions was as follows:
```python
from collections import Counter
import warnings


def possible(vec, m):
    n = len(vec)  # Number of columns
    v = sum(vec)
    lower_bound = 0.25 * m * n + 1
    upper_bound = v - 0.25 * m * n - 1

    # Check for simple failure cases
    if (sum(num for num in vec if num > 0) < lower_bound
            or sum(num for num in vec if num < 0) > upper_bound):
        return None

    # Use the high-performance, built-in counter class
    counts = Counter(vec)

    # Use a generator rather than a direct return. Some benefits:
    #   1. You avoid having to check if a value gets returned (temp variables);
    #      just "yield from" when recursing and "yield" when successful
    #   2. You automatically end up with a function that could produce _all_
    #      valid answers, if desired, for free
    #   3. You end up with predictable errors if no answer is found
    # Use a shared mutable variable for R within a call stack, not a global.
    # R is now a set to allow for O(1) "contains" checks.
    valid_subvectors = dfs_backtrack(set(), counts, v, lower_bound, upper_bound)
    try:
        return next(valid_subvectors)
    except StopIteration:
        return None
    except RecursionError:
        warnings.warn("Unable to find subvector, it's too long/diverse for recursive solver")
        return None


def dfs_backtrack(R, vec, cur_sum, lower_bound, upper_bound):
    # The key set alone does not identify a state (different removal counts
    # leave the same keys but different sums), so memoize on both
    checked_this_already = (cur_sum, tuple(sorted(vec.keys())))
    if checked_this_already not in R:
        # Since R is a shared set instance, it acts "global" but only exists
        # within the current call stack
        R.add(checked_this_already)
        # Summation is performed by subtracting recursively from the total,
        # rather than re-summing each time
        if lower_bound <= cur_sum <= upper_bound:
            # Convert counter back into vector form
            yield list(vec.elements())
        # Add a simple heuristic to try to find the right answer faster
        def heuristic(k):
            return abs((lower_bound + upper_bound) // 2 - (cur_sum - k))
        # Pre-limit the keys we'll consider so we don't even run the heuristic
        # on irrelevant keys (since 0's are no longer popped)
        potential_keys = [i for i in vec.keys() if i > 0]
        for i in sorted(potential_keys, key=heuristic):
            # Handle all possible numbers of that key at once to reduce recursion
            num_keys = vec.pop(i)
            for n in range(1, num_keys + 1):
                for solution in dfs_backtrack(R, vec, cur_sum - i * n,
                                              lower_bound, upper_bound):
                    yield [i] * (num_keys - n) + solution
            vec[i] = num_keys
```
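A quick usage sketch (the vector is made up; the question's `votes1`/`m1` test data is not shown in this excerpt):

```python
votes = [5, 5, 6, 7, 7, 7, 8, 9]
print(possible(votes, m=2))
# -> a sub-vector whose sum lies in [5, 49] for this data, or None
```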
With these changes, for several sample vectors, I'm getting about a 3-5x speedup. Also, the change to the recursion pattern makes it capable of handling much longer vectors, so long as there are a limited number of unique values in them (whereas the original approach hits a recursion limit based on the raw length of the vector). This could be made better by avoiding recursion altogether, but that seemed too drastic a change to the code for this setting.
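For reference, a rough sketch of that recursion-free direction, using an explicit stack of (remaining values, running sum) states. This is my own illustration (no heuristic ordering, and every value is considered for removal), not a tuned drop-in replacement:

```python
from collections import Counter

def possible_iterative(vec, m):
    """Explicit-stack variant of the same search: no recursion depth limit."""
    n, v = len(vec), sum(vec)
    lower_bound = 0.25 * m * n + 1
    upper_bound = v - 0.25 * m * n - 1
    # Each state: (remaining (value, count) pairs as a hashable tuple, their sum)
    start = (tuple(sorted(Counter(vec).items())), v)
    stack, seen = [start], {start}
    while stack:
        items, cur_sum = stack.pop()
        if lower_bound <= cur_sum <= upper_bound:
            # Expand the remaining (value, count) pairs back into a vector
            return [val for val, cnt in items for _ in range(cnt)]
        for idx, (val, cnt) in enumerate(items):
            # Remove one occurrence of `val`, dropping the pair if exhausted
            if cnt > 1:
                rest = items[:idx] + ((val, cnt - 1),) + items[idx + 1:]
            else:
                rest = items[:idx] + items[idx + 1:]
            state = (rest, cur_sum - val)
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return None
```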