I am pretty new to Python, so I want to listen to any advice that improve my coding style in a "pythonic" way, even about naming style of variables. The following code reflects a very general paradigm that formulates a problem as depth-first search in a tree, which I find interesting and powerful.
Suppose arr = [3, 5, 2, 1, 1, 7, 10, 2]
, we want to find all n=2
elements whose sum equals val=8
. The algorithm should return a list of tuples, where each tuple is INDICES of n=2
elements (at different positions of the array) whose sum is val=8
: [(3,5), (4,5), (0,1)]
, so we have,
arr[3]+arr[5] = 1+7 = 8;
arr[4]+arr[5] = 1+7 = 8;
arr[0]+arr[1] = 3+5 = 8.
Here is my code:
def nsum(arr, n, val):
if arr is None or len(arr) < n:
return []
# first sort the array by indice, so that we can search sequentially
# with early stopping
sorted_indice = sorted(xrange(len(arr)), key=lambda i: arr[i])
size = len(arr)
def nsum_recursive(arr, prev_indice, prev_sum, result):
''' This is the main algorithm that is a recursive function.
Think about the problem as tree search using depth-first search,
Each node/state is determined by (prev_indice, prev_sum), where
prev_indice is a list of currently explored indice, whose sum is
prev_sum. When len(prev_indice)==n, we reach a leaf node. Then we
can check whether the sum equals val.
'''
# termination condition (leaf nodes)
if len(prev_indice) == n:
if prev_sum == val:
result.append(tuple(sorted_indice[i] for i in prev_indice))
else:
current_count = len(prev_indice)
# depth-first search
for idx in range(
0 if not prev_indice else prev_indice[-1]+1,
size-(n-current_count-1)):
current_sum = prev_sum + arr[sorted_indice[idx]]
# early stopping (pruning of branches)
if current_sum>val and idx+1<size and arr[sorted_indice[idx+1]]>=0:
break
nsum_recursive(arr, prev_indice+[idx],
current_sum, result)
# start DFS from the root node
result = []
nsum_recursive(arr, [], 0, result)
return result
1 Answer 1
def nsum(arr, n, val):
A documentation string here is important. The names of the variables are so vague that even knowing the task in advance is difficult to understand their meaning.
if arr is None or len(arr) < n:
return []
This check is non very useful. It does not decrease the algorithm complexity. In particular the check arr is None
is wrong... you want the function to fail if the arr is not a list, otherwise you hide possible errors in the caller's code.
# first sort the array by indice, so that we can search sequentially
# with early stopping
sorted_indice = sorted(xrange(len(arr)), key=lambda i: arr[i])
size = len(arr)
invert the two lines above, so that you can use size in the previous line. Why to use xrange
instead of range
? The name of the variable sorted_indice
is not very good for my taste: it is not "sorted" (maybe it is "sorting") and what does "indice" means? Maybe you mean "indices"?
def nsum_recursive(arr, prev_indice, prev_sum, result):
Here the complexity is too much... we have four variables plus the three of the enclosing function: very difficult to keep track of all of them. arr
is repeated. result
should be the return value.
''' This is the main algorithm that is a recursive function.
Think about the problem as tree search using depth-first search,
Each node/state is determined by (prev_indice, prev_sum), where
prev_indice is a list of currently explored indice, whose sum is
prev_sum. When len(prev_indice)==n, we reach a leaf node. Then we
can check whether the sum equals val.
'''
The explanation is not very clear. You describe what are the nodes of the tree but what are the children of a node?
# termination condition (leaf nodes)
if len(prev_indice) == n:
if prev_sum == val:
result.append(tuple(sorted_indice[i] for i in prev_indice))
else:
The two nested if
are better replaced by a single if
and the program should return at once, so that else
is not required. This would reduce the indentation of your code. Also it seems that the check for equality and the check for early stopping are quite similar and could be done together.
current_count = len(prev_indice)
# depth-first search
for idx in range(
0 if not prev_indice else prev_indice[-1]+1,
size-(n-current_count-1)):
here it is very difficult to understand if the range of numbers is correct.
current_sum = prev_sum + arr[sorted_indice[idx]]
# early stopping (pruning of branches)
if current_sum>val and idx+1<size and arr[sorted_indice[idx+1]]>=0:
break
You can make the early stopping better (and maybe simpler) by checking that the current_sum plus the product of the remaining terms by the smallest term is larger that the target sum. You could also make the opposite check: if summing the larger possibile terms you cannot reach the target sum.
nsum_recursive(arr, prev_indice+[idx],
current_sum, result)
# start DFS from the root node
result = []
nsum_recursive(arr, [], 0, result)
return result
Explore related questions
See similar questions with these tags.
[(3,5), (4,5), (0,1)]
- I don't see how 4+5 or 0+1 equals 8. \$\endgroup\$