I have written the following code for diving a list 'seq' into 'k' segments such that the maximum sum of each segment is minimized. I have used dynamic programming to solve this problem. But my class "Segmentation" seems a little overwhelmed with too many redundant functions. Can anybody give me a few suggestions on how can I improve this code? I am even using separate functions for returning boolean values and calculating the average. Also using raise SystemExit
because I do not want to add new library only for this line. Please provide your valuable feedback on the below code.
"""
Calculate the optimal k-segmentation of a sequence using dynamic programming
Author: Jahid Chowdhury Choton ([email protected])
The sub-problems for dynamic programming are given below:
Base case: Opt(i,j,M) = cost(S[:i]-M) if j==1
Recursion: Opt(i,j,M) = min( max( Opt(l,j-1, M), cost(S[l:i])-M) for l=1,2,...,i)
"""
class Segmentation:
def __init__(self, seq, k):
self.seq = seq
self.k = k
self.n = len(seq)
self.M = self.avg() # Parameter M is taken to be the average of the sequence
self.opt_val, self.opt_segments = self.opt() # Calculate optimal values and segmentation
def avg(self): # To calculate average of sequence seq
return sum(self.seq) / self.n
def is_odd(self, val): # To return a boolean value 0 or 1
return val % 2
def initialize_cost_and_dp(self): # To initialize cost array, DP array and traceback array
cost_sum = [0] * (self.n + 1) # Initialize cost array with zeros
dp = [[-1.0 for _ in range(self.n+1)] for _ in range(2)] # Tabulation array for DP
tb = [[[] for _ in range(self.n+1)] for _ in range(2)] # Traceback array for optimal segmentation
for i in range(0, self.n): # O(n)
# Add the elements of the sequence seq into cost array
cost_sum[i + 1] = self.seq[i] + cost_sum[i]
# Initialize DP array with infinity values
dp[0][i+1] = float('inf')
dp[1][i+1] = float('inf')
# print(cost_sum, dp, tb)
return cost_sum, dp, tb
def generate_segmentation(self, traceback): # To generate k-segmentation of sequence seq using traceback array
# Separate the traceback array into two parts starts and ends
starts = traceback[self.is_odd(self.k)][self.n]
ends = starts[1:] + [self.n]
# print(starts, ends)
k_segments = [self.seq[s:e] for s, e in zip(starts, ends)] # Concatenate the two parts for optimal segments
return k_segments
def opt(self): # To calculate the optimal value and segmentation using dp
if self.n < self.k:
print("Not enough cells for k-segmentation, now exiting the program...")
raise SystemExit # Program will terminate if k is less than the segmented cells
costSum, dp, tb = self.initialize_cost_and_dp() # Get the initial values of cost array, DP array and TB array
for i in range(1, self.k + 1): # O(k)
for j in range(1, self.n + 1): # O(n)
for l in range(i - 1, j): # O(n-k)
curr_val = max(dp[self.is_odd(i - 1)][l],
abs((costSum[j] - costSum[l]) - self.M)) # Check the current max value
prev_val = dp[self.is_odd(i)][j] # Check the previous value
if curr_val < prev_val: # Check the min value and update it in DP and traceback array
dp[self.is_odd(i)][j] = curr_val # Update the min value in dp array
tb[i % 2][j] = tb[self.is_odd(i - 1)][l] + [l] # Update the breakpoint in traceback array
optimal_value = dp[self.is_odd(self.k)][self.n] # Calculate the optimal value
optimal_segments = self.generate_segmentation(tb) # Generate the optimal segments
return optimal_value, optimal_segments
# Driver Code
def main():
# S = [3, 7, 7, 33]
# S = [3, 4, 2, 2, 6, 7, 43, 100000, 4, 3, 11]
# S = [3, 4, 2, 2, 6, 7, 43, 50, 43, 3, 11]
# S = [3, 4, 2, 2, 6, 7, 13, 10, 43, 50, 5]
# S = [1,1,1,5,5,5]
S = [3,8,7,1,11]
k = 3
# Call the class Segmentation
k_segmentation = Segmentation(S, k)
# Get the optimal value and segmentation
optimal_values = k_segmentation.opt_val
optimal_segmentation = k_segmentation.opt_segments
# Print the optimal value and segmentation
print("Optimal value =", optimal_values)
print("Optimal k-segmentation = ", optimal_segmentation)
if __name__ == '__main__':
main()
1 Answer 1
One could approach this from the goal:
What is the business of a Segmentation
instance, what are its business methods? You tagged object-oriented
: What is the value added by having a Segmentation(sequence, k)
instance over an opt(sequence, k)
function?
In your main()
, you don't (syntactically) call any method of k_segmentation
.
A first improvement would be to take the call to opt()
out of the constructor, rendering "the opt_...
attributes" obsolete.
While I can "see" the segmentation part opt()
returns, what is that optimal_value
?
Document both in a docstring for opt()
!
To get some advantage from a Segmentation
instance, have it keep information useful for more than one access.
If you construct it with just the sequence and pass the number of parts as a parameter to opt(k)
, you can look for information independent of k
- M
is, cost_sum
looks a candidate.
The methods are commented for what they do. Not bad.
Docstrings would be even better, starting withhelp(Segmentation.opt)
.
Withopt()
, a description of how it works would add value.Naming
•is_odd()
is named for how it works - I'd prefercurrent(i)
andprevious(i)
.
(There is one instance ofi % 2
not substituted.)
•costSum
isn't snake_case (nor isM
).
• internal methods likeinitialize_cost_and_dp
maybe should start with an underscore.
• Method/function names with an and are a code smell.
From Python 3.10, generate_segmentation()
could use itertools.pairwise
.
I think all that indexing with [self.is_odd(i)]
for current (i - 1
for previous) reduces readability - if you want to keep the mechanism, you can use descriptors -
I'd likely go for named attributes & an explicit swap at the end of the outer loop.
-
\$\begingroup\$ Thanks a lot for your useful comments. It really helped a lot! \$\endgroup\$Jahid Chowdhury Choton– Jahid Chowdhury Choton2022年09月06日 18:27:04 +00:00Commented Sep 6, 2022 at 18:27
Explore related questions
See similar questions with these tags.