Number partitioning: how to allow both 'light' and 'heavy' output?

Question 1

I am working on a library for algorithms for multiway number partitioning. One challenge I face is that some users need the entire partition, while other users need only the sums of the parts. For example, suppose the input is the following list: [1,2,3,3,5,9,9], and the goal is to partition them into two subsets. Then the entire partition could be [[9, 5, 2], [9, 3, 3, 1]], but if a user only needs the sums, then the output would be [16, 16]. In this case, it is a waste of time to maintain the lists. Initially, I thought of writing two variants for each algorithm. For example, for the greedy number partitioning algorithm, I had:

def greedy_partition(numbins: int, items: List[float]):
 """
 Partition the given items using the greedy number partitioning algorithm.
 It return the entire partition, so it runs slower.
 >>> greedy_partition(numbins=2, items=[1,2,3,3,5,9,9])
 [[9, 5, 2], [9, 3, 3, 1]]
 >>> greedy_partition(numbins=3, items=[1,2,3,3,5,9,9])
 [[9, 2], [9, 1], [5, 3, 3]]
 """
 bins = [[] for _ in range(numbins)]
 for item in sorted(items, reverse=True):
 index_of_least_full_bin = min(range(numbins), key=lambda i: sum(bins[i]))
 bins[index_of_least_full_bin].append(item)
 return bins
def greedy_sums(numbins: int, items: List[float]):
 """
 Partition the given items using the greedy number partitioning algorithm.
 It returns only the sums, so it runs much faster.
 >>> greedy_sums(numbins=2, items=[1,2,3,3,5,9,9])
 [16, 16]
 >>> greedy_sums(numbins=3, items=[1,2,3,3,5,9,9])
 [11, 10, 11]
 """
 sums = [0 for _ in range(numbins)]
 for item in sorted(items, reverse=True):
 index_of_least_full_bin = min(range(numbins), key=lambda i: sums[i])
 sums[index_of_least_full_bin] += item
 return sums

This works, but the algorithm is duplicated. If I add more algorithms, I will have to write each of them twice.

EDIT: Of course, I can use only greedy_partition, and then take the sum of each part. But this is extremely inefficient when the number of items is large. For example, I tested both functions on increasing number of items, and got the following results (in seconds):

3 bins, 1000 items: greedy_partition=0.022589921951293945. greedy_sums=0.0010304450988769531.
3 bins, 10000 items: greedy_partition=2.2264041900634766. greedy_sums=0.013550519943237305.
3 bins, 20000 items: greedy_partition=9.81221604347229. greedy_sums=0.020364999771118164.
3 bins, 40000 items: greedy_partition=36.83849239349365. greedy_sums=0.04088854789733887.

My current solution is as follows. I defined an abstract class to handle the bins during the run:

class Bins(ABC):
 def __init__(self, numbins: int):
 self.num = numbins
 @abstractmethod
 def add_item_to_bin(self, item: float, bin_index: int):
 pass
 @abstractmethod
 def result(self):
 return None

I defined two sub-classes: one keeps only the sums, and the other keeps the partition too:

class BinsKeepingSums(Bins):
 def __init__(self, numbins: int):
 super().__init__(numbins)
 self.sums = numbins*[0]
 def add_item_to_bin(self, item: float, bin_index: int):
 self.sums[bin_index] += item
 def result(self):
 return self.sums
class BinsKeepingContents(BinsKeepingSums):
 def __init__(self, numbins: int):
 super().__init__(numbins)
 self.bins = [[] for _ in range(numbins)]
 def add_item_to_bin(self, item: float, bin_index: int):
 super().add_item_to_bin(item, bin_index)
 self.bins[bin_index].append(item)
 def result(self):
 return self.bins

Now, I can write the algorithm only once, sending the desired Bins structure as parameter:

def greedy(bins: Bins, items: List[float]):
 """
 Partition the given items using the greedy number partitioning algorithm.
 Return the partition.
 >>> greedy(bins=BinsKeepingSums(2), items=[1,2,3,3,5,9,9])
 [16, 16]
 >>> greedy(bins=BinsKeepingContents(2), items=[1,2,3,3,5,9,9])
 [[9, 5, 2], [9, 3, 3, 1]]
 """
 for item in sorted(items, reverse=True):
 index_of_least_full_bin = min(range(bins.num), key=lambda i: bins.sums[i])
 bins.add_item_to_bin(item, index_of_least_full_bin)
 return bins.result()

Before I implement some more algorithms, I will be happy for feedback regarding this solution. Particularly, how to make it more simple, general, intuitive and efficient.

_{NOTE: I asked a different question on a similar topic here Greedy number partitioning algorithm}

Question 2

Summing the items seems like a special case of the partitioning problem. So I would implement it as an operation on the return value of the partitioning:

def greedy_sums(numbins: int, items: list[float]) -> list[float]:
 """..."""
 return [sum(items) for items in greedy_partition(numbins, items)]

Benefits:

No duplicated code
Code reuse
No complex classes

Your worry about "needlessly" keeping the lists to me sounds like premature and possibly useless optimization. I would worry about that if and only if you'll find significant performance drops in real-world application of the algorithms.

Performance
Your performance loss comes from a suboptimal implementation of your partitioning algorithm. You can improve it, by tracking the sums as keys:

def greedy_bins_and_sums(numbins: int, items: list[float]):
 """..."""
 bins = [[] for _ in range(numbins)]
 sums = [0] * numbins
 for item in sorted(items, reverse=True):
 index_of_least_full_bin = min(range(numbins), key=sums.__getitem__)
 bins[index_of_least_full_bin].append(item)
 sums[index_of_least_full_bin] += item
 return bins, sums

You can verify that it produces the same results with:

@contextmanager
def perf_count():
 start = perf_counter()
 yield
 print('Time:', perf_counter() - start)
def main():
 for size in (1000, 10000, 40000):
 lst = [randint(0, size) for _ in range(size)]
 with perf_count():
 parts1 = greedy_partition(3, lst)
 with perf_count():
 parts2, sums = greedy_bins_and_sums(3, lst)
 
 print(parts1 == parts2)

Time: 0.006648327998846071
Time: 0.001240477999090217
True
Time: 0.5112279509994551
Time: 0.012690333998762071
True
Time: 8.914125162998971
Time: 0.058278149999750894
True

This way, you can provide the user directly with the partitions and their sums as the function now calculates both.

By using sums.__getitem__ you can get another small performance boost, since you spare one extra call to the wrapping lambda.

A custom data structure
If you prefer to use a custom data structure, another alternative would be to subclass list into a container that keeps track of its sums:

class Bin(list):
 """Represents a bin with tracked sum."""
 
 def __init__(self):
 super().__init__()
 self.sum = 0
 
 def append(self, item: float | int) -> None:
 """Adds an item and updates its sum."""
 super().append(item)
 self.sum += item
def greedy(numbins: int, items: list[float]) -> list[Bin]:
 """
 Partition the given items using the greedy number partitioning algorithm.
 >>> greedy_partition(numbins=2, items=[1,2,3,3,5,9,9])
 [[9, 5, 2], [9, 3, 3, 1]]
 >>> greedy_partition(numbins=3, items=[1,2,3,3,5,9,9])
 [[9, 2], [9, 1], [5, 3, 3]]
 """
 partitions = [Bin() for _ in range(numbins)]
 for item in sorted(items, reverse=True):
 min(partitions, key=lambda b: b.sum).append(item)
 return partitions

This has the benefit, that users can still use the items like lists, but can also access its property sum to get the respective bin's sum without cost.

Question 3

There are very substantial performance differences when the number of items is large. See my current edit. Since I write a library (and not a particular application), I would like to make it efficient from the start.

Question 4

I am amazed at the performance improvement. I did not know that handling lists in python was so fast.

Richard Neumann Richard Neumann 6,39122 silver badges41 bronze badges · Accepted Answer · 2022-03-09 09:44:18Z

Summing the items seems like a special case of the partitioning problem. So I would implement it as an operation on the return value of the partitioning:

def greedy_sums(numbins: int, items: list[float]) -> list[float]:
 """..."""
 return [sum(items) for items in greedy_partition(numbins, items)]

Benefits:

No duplicated code
Code reuse
No complex classes

Your worry about "needlessly" keeping the lists to me sounds like premature and possibly useless optimization. I would worry about that if and only if you'll find significant performance drops in real-world application of the algorithms.

Performance
Your performance loss comes from a suboptimal implementation of your partitioning algorithm. You can improve it, by tracking the sums as keys:

def greedy_bins_and_sums(numbins: int, items: list[float]):
 """..."""
 bins = [[] for _ in range(numbins)]
 sums = [0] * numbins
 for item in sorted(items, reverse=True):
 index_of_least_full_bin = min(range(numbins), key=sums.__getitem__)
 bins[index_of_least_full_bin].append(item)
 sums[index_of_least_full_bin] += item
 return bins, sums

You can verify that it produces the same results with:

@contextmanager
def perf_count():
 start = perf_counter()
 yield
 print('Time:', perf_counter() - start)
def main():
 for size in (1000, 10000, 40000):
 lst = [randint(0, size) for _ in range(size)]
 with perf_count():
 parts1 = greedy_partition(3, lst)
 with perf_count():
 parts2, sums = greedy_bins_and_sums(3, lst)
 
 print(parts1 == parts2)

Time: 0.006648327998846071
Time: 0.001240477999090217
True
Time: 0.5112279509994551
Time: 0.012690333998762071
True
Time: 8.914125162998971
Time: 0.058278149999750894
True

This way, you can provide the user directly with the partitions and their sums as the function now calculates both.

By using sums.__getitem__ you can get another small performance boost, since you spare one extra call to the wrapping lambda.

A custom data structure
If you prefer to use a custom data structure, another alternative would be to subclass list into a container that keeps track of its sums:

class Bin(list):
 """Represents a bin with tracked sum."""
 
 def __init__(self):
 super().__init__()
 self.sum = 0
 
 def append(self, item: float | int) -> None:
 """Adds an item and updates its sum."""
 super().append(item)
 self.sum += item
def greedy(numbins: int, items: list[float]) -> list[Bin]:
 """
 Partition the given items using the greedy number partitioning algorithm.
 >>> greedy_partition(numbins=2, items=[1,2,3,3,5,9,9])
 [[9, 5, 2], [9, 3, 3, 1]]
 >>> greedy_partition(numbins=3, items=[1,2,3,3,5,9,9])
 [[9, 2], [9, 1], [5, 3, 3]]
 """
 partitions = [Bin() for _ in range(numbins)]
 for item in sorted(items, reverse=True):
 min(partitions, key=lambda b: b.sum).append(item)
 return partitions

This has the benefit, that users can still use the items like lists, but can also access its property sum to get the respective bin's sum without cost.

There are very substantial performance differences when the number of items is large. See my current edit. Since I write a library (and not a particular application), I would like to make it efficient from the start.
I am amazed at the performance improvement. I did not know that handling lists in python was so fast.

Stack Exchange Network

Number partitioning: how to allow both 'light' and 'heavy' output?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Number partitioning: how to allow both 'light' and 'heavy' output?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions