Bag data structure (a set that can store multiple copies of items)

Question 1

I am trying to make my class a bit cleaner, especially some of its methods like Count, __add__, remove and __iter__. I feel like I could use cleaner code on the above methods and get the same results.

Bag class represents and defines methods, operators, and an iterator for the Bag class. Bags are similar to sets, and have similar operations but unlike sets they can store multiple copies of items. Store the information in bags as dictionaries whose elements are keys whose associated values are the number of times the key occurs in the bag.

from collections import defaultdict
from goody import type_as_str
import copy
from collections import Counter
class Bag:
 def __init__(self, items = []):
 self.bag_value = defaultdict(int)
 for item in items:
 self.bag_value[item] += 1
 def __repr__(self):
 bag_list = []
 for item, count in self.bag_value.items():
 bag_list.extend(list(item*count))
 return 'Bag(' + str(bag_list) + ')'
 def __str__(self):
 return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'
 def __len__(self):
 bag_len = 0
 for value in self.bag_value:
 bag_len += self.bag_value[value]
 return bag_len
 def unique(self):
 return len(self.bag_value)
 def __contains__(self, item):
 return item in self.bag_value
 def count(self, item):
 if item in self.bag_value:
 return self.bag_value[item]
 else:
 return 0
 def add(self, new):
 self.bag_value[new] += 1
 def __add__(self,right):
 mergedbag = Bag()
 mergedbag.bag_value = copy.copy(self.bag_value)
 for item in right.bag_value.keys():
 mergedbag.bag_value[item] += right.bag_value[item]
 return mergedbag
 def remove(self, item):
 if item in self.bag_value:
 if self.bag_value[item] == 1:
 del self.bag_value[item]
 else:
 self.bag_value[item] -= 1
 else:
 raise ValueError(str(item) + ' not in bag')
 def __eq__(self, right):
 if type(right) is not Bag:
 raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
 else:
 return (Counter(self.bag_value)==Counter(right))
 def __ne__(self, right):
 return not self.__eq__(right)
 def __generate__(self, bag_value):
 for item in self.bag_value:
 for value in range(self.bag_value[item]):
 yield item
 def __iter__(self):
 return self.__generate__(self.bag_value)
if __name__ == '__main__':
# bag = Bag(['d','a','b','d','c','b','d'])
# bag2 = Bag(['d','a','b','d','c','b','d'])
# bag3 = Bag(['d','a','b','d','c','b'])
# print(bag == bag2)
# print(bag == bag3)
# print(bag != bag2)
# print(bag != bag3)
 import driver
 driver.driver()

Question 2

As you're trying to implement a multiset (or bag) I would recommend you to use collections.Counter which already does the same thing:

The Counter class is similar to bags or multisets in other languages.

So, simply inherit from Counter class and add your additional methods in the class or override the already present methods from Counter class as per your requirements. Counter class also supports &, |, - operators to perform various multiset operations.

Now, coming back to your code:

def __init__(self, items = []):
 self.bag_value = defaultdict(int)
 for item in items:
 self.bag_value[item] += 1

Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use None as default value and then check it inside of the __init__ method. As we are now inheriting from Counter we can directly operate on self instead of using a defaultdict to keep the count.

class Bag(Counter):
 def __init__(self, _items=None, **kwargs):
 _items = [] if _items is None else _items
 super().__init__(_items, **kwargs)

Why both _items and kwargs?

It depends on how you want to use it ultimately, but with this signature you can use it as:

Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)

I used _items to handle the case when user actually passed a key named items:

Bag(a=5, c=7, r=9, items=9)

def __repr__(self):
 bag_list = []
 for item, count in self.bag_value.items():
 bag_list.extend(list(item*count))
 return 'Bag(' + str(bag_list) + ')'

Counter class already has a method to handle this called elements() which returns an itertools.chain object which is an iterator, we can call list on it and we are done:

def __repr__(self):
 return 'Bag({})'.format(list(self.elements()))

def __str__(self):
 return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'

We can take advantage of string formatting here to make the above code a little concise and easier to read:

def __str__(self):
 format_str = '{}[{}]'.format
 return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))

def __len__(self):
 bag_len = 0
 for value in self.bag_value:
 bag_len += self.bag_value[value]
 return bag_len

In our class we can obtain the length by simply summing the .values(), sum() returns 0 when the Bag is empty so it saves a lots of line:

def __len__(self):
 return sum(self.values())

We can simply drop __contains__ as Counter can already handle that part, and for unique() we can use len(self.keys()).

def __add__(self,right):
 mergedbag = Bag()
 mergedbag.bag_value = copy.copy(self.bag_value)
 for item in right.bag_value.keys():
 mergedbag.bag_value[item] += right.bag_value[item]
 return mergedbag

Here as we're only dealing with integer values copy.copy is not actually required, and instead of manual loop use the .update() method of dictionary:

def __add__(self, right):
 merged_bag = Bag()
 merged_bag.update(self)
 merged_bag.update(right)
 return merged_bag

In .remove() the only change is that I used string formatting with __repr__ version of item. __repr__ is more useful as with str() you won't be able to differentiate between 1 and '1'.

raise ValueError('{!r} not in bag'.format(item))

def __eq__(self, right):
 if type(right) is not Bag:
 raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
 else:
 return (Counter(self.bag_value)==Counter(right))

The recommended way to check the type of an object is to use isinstance rather than calling type. And in else block I simply called the super class's __eq__ to handle the result instead of creating Counters:

def __eq__(self, right):
 if not isinstance(right, Bag):
 raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
 'compare Bag with Bag').format(type(right).__name__))
 else:
 return super().__eq__(right)

def __iter__(self):
 return self.__generate__(self.bag_value)

Never define your own dunder method(__generate__), these methods are considered special by Python and if any future version of Python defines a method by the same name then your code can result in some unknown errors. In our __iter__ method we can simply yield from self.elements():

def __iter__(self):
 yield from self.elements()

Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.

Complete code:

from collections import Counter
class Bag(Counter):
 def __init__(self, _items=None, **kwargs):
 _items = [] if _items is None else _items
 super().__init__(_items, **kwargs)
 def __repr__(self):
 return 'Bag({})'.format(list(self.elements()))
 def __str__(self):
 format_str = '{}[{}]'.format
 return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))
 def __len__(self):
 return sum(self.values())
 def unique(self):
 return len(self.keys())
 def count(self, item):
 return self[item]
 def add(self, new): 
 self[new] += 1
 def __add__(self, right):
 merged_bag = Bag()
 merged_bag.update(self)
 merged_bag.update(right)
 return merged_bag
 def remove(self, item):
 if item in self:
 if self[item] == 1:
 del self[item]
 else:
 self[item] -= 1
 else:
 raise ValueError('{!r} not in bag'.format(item))
 def __eq__(self, right):
 if not isinstance(right, Bag):
 raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
 'compare Bag with Bag').format(type(right).__name__))
 else:
 return super().__eq__(right)
 def __ne__(self, right):
 return not self.__eq__(right)
 def __iter__(self):
 yield from self.elements()

Question 3

thanks for the detailed explanations and sorry about the indentations.

Question 4

the remove method will no longer have bag_value so the else statement should be self[item]-=1.

Question 5

@LucyBen Thanks, missed that part. Fixed!

Question 6

Bag is very similar to collections.Counter. Maybe you could use it instead, or use it as base class or at least inspiration.
__repr__ is buggy:
```
>>> Bag(['foo','bar'])
Bag(['b', 'a', 'r', 'f', 'o', 'o'])
```
Instead of fixing the bug you can bypass it and leverage iteration by simply assigning
```
bag_list = list(self)
```
count could simply return self.bag_value.get(item, 0)
Double underscores at both ends of a name usually mean the method has special meaning in Python. AFAIK __generate__ has no such meaning, so you should use _generate instead, or even better, rename the method to __iter__

__eq__ insists on a Bag argument, but actually the implementation works with anything that can be turned into a Counter. Either remove the type check, or do like this:

def __eq__(self, right):
 if not isinstance(right, Bag):
 raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
 else:
 return self.bag_value == right.bag_value

Ashwini Chaudhary Ashwini Chaudhary 3,15516 silver badges18 bronze badges · Accepted Answer · 2015-01-30 12:22:49Z

As you're trying to implement a multiset (or bag) I would recommend you to use collections.Counter which already does the same thing:

The Counter class is similar to bags or multisets in other languages.

So, simply inherit from Counter class and add your additional methods in the class or override the already present methods from Counter class as per your requirements. Counter class also supports &, |, - operators to perform various multiset operations.

Now, coming back to your code:

def __init__(self, items = []):
 self.bag_value = defaultdict(int)
 for item in items:
 self.bag_value[item] += 1

Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use None as default value and then check it inside of the __init__ method. As we are now inheriting from Counter we can directly operate on self instead of using a defaultdict to keep the count.

class Bag(Counter):
 def __init__(self, _items=None, **kwargs):
 _items = [] if _items is None else _items
 super().__init__(_items, **kwargs)

Why both _items and kwargs?

It depends on how you want to use it ultimately, but with this signature you can use it as:

Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)

I used _items to handle the case when user actually passed a key named items:

Bag(a=5, c=7, r=9, items=9)

def __repr__(self):
 bag_list = []
 for item, count in self.bag_value.items():
 bag_list.extend(list(item*count))
 return 'Bag(' + str(bag_list) + ')'

Counter class already has a method to handle this called elements() which returns an itertools.chain object which is an iterator, we can call list on it and we are done:

def __repr__(self):
 return 'Bag({})'.format(list(self.elements()))

def __str__(self):
 return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'

We can take advantage of string formatting here to make the above code a little concise and easier to read:

def __str__(self):
 format_str = '{}[{}]'.format
 return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))

def __len__(self):
 bag_len = 0
 for value in self.bag_value:
 bag_len += self.bag_value[value]
 return bag_len

In our class we can obtain the length by simply summing the .values(), sum() returns 0 when the Bag is empty so it saves a lots of line:

def __len__(self):
 return sum(self.values())

We can simply drop __contains__ as Counter can already handle that part, and for unique() we can use len(self.keys()).

def __add__(self,right):
 mergedbag = Bag()
 mergedbag.bag_value = copy.copy(self.bag_value)
 for item in right.bag_value.keys():
 mergedbag.bag_value[item] += right.bag_value[item]
 return mergedbag

Here as we're only dealing with integer values copy.copy is not actually required, and instead of manual loop use the .update() method of dictionary:

def __add__(self, right):
 merged_bag = Bag()
 merged_bag.update(self)
 merged_bag.update(right)
 return merged_bag

In .remove() the only change is that I used string formatting with __repr__ version of item. __repr__ is more useful as with str() you won't be able to differentiate between 1 and '1'.

raise ValueError('{!r} not in bag'.format(item))

def __eq__(self, right):
 if type(right) is not Bag:
 raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
 else:
 return (Counter(self.bag_value)==Counter(right))

The recommended way to check the type of an object is to use isinstance rather than calling type. And in else block I simply called the super class's __eq__ to handle the result instead of creating Counters:

def __eq__(self, right):
 if not isinstance(right, Bag):
 raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
 'compare Bag with Bag').format(type(right).__name__))
 else:
 return super().__eq__(right)

def __iter__(self):
 return self.__generate__(self.bag_value)

Never define your own dunder method(__generate__), these methods are considered special by Python and if any future version of Python defines a method by the same name then your code can result in some unknown errors. In our __iter__ method we can simply yield from self.elements():

def __iter__(self):
 yield from self.elements()

Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.

Complete code:

from collections import Counter
class Bag(Counter):
 def __init__(self, _items=None, **kwargs):
 _items = [] if _items is None else _items
 super().__init__(_items, **kwargs)
 def __repr__(self):
 return 'Bag({})'.format(list(self.elements()))
 def __str__(self):
 format_str = '{}[{}]'.format
 return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))
 def __len__(self):
 return sum(self.values())
 def unique(self):
 return len(self.keys())
 def count(self, item):
 return self[item]
 def add(self, new): 
 self[new] += 1
 def __add__(self, right):
 merged_bag = Bag()
 merged_bag.update(self)
 merged_bag.update(right)
 return merged_bag
 def remove(self, item):
 if item in self:
 if self[item] == 1:
 del self[item]
 else:
 self[item] -= 1
 else:
 raise ValueError('{!r} not in bag'.format(item))
 def __eq__(self, right):
 if not isinstance(right, Bag):
 raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
 'compare Bag with Bag').format(type(right).__name__))
 else:
 return super().__eq__(right)
 def __ne__(self, right):
 return not self.__eq__(right)
 def __iter__(self):
 yield from self.elements()

thanks for the detailed explanations and sorry about the indentations.
the remove method will no longer have bag_value so the else statement should be self[item]-=1.

Stack Exchange Network

Bag data structure (a set that can store multiple copies of items)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bag data structure (a set that can store multiple copies of items)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions