I am trying to make my class a bit cleaner, especially some of its methods like Count
, __add__
, remove
and __iter__
. I feel like I could use cleaner code on the above methods and get the same results.
Bag class represents and defines methods, operators, and an iterator for the Bag class. Bags are similar to sets, and have similar operations but unlike sets they can store multiple copies of items. Store the information in bags as dictionaries whose elements are keys whose associated values are the number of times the key occurs in the bag.
from collections import defaultdict
from goody import type_as_str
import copy
from collections import Counter
class Bag:
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1
def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'
def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'
def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_len
def unique(self):
return len(self.bag_value)
def __contains__(self, item):
return item in self.bag_value
def count(self, item):
if item in self.bag_value:
return self.bag_value[item]
else:
return 0
def add(self, new):
self.bag_value[new] += 1
def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbag
def remove(self, item):
if item in self.bag_value:
if self.bag_value[item] == 1:
del self.bag_value[item]
else:
self.bag_value[item] -= 1
else:
raise ValueError(str(item) + ' not in bag')
def __eq__(self, right):
if type(right) is not Bag:
raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
else:
return (Counter(self.bag_value)==Counter(right))
def __ne__(self, right):
return not self.__eq__(right)
def __generate__(self, bag_value):
for item in self.bag_value:
for value in range(self.bag_value[item]):
yield item
def __iter__(self):
return self.__generate__(self.bag_value)
if __name__ == '__main__':
# bag = Bag(['d','a','b','d','c','b','d'])
# bag2 = Bag(['d','a','b','d','c','b','d'])
# bag3 = Bag(['d','a','b','d','c','b'])
# print(bag == bag2)
# print(bag == bag3)
# print(bag != bag2)
# print(bag != bag3)
import driver
driver.driver()
2 Answers 2
As you're trying to implement a multiset
(or bag
) I would recommend you to use collections.Counter
which already does the same thing:
The
Counter
class is similar to bags or multisets in other languages.
So, simply inherit from Counter
class and add your additional methods in the class or override the already present methods from Counter
class as per your requirements. Counter
class also supports &
, |
, -
operators to perform various multiset operations.
Now, coming back to your code:
def __init__(self, items = []):
self.bag_value = defaultdict(int)
for item in items:
self.bag_value[item] += 1
Never use a mutable object as default value of an argument as there value can persist between calls, the recommended way is to use None
as default value and then check it inside of the __init__
method. As we are now inheriting from Counter
we can directly operate on self
instead of using a defaultdict
to keep the count.
class Bag(Counter):
def __init__(self, _items=None, **kwargs):
_items = [] if _items is None else _items
super().__init__(_items, **kwargs)
Why both
_items
andkwargs
?
It depends on how you want to use it ultimately, but with this signature you can use it as:
Bag('aaabbccd')
Bag(a=3, b=5, c=7)
Bag('aabbcc', a=5, c=7, r=9)
I used _items
to handle the case when user actually passed a key named items
:
Bag(a=5, c=7, r=9, items=9)
def __repr__(self):
bag_list = []
for item, count in self.bag_value.items():
bag_list.extend(list(item*count))
return 'Bag(' + str(bag_list) + ')'
Counter
class already has a method to handle this called elements()
which returns an itertools.chain
object which is an iterator, we can call list on it and we are done:
def __repr__(self):
return 'Bag({})'.format(list(self.elements()))
def __str__(self):
return 'Bag(' + ','.join(str(item) + '[' + str(count) + ']' for item, count in self.bag_value.items()) + ')'
We can take advantage of string formatting here to make the above code a little concise and easier to read:
def __str__(self):
format_str = '{}[{}]'.format
return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))
def __len__(self):
bag_len = 0
for value in self.bag_value:
bag_len += self.bag_value[value]
return bag_len
In our class we can obtain the length by simply summing the .values()
, sum()
returns 0 when the Bag
is empty so it saves a lots of line:
def __len__(self):
return sum(self.values())
We can simply drop __contains__
as Counter
can already handle that part, and for unique()
we can use len(self.keys())
.
def __add__(self,right):
mergedbag = Bag()
mergedbag.bag_value = copy.copy(self.bag_value)
for item in right.bag_value.keys():
mergedbag.bag_value[item] += right.bag_value[item]
return mergedbag
Here as we're only dealing with integer values copy.copy
is not actually required, and instead of manual loop use the .update()
method of dictionary:
def __add__(self, right):
merged_bag = Bag()
merged_bag.update(self)
merged_bag.update(right)
return merged_bag
In .remove()
the only change is that I used string formatting with __repr__
version of item. __repr__
is more useful as with str()
you won't be able to differentiate between 1
and '1'
.
raise ValueError('{!r} not in bag'.format(item))
def __eq__(self, right):
if type(right) is not Bag:
raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag')
else:
return (Counter(self.bag_value)==Counter(right))
The recommended way to check the type of an object is to use isinstance
rather than calling type
. And in else
block I simply called the super class's __eq__
to handle the result instead of creating Counter
s:
def __eq__(self, right):
if not isinstance(right, Bag):
raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
'compare Bag with Bag').format(type(right).__name__))
else:
return super().__eq__(right)
def __iter__(self):
return self.__generate__(self.bag_value)
Never define your own dunder method(__generate__
), these methods are considered special by Python and if any future version of Python defines a method by the same name then your code can result in some unknown errors. In our __iter__
method we can simply yield from self.elements()
:
def __iter__(self):
yield from self.elements()
Note that there should be only be single line of space between method of a class and your indentation is also messed up, always use 4 spaces no tabs(Change your IDE's setting to use 4 spaces for a tab). Read PEP-008 for more information on styling-guide.
Complete code:
from collections import Counter
class Bag(Counter):
def __init__(self, _items=None, **kwargs):
_items = [] if _items is None else _items
super().__init__(_items, **kwargs)
def __repr__(self):
return 'Bag({})'.format(list(self.elements()))
def __str__(self):
format_str = '{}[{}]'.format
return 'Bag({})'.format(', '.join(format_str(k, v) for k, v in self.items()))
def __len__(self):
return sum(self.values())
def unique(self):
return len(self.keys())
def count(self, item):
return self[item]
def add(self, new):
self[new] += 1
def __add__(self, right):
merged_bag = Bag()
merged_bag.update(self)
merged_bag.update(right)
return merged_bag
def remove(self, item):
if item in self:
if self[item] == 1:
del self[item]
else:
self[item] -= 1
else:
raise ValueError('{!r} not in bag'.format(item))
def __eq__(self, right):
if not isinstance(right, Bag):
raise TypeError(('Cannot compare Bag with object of type {!r}. Can only '
'compare Bag with Bag').format(type(right).__name__))
else:
return super().__eq__(right)
def __ne__(self, right):
return not self.__eq__(right)
def __iter__(self):
yield from self.elements()
-
\$\begingroup\$ thanks for the detailed explanations and sorry about the indentations. \$\endgroup\$LucyBen– LucyBen2015年01月30日 12:57:12 +00:00Commented Jan 30, 2015 at 12:57
-
\$\begingroup\$ the remove method will no longer have bag_value so the else statement should be
self[item]-=1
. \$\endgroup\$LucyBen– LucyBen2015年01月30日 19:02:42 +00:00Commented Jan 30, 2015 at 19:02 -
\$\begingroup\$ @LucyBen Thanks, missed that part. Fixed! \$\endgroup\$Ashwini Chaudhary– Ashwini Chaudhary2015年01月30日 19:05:33 +00:00Commented Jan 30, 2015 at 19:05
Bag
is very similar tocollections.Counter
. Maybe you could use it instead, or use it as base class or at least inspiration.__repr__
is buggy:>>> Bag(['foo','bar']) Bag(['b', 'a', 'r', 'f', 'o', 'o'])
Instead of fixing the bug you can bypass it and leverage iteration by simply assigning
bag_list = list(self)
count
could simplyreturn self.bag_value.get(item, 0)
- Double underscores at both ends of a name usually mean the method has special meaning in Python. AFAIK
__generate__
has no such meaning, so you should use_generate
instead, or even better, rename the method to__iter__
__eq__
insists on aBag
argument, but actually the implementation works with anything that can be turned into aCounter
. Either remove the type check, or do like this:def __eq__(self, right): if not isinstance(right, Bag): raise TypeError('Cannot compare Bag with' + type_as_str(right) + '. Can only compare Bag with Bag') else: return self.bag_value == right.bag_value