I have developed some Python code which revolves around two custom classes - a 'Library' class (Lib) which contains a Python list of several objects based on a 'Cas' class. I've not posted the code for either of these classes here, but all you really need to know to understand my question is that the 'Library' object contains a Python list and the 'Cas' objects contain various attributes, some of which are strings and some are values.
One of the objectives of the code is to manipulate the Python list in the Library class and return a sub-set of the 'Cas' objects based on some user driven criteria. For example, return Cas objects where a particular attribute is equal to a given string or greater than a given value.
For this purpose, I have written the following generic method filterLibrarySingle
to allow me to filter the Python list in the library class (self.Lib
) based on various methods (filterMethod
), attributes (filterField
) and values (filterValue
). Within the method I'm achieving this using list comprehensions.
On profiling my code, it would appear that this method can be quite a bit of a bottle neck! Does any one have an idea of how I could speed it up?
def filterLibrarySingle(self, filterField, filterMethod, filterValue1, filterValue2=None):
if filterMethod == 'eq':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) == filterValue1]
elif filterMethod == 'lt':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) < filterValue1]
elif filterMethod == 'gt':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) > filterValue1]
elif filterMethod == 'le':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) <= filterValue1]
elif filterMethod == 'ge':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1]
elif filterMethod == 'gelt':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1 and getattr(cas, filterField) < filterValue2]
elif filterMethod == 'gele':
self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1 and getattr(cas, filterField) <= filterValue2]
I've wracked my brains for days on this to try and speed it up but I guess my Python knowledge simply isn't good enough!
3 Answers 3
From the comments it sounds like you don't really need to create a new list. Using generators will avoid the overhead of ever building a new list.
Just use this pattern:
def filterLibrarySingle(self, filterField, filterMethod, filterValue1, filterValue2=None):
if filterMethod == 'eq':
for cas in self.Lib:
if getattr(cas, filterField) == filterValue1:
yield cas
...
And you can call the method such as:
for item in filterLibrarySingle('my_field', 'eq', 'George'):
do_stuff(item)
If this doesn't work, then you may have to consider designing a more advanced structure to store your data so that your searches are not \$O(n)\$ but rather \$O(\log{}n)\$.
-
\$\begingroup\$ Thanks @RobertB for this example of using generators. I think this will greatly speed up my code. I'll let you know as soon as I've tried it \$\endgroup\$Mark– Mark2015年10月18日 22:58:22 +00:00Commented Oct 18, 2015 at 22:58
Building up on my comment:
You can create a dictionary that will speed up methods lookups:
criteria = {
'eq': lambda v, t1, t2: v == t1,
'lt': lambda v, t1, t2: v < t1,
'gt': lambda v, t1, t2: v > t1,
'le': lambda v, t1, t2: v <= t1,
'ge': lambda v, t1, t2: v >= t1,
'gelt': lambda v, t1, t2: t1 <= v < t2,
'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_library_single(self, field, method, threshold1, threshold2=None):
self.Lib = [cas for cas in self.Lib if criteria[method](getattr(cas, field), threshold1, threshold2)]
Depending on your needs, you can, as @RobertB suggested, turn your method into a generator. For python 2.7 you’re stuck with yield
:
criteria = {
'eq': lambda v, t1, t2: v == t1,
'lt': lambda v, t1, t2: v < t1,
'gt': lambda v, t1, t2: v > t1,
'le': lambda v, t1, t2: v <= t1,
'ge': lambda v, t1, t2: v >= t1,
'gelt': lambda v, t1, t2: t1 <= v < t2,
'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_library_single(self, field, method, threshold1, threshold2=None):
meet_criteria = criteria[method]
for cas in self.Lib:
if meet_criteria(getattr(cas, field), threshold1, threshold2):
yield cas
But in python 3 you can use filter
for this task:
criteria = {
'eq': lambda v, t1, t2: v == t1,
'lt': lambda v, t1, t2: v < t1,
'gt': lambda v, t1, t2: v > t1,
'le': lambda v, t1, t2: v <= t1,
'ge': lambda v, t1, t2: v >= t1,
'gelt': lambda v, t1, t2: t1 <= v < t2,
'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_helper(field, method, threshold1, threshold2):
meet_criteria = criteria[method]
def wrapped(cas):
return meet_criteria(getattr(cas, field), threshold1, threshold2)
return wrapped
def filter_library_single(self, field, method, threshold1, threshold2=None):
return filter(filter_helper(field, method, threshold1, threshold2), self.Lib)
I have some issues with your names. If the method is going to have filter
in the name, you already have enough context to not put it in all the argument names. You should also be using snake_case for naming functions and variables (that aren't constants), not camelCase.
filter_library_single(self, field, method, value1, value2=None)
About the method name. I don't know the context very well but what does the _single
part mean? Nothing in the method shows me, unless there's another method for filtering with multiple operators at once. You could clear this up with a docstring and fix another problem, you don't mention that this filters in place. This seems like unexpected behaviour to me, so it should be noted clearly.
You also don't test for the existence of value2
at all, even though you could easily just add in elif value2 is None
. A NameError
would be raised, which is misleading. Instead you should pre-emptively raise ValueError
with a helpful message. This is how I'd do it:
elif method == 'ge':
self.Lib = [cas for cas in self.Lib if getattr(cas, field) >= value1]
elif value2 is None:
raise ValueError("Second value was not supplied.")
elif method == 'gelt':
self.Lib = [cas for cas in self.Lib if getattr(cas, field) >= value1 and getattr(cas, field) < value2]
self.Lib = [cas for cas in self.Lib if the_dict[filterMethod](getattr(cas, filterField), filterValue1, filterValue2)]
\$\endgroup\$