2
\$\begingroup\$

I have developed some Python code which revolves around two custom classes - a 'Library' class (Lib) which contains a Python list of several objects based on a 'Cas' class. I've not posted the code for either of these classes here, but all you really need to know to understand my question is that the 'Library' object contains a Python list and the 'Cas' objects contain various attributes, some of which are strings and some are values.

One of the objectives of the code is to manipulate the Python list in the Library class and return a sub-set of the 'Cas' objects based on some user driven criteria. For example, return Cas objects where a particular attribute is equal to a given string or greater than a given value.

For this purpose, I have written the following generic method filterLibrarySingle to allow me to filter the Python list in the library class (self.Lib) based on various methods (filterMethod), attributes (filterField) and values (filterValue). Within the method I'm achieving this using list comprehensions.

On profiling my code, it would appear that this method can be quite a bit of a bottle neck! Does any one have an idea of how I could speed it up?

def filterLibrarySingle(self, filterField, filterMethod, filterValue1, filterValue2=None):
 if filterMethod == 'eq':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) == filterValue1]
 elif filterMethod == 'lt':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) < filterValue1]
 elif filterMethod == 'gt':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) > filterValue1]
 elif filterMethod == 'le':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) <= filterValue1]
 elif filterMethod == 'ge':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1]
 elif filterMethod == 'gelt':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1 and getattr(cas, filterField) < filterValue2]
 elif filterMethod == 'gele':
 self.Lib = [cas for cas in self.Lib if getattr(cas, filterField) >= filterValue1 and getattr(cas, filterField) <= filterValue2]

I've wracked my brains for days on this to try and speed it up but I guess my Python knowledge simply isn't good enough!

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Oct 18, 2015 at 20:08
\$\endgroup\$
4
  • 3
    \$\begingroup\$ It is hard to comment without seeing how you actually use this method in context. What is taking a long time is the construction of the new list. Since your filter is destructive (if you filter items they are removed permanently from the list), the list comprehension is a pretty efficient method of building lists as it is O(n). If you just needed to return items that match the filter, but not construct a new list necessarily, then you could use more of a 'generator/iterator' model where you just 'yield' values that match the criteria. This is completely different behavior, but would be faster. \$\endgroup\$ Commented Oct 18, 2015 at 21:40
  • \$\begingroup\$ Interesting. The filtering methods are really to return objects which match the user defined criteria. Using the current list construct, the full list of objets is often restored from a backup so users can select another filter. For example, in a list of animals, the first search might look for 'Bears'. Once returned and some iteration operation (e.g. print attributes) has been performed, the full list is restored from a copy before the next search for 'Elephants'. Do you think generators could be used in this instance given that the filtered lists area created, iterated over, then destroyed? \$\endgroup\$ Commented Oct 18, 2015 at 22:31
  • \$\begingroup\$ This is very inefficient. Don't destroy and restore the list all the time. Just use generators on the same list over and over. I'll put it in an answer where there is more room. \$\endgroup\$ Commented Oct 18, 2015 at 22:35
  • 3
    \$\begingroup\$ If you build a dict of lambdas, you method is only self.Lib = [cas for cas in self.Lib if the_dict[filterMethod](getattr(cas, filterField), filterValue1, filterValue2)] \$\endgroup\$ Commented Oct 18, 2015 at 23:12

3 Answers 3

5
\$\begingroup\$

From the comments it sounds like you don't really need to create a new list. Using generators will avoid the overhead of ever building a new list.

Just use this pattern:

def filterLibrarySingle(self, filterField, filterMethod, filterValue1, filterValue2=None):
 if filterMethod == 'eq':
 for cas in self.Lib:
 if getattr(cas, filterField) == filterValue1:
 yield cas
 ...

And you can call the method such as:

for item in filterLibrarySingle('my_field', 'eq', 'George'):
 do_stuff(item)

If this doesn't work, then you may have to consider designing a more advanced structure to store your data so that your searches are not \$O(n)\$ but rather \$O(\log{}n)\$.

ferada
11.4k25 silver badges65 bronze badges
answered Oct 18, 2015 at 22:39
\$\endgroup\$
1
  • \$\begingroup\$ Thanks @RobertB for this example of using generators. I think this will greatly speed up my code. I'll let you know as soon as I've tried it \$\endgroup\$ Commented Oct 18, 2015 at 22:58
3
\$\begingroup\$

Building up on my comment:

You can create a dictionary that will speed up methods lookups:

criteria = {
 'eq': lambda v, t1, t2: v == t1,
 'lt': lambda v, t1, t2: v < t1,
 'gt': lambda v, t1, t2: v > t1,
 'le': lambda v, t1, t2: v <= t1,
 'ge': lambda v, t1, t2: v >= t1,
 'gelt': lambda v, t1, t2: t1 <= v < t2,
 'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_library_single(self, field, method, threshold1, threshold2=None):
 self.Lib = [cas for cas in self.Lib if criteria[method](getattr(cas, field), threshold1, threshold2)]

Depending on your needs, you can, as @RobertB suggested, turn your method into a generator. For python 2.7 you’re stuck with yield:

criteria = {
 'eq': lambda v, t1, t2: v == t1,
 'lt': lambda v, t1, t2: v < t1,
 'gt': lambda v, t1, t2: v > t1,
 'le': lambda v, t1, t2: v <= t1,
 'ge': lambda v, t1, t2: v >= t1,
 'gelt': lambda v, t1, t2: t1 <= v < t2,
 'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_library_single(self, field, method, threshold1, threshold2=None):
 meet_criteria = criteria[method]
 for cas in self.Lib:
 if meet_criteria(getattr(cas, field), threshold1, threshold2):
 yield cas

But in python 3 you can use filter for this task:

criteria = {
 'eq': lambda v, t1, t2: v == t1,
 'lt': lambda v, t1, t2: v < t1,
 'gt': lambda v, t1, t2: v > t1,
 'le': lambda v, t1, t2: v <= t1,
 'ge': lambda v, t1, t2: v >= t1,
 'gelt': lambda v, t1, t2: t1 <= v < t2,
 'gele': lambda v, t1, t2: t1 <= v <= t2,
}
def filter_helper(field, method, threshold1, threshold2):
 meet_criteria = criteria[method]
 def wrapped(cas):
 return meet_criteria(getattr(cas, field), threshold1, threshold2)
 return wrapped
def filter_library_single(self, field, method, threshold1, threshold2=None):
 return filter(filter_helper(field, method, threshold1, threshold2), self.Lib)
answered Oct 19, 2015 at 14:13
\$\endgroup\$
1
\$\begingroup\$

I have some issues with your names. If the method is going to have filter in the name, you already have enough context to not put it in all the argument names. You should also be using snake_case for naming functions and variables (that aren't constants), not camelCase.

filter_library_single(self, field, method, value1, value2=None)

About the method name. I don't know the context very well but what does the _single part mean? Nothing in the method shows me, unless there's another method for filtering with multiple operators at once. You could clear this up with a docstring and fix another problem, you don't mention that this filters in place. This seems like unexpected behaviour to me, so it should be noted clearly.

You also don't test for the existence of value2 at all, even though you could easily just add in elif value2 is None. A NameError would be raised, which is misleading. Instead you should pre-emptively raise ValueError with a helpful message. This is how I'd do it:

elif method == 'ge':
 self.Lib = [cas for cas in self.Lib if getattr(cas, field) >= value1]
elif value2 is None:
 raise ValueError("Second value was not supplied.")
elif method == 'gelt':
 self.Lib = [cas for cas in self.Lib if getattr(cas, field) >= value1 and getattr(cas, field) < value2]
answered Oct 19, 2015 at 8:36
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.