import numpy as np

def uniop_nested(func, o_list):
    # Apply func to every ndarray leaf of an arbitrarily nested list.
    def inner(i_list):
        if isinstance(i_list[0], np.ndarray):
            return list(map(func, i_list))   # list(...) so this also works on Python 3
        else:
            return list(map(inner, i_list))
    return inner(o_list)

def binop_nested(func, o1, o2):
    # Combine two parallel nested structures leaf-by-leaf with func.
    if not isinstance(o1, np.ndarray):
        return [binop_nested(func, i1, i2) for (i1, i2) in zip(o1, o2)]
    else:
        return func(o1, o2)

def add_nested(s1, s2):
    return binop_nested(np.add, s1, s2)
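For readers trying this out, here is a minimal, self-contained smoke test of add_nested on a mixed nested list (the definitions are repeated so the snippet runs standalone; the array shapes are illustrative, not from the original post):

```python
import numpy as np

def binop_nested(func, o1, o2):
    if not isinstance(o1, np.ndarray):
        return [binop_nested(func, i1, i2) for (i1, i2) in zip(o1, o2)]
    else:
        return func(o1, o2)

def add_nested(s1, s2):
    return binop_nested(np.add, s1, s2)

a = np.arange(3)                    # array([0, 1, 2])
b = np.asarray([0.5, 0.5])          # a differently-shaped leaf
values = [[a, 0.1 * a], b]          # list containing a list of ndarrays and an ndarray
out = add_nested(values, values)    # doubles every leaf
```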
My code needs to work with lists of ndarrays, and with lists of lists of ndarrays. Profiling shows this is some of the most performance-critical code in my project.
- How can I optimise it?
- Can I rewrite the recursion as a loop?
- Is there a nice way to rewrite them as a Cython or C extension? (I have no experience with this.)
My Stack Overflow question here indicates that changing datatypes is probably not going to be the solution.
More info:
- Operands (o1, o2, s1, s2) are short lists. Profiling has shown me that using it.izip is slower.
- Function return results are unlikely to be repeated, as the ndarrays are full of floats being tweaked with mathematical operations based on floats. (We are talking about a large segment of Rn of possible values.)
- The functions being applied are simple; add_nested is the most common op by far, but there are a few others like uniop_nested(np.zeros_like, o_list).
- ndarrays are of different sizes/shapes (so a multidimensional ndarray won't work).
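On the "recursion as a loop" question above: one possible shape for an iterative rewrite is an explicit work stack, sketched below (this is not from the original post, and whether it actually beats the recursive version on short lists would need profiling):

```python
import numpy as np

def binop_nested_iter(func, o1, o2):
    """Loop-based equivalent of the recursive binop_nested.

    Each stack item is (left, right, destination-list); no Python-level
    recursion is involved, only an explicit while loop.
    """
    if isinstance(o1, np.ndarray):
        return func(o1, o2)
    result = []
    stack = [(o1, o2, result)]
    while stack:
        a, b, dest = stack.pop()
        for x, y in zip(a, b):
            if isinstance(x, np.ndarray):
                dest.append(func(x, y))     # leaf: apply the binary op
            else:
                sub = []                    # nested list: reserve a slot
                dest.append(sub)
                stack.append((x, y, sub))   # and queue it for processing
    return result

a = np.arange(3)
nested = [[a, 0.5 * a], a]
out = binop_nested_iter(np.add, nested, nested)
```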
Context:
This is being used for training Restricted Boltzmann Machines (RBMs) and neural networks.
I have a generic "Trainer" class that takes a Trainee class as a parameter. The Trainee class exposes a few methods like:
- Get_update_gradient: a function that returns (for an RBM [Restricted Boltzmann Machine]) a list containing an ndarray of weight changes and 2 ndarrays of bias changes, or (for a multilayer neural net) a list containing a list of weight matrix changes and a list of bias changes
- knowledge: a property exposing either (for an RBM) a list containing 2 bias arrays and a weight matrix, or (for a neural net) a list of weight matrices and bias arrays
It may seem that the Trainer class is simple and unnecessary; however, it is moderately complex, and its use is common between the RBM and neural net classes. (Both benefit from the use of momentum and minibatches.) A typical use is:
trainee.knowledge = binop_nested(lambda current_value, update: learning_rate*update + current_value, trainee.knowledge, updates)
2 Answers
For recursion, you can try adding an @memoize decorator to it, to speed it up.

import functools

def memoize(f):
    cache = {}
    @functools.wraps(f)
    def memf(*x):
        if x not in cache:
            cache[x] = f(*x)
        return cache[x]
    return memf
Not sure if it will speed things up a lot in your case, but if you try it with Fibonacci, you will see that it vastly speeds up the recursion, since it caches previous results.
This is something that speeds up recursive code in general. For your specific case, we need a bit more info on what your functions are and what you want to achieve in order to give a more specific answer.
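To illustrate the Fibonacci point, here is the decorator applied to a naive recursive implementation (the fib function is my own demo, not from either post):

```python
import functools

def memoize(f):
    cache = {}
    @functools.wraps(f)
    def memf(*x):
        if x not in cache:          # only compute each argument tuple once
            cache[x] = f(*x)
        return cache[x]
    return memf

@memoize
def fib(n):
    # Naive double recursion: exponential without memoization,
    # linear in the number of distinct calls with it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075 -- instant; unmemoized this would never finish
```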
- Won't work, as this is not a pure function. It has state. Actually, it might be pure; but in any case the same result is almost never returned twice. – Frames Catherine White, Mar 11, 2014 at 9:59
- Added more info to question. – Frames Catherine White, Mar 11, 2014 at 10:09
- @Oxinabox, not sure what that's about "not a pure function," but the function decorated with this memoize does always return the same output for the same input arguments (with the limitation that they must be hashable/immutable). – K3---rnc, Mar 12, 2014 at 12:02
- Memoisation can only be done on pure functions: en.wikipedia.org/wiki/Pure_function. In my case the functions may be pure (maybe), but it doesn't matter, because I never call the function with the same inputs twice. Memoisation is a great technique for recursive functions under the right circumstances; unfortunately for me, this is not them. (Not saying anything is wrong with your answer though; I +1'd it.) – Frames Catherine White, Mar 13, 2014 at 6:03
Cython function calls are much faster. (I suspect Cython type checking is too.)
I recompiled your code with Cython: a simple matter of changing the file name to .pyx, adding cimport cython to the top, and compiling it.
Here is a simple test:

a = np.arange(788)
b = np.asarray([0.01]*200)
values = [[a, 0.1*a, 0.01*a], [a, 0.1*a, 0.01*a], b, b]

%timeit add_nested(values, values)            # Defined in this IPython notebook
10000 loops, best of 3: 32 µs per loop
%timeit c_add_nested(values, values)          # Defined in this IPython notebook with cythonmagic
10000 loops, best of 3: 32 µs per loop
%timeit numpyutil.add_nested(values, values)  # In a separate file called numpyutil.pyx/so
10000 loops, best of 3: 32 µs per loop

That is about a 25% speed increase.
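For anyone unfamiliar with the compile step mentioned above, a .pyx file like numpyutil.pyx is typically built with a small setup.py along these lines (a sketch, assuming Cython is installed; the module name is taken from the timing run above):

```python
# setup.py -- sketch: compile numpyutil.pyx into an importable extension module
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("numpyutil.pyx"))
```

Build it in place with `python setup.py build_ext --inplace`, after which `import numpyutil` picks up the compiled .so.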
- isinstance is what's causing the performance hit. Getting rid of the branching would be better, though.
- Where is func called from?