Reducing function call overhead in python

Question 1

I developed an application which simulates N robots moving in a grid which try to maximize the amount of visited grid cells in a limited amount of steps, meeting in a goal point. It all works correctly, but is horrible slow. It is currently python+numpy+mathplotlib.

Maximum robots there can be have a soft limit of 100 (if it can get higher, it is nice to have).

To do that, I do the following, simplified:

while steps > 0:
 for robot in robots:
 agent.calc(robot,steps)

A robot is a 1x2 numpy array (x-and-y-coordinates).

The agent here decides what to do. Since I need to switch the tactic and strategy on the fly, I cannot move that logic.

agent.calc updates a robot in place, one after another.

cProfiling it returns the following. Extracting the top

 39014272 function calls (39010490 primitive calls) in 150.314 seconds
 Ordered by: internal time
 ncalls tottime percall cumtime percall filename:lineno(function)
 12417735 62.807 0.000 62.807 0.000 distance.py:8(taxicab_distance)
 124596 36.882 0.000 36.882 0.000 {numpy.core.multiarray.array}
 113657 30.204 0.000 100.800 0.001 logical_agent.py:16(choose_max_distance_to...)
 12417013 6.579 0.000 69.384 0.000 squaregrid.py:30(distance)
 113700 2.900 0.000 109.769 0.001 logical_agent.py:73(calc)
 11652363 2.625 0.000 2.625 0.000 {method 'append' of 'list' objects}
 161849 1.653 0.000 1.653 0.000 distance.py:11(euclidean_distance)
 113664 1.632 0.000 1.632 0.000 {sorted}
 114834 1.185 0.000 1.185 0.000 {method 'keys' of 'dict' objects}
 113700 0.695 0.000 1.134 0.000 squaregrid.py:19(neighbours)

I implemented different environments for the robots, the most important is squaregird. Every environment has its own distance function, since I intended to use different metrics, i.e. Manhattan/taxicab and euclidean. I extracted the distance function into an own distance.py file, since I use it in several occasions.

One can see that taxicab_distance is called alot, since the agent needs to evaluate the distances of a robots four neighbours and itself to a goal point to see whether the next position can still reach the goal and to maximize the distance to all other robots as a optimizing heuristics.

The function does not do anything fancy, just

def taxicab_distance(u, v):
 return np.abs(u[0] - v[0]) + np.abs(u[1] - v[1])

I know that python has a pretty high function call overhead, and I assume that that hits the performance. The {numpy.core.multiarray.array} can be ignored, I think I know what I am doing wrong there.

Distance call chain: agent -> environment.distance -> taxicab_distance

The question is, how can I reduce the overhead of calling the function? I strongly considered using pythons c extensibility, cython, to be more concrete. Will it work? can there be another reason why it is so slow?

Question 2

I doing very much that the overhead of calling the function is significant, seeing as it is almost certainly dominated by the actual work done by the function.

Question 3

As an experiment, temporarily stop using taxicab_distance. Instead, execute function's code directly: np.abs(u[0] - v[0]) + np.abs(u[1] - v[1]). Does it help the performance in an appreciable way? My guess is that you should be hunting big (a better algorithm, caching strategies, etc), not small (micro-optimizations like function call overhead).

Question 4

Are you taking advantage of vectorized operations? It doesn't look like it. If you don't, using NumPy will actually be slower than using regular lists.

Question 5

method 'keys' of 'dict' objects - this is unlikely to have a major impact on your runtime, but why are you calling that method? Most of the time, it's unnecessary. If you want to iterate over a dict's keys, just use for key in d.

Question 6

@user2357112 I dont need to iterate over the dict keys, I store them to animate them later. For every step, I snapshot the grid cells and robot positions for matplotlib animate.

Question 7

First, I'd rewrite it into:

def taxicab_distance(u, v):
 return np.sum(np.abs(u - v))

Can you compute taxicab_distance for many robots at once?

Question 8

exactly. to take advantage of the performance benefits of numpy, you need to vectorize your code; which lifts the inner loops from python to the C-level.

Question 9

How would I vectorize my agent.calc(robot,steps) ? The underlying data structure is an array of robots(2d-points), and I potentially could return the value of the call instead of updating in place.

Question 10

You need to show us what agent.calc does. I suggest creating a question a la "How to vectorize ..."

Question 11

I benchmarked it with inlining, and it took off ~15sec. In the end, I rewrote the number crunching in C++ and used cython for integrating. After that, it took only 1 second.

EDIT: cpython -> cython

Question 12

do you really mean 'cpython', or something else?

Question 13

I meant cython to call C++ from Python.

Question 14

You gave yourself an XY problem; numpy is designed to work with aggregate data because of the exceptionally high overhead python has for making function calls, see stackoverflow.com/questions/1401712/…. You could have used np.sum(np.abs(np.subtract(robot, neighbors)), axis=1), and perhaps 'neighbors' would be the list of all robots, for cache coherence.

Yariv 13.4k21 gold badges57 silver badges76 bronze badges · Accepted Answer · 2014-01-01 15:02:15Z

5

First, I'd rewrite it into:

def taxicab_distance(u, v):
 return np.sum(np.abs(u - v))

Can you compute taxicab_distance for many robots at once?

Share

Improve this answer

answered Jan 1, 2014 at 15:02

Yariv's user avatar

Yariv

13.4k21 gold badges57 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Eelco Hoogendoorn

Eelco Hoogendoorn Over a year ago

exactly. to take advantage of the performance benefits of numpy, you need to vectorize your code; which lifts the inner loops from python to the C-level.

2014年01月01日T16:49:32.623Z+00:00

jcklie

jcklie Over a year ago

How would I vectorize my agent.calc(robot,steps) ? The underlying data structure is an array of robots(2d-points), and I potentially could return the value of the call instead of updating in place.

2014年01月01日T17:21:50.04Z+00:00

Yariv

Yariv Over a year ago

You need to show us what agent.calc does. I suggest creating a question a la "How to vectorize ..."

2014年01月01日T19:31:10.157Z+00:00

CollectivesTM on Stack Overflow

Reducing function call overhead in python

2 Answers 2

3 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

2 Answers 2

3 Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related