I want to solve an equation where the left-hand side also appears under an integral on the right-hand side:
$$ B(p^2) = C\int_0^{p^2}f_1\left(B(q^2),q^2\right)\mathrm{d}q^2 + C\int_{p^2}^{\Lambda^2} f_2\left(B(q^2),q^2\right)\mathrm{d}q^2 $$
I have written the following code to solve this equation iteratively:
```python
import numpy as np
import time

def solver(alpha = 2.):
    L2 = 1         # UV Cut-off
    N = 1000       # Number of grid points
    maxIter = 100  # Maximum no. of iterations

    # The grid on which the function is evaluated
    p2grid = np.logspace(-10, 0, N, base=10)
    dq = np.array([0] + [p2grid[i+1]-p2grid[i] for i in range(len(p2grid)-1)])

    Bgrid = np.ones(N)
    Bgrid_new = np.empty(N)  # Buffer variable for new values of B
    C = alpha / (4*np.pi)

    for j in range(maxIter):
        for i, p2 in enumerate(p2grid):
            # If lower and upper limit of an integral are the same, set to 0
            if i > 0:
                n = i + 1
                int1 = np.add.reduce((p2grid[0:n] * 3*Bgrid[0:n] /\
                    (p2 * (p2grid[0:n] + Bgrid[0:n]**2))) * dq[0:n])
            else:
                int1 = 0
            if i < N - 1:
                int2 = np.add.reduce((3*Bgrid[i:] /
                    (p2grid[i:] + Bgrid[i:]**2)) * dq[i:])
            else:
                int2 = 0
            # Write new values to buffer variable
            Bgrid_new[i] = C * (int1 + int2)
        # Calculate relative error (take the maximum)
        maxError = np.max(np.abs((Bgrid - Bgrid_new)/Bgrid_new))
        # Copy values from buffer to working variable
        Bgrid = np.copy(Bgrid_new)
        # If the change in the last iteration was small enough, stop
        if (maxError < 10**-4):
            break

    print "Finished in", j+1, "iterations, relative error:", maxError
    return p2grid, Bgrid/np.sqrt(L2)

t0 = time.clock()
p2grid, Bgrid = solver()
print "Time:", time.clock() - t0, " seconds"
```
I am wondering if there are speedups possible:

- I am using `np.add.reduce` instead of `np.sum`, which is about 0.2 s faster on my system.
- I tried moving the code where `int1` and `int2` are calculated into a separate function; no improvement here.
- Using `[... for ... in enumerate(p2grid)]` instead of the `for` loop does not seem to be faster either.
- The profiler says almost all of the time is still spent in the `reduce` method.

I can't see how this code can be optimized further (as most of the time is spent in a NumPy library function), but I can't believe I have written the most efficient code either.

If there are no optimizations possible at the code level, is there anything else I should try or look into, like different interpreters or something like that?
- 200_success (Mar 5, 2015): The way you wrote your integrals doesn't quite make sense. What's the variable of integration? Is it p? If so, are you using p both as a dummy variable and a limit?
- Micha (Mar 5, 2015): Sorry, fixed it!
1 Answer
You don't need line continuation characters inside brackets.
After splitting up your `reduce` lines, `line_profiler` says most of your time is actually in

```python
(p2grid[0:n] * 3*Bgrid[0:n] / (p2 * (p2grid[0:n] + Bgrid[0:n]**2))) * dq[0:n]
```

and

```python
(3*Bgrid[i:] / (p2grid[i:] + Bgrid[i:]**2)) * dq[i:]
```
Extracting those outside of the loop and pulling a division out of the sum:

```python
t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
for i, p2 in enumerate(p2grid):
    # If lower and upper limit of an integral are the same, set to 0
    if i > 0:
        int1 = np.add.reduce(t1[:i+1]) / p2
    else:
        int1 = 0
    if i < N - 1:
        int2 = np.add.reduce(t2[i:])
    else:
        int2 = 0
    # Write new values to buffer variable
    Bgrid_new[i] = C * (int1 + int2)
```
gives a good speed boost.
You can also simplify the logic to:

```python
val = 0
if i > 0:
    val = np.add.reduce(t1[:i+1]) / p2
if i < N - 1:
    val += np.add.reduce(t2[i:])
Bgrid_new[i] = C * val
```
But then note that you have a sliding summation; you can do this beforehand with `numpy.add.accumulate`:
```python
t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
val1_accumulate = np.add.accumulate(t1)
val2_accumulate = np.add.accumulate(t2[::-1])[::-1]
for i, p2 in enumerate(p2grid):
    # If lower and upper limit of an integral are the same, set to 0
    val = 0
    if i > 0:
        val = val1_accumulate[i] / p2
    if i < N - 1:
        val += val2_accumulate[i]
    # Write new values to buffer variable
    Bgrid_new[i] = C * val
```
This is much faster.
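To see why the accumulated arrays are drop-in replacements for the repeated `reduce` calls, here is a quick check on a toy array (my own sketch; the array is just illustrative data, not the solver's `t1` or `t2`):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])  # stand-in for t1 or t2

# Prefix sums: acc[i] equals np.add.reduce(t[:i+1])
acc = np.add.accumulate(t)

# Suffix sums: rev_acc[i] equals np.add.reduce(t[i:])
rev_acc = np.add.accumulate(t[::-1])[::-1]

for i in range(len(t)):
    assert acc[i] == np.add.reduce(t[:i+1])
    assert rev_acc[i] == np.add.reduce(t[i:])

print(list(acc))      # values 1, 3, 6, 10
print(list(rev_acc))  # values 10, 9, 7, 4
```

Each `np.add.reduce(t[:i+1])` inside the loop is O(i), so the loop costs O(N²) additions per iteration; the two `accumulate` calls cost O(N) total.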
Now this loop can probably be vectorized:

```python
Bgrid_new = np.zeros_like(Bgrid)
Bgrid_new[+1:] += val1_accumulate[+1:] / p2grid[+1:]
Bgrid_new[:-1] += val2_accumulate[:-1]
Bgrid_new *= C
```
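As a sanity check (a sketch of mine, not part of the original answer), the vectorized update can be compared against the loop version on a small grid; the constants simply mirror the solver's defaults:

```python
import numpy as np

N = 8  # small grid just for the comparison
p2grid = np.logspace(-10, 0, N)
dq = np.zeros_like(p2grid)
dq[1:] = np.diff(p2grid)
Bgrid = np.ones(N)
C = 2.0 / (4 * np.pi)

t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
val1 = np.add.accumulate(t1)
val2 = np.add.accumulate(t2[::-1])[::-1]

# Loop version
loop = np.zeros(N)
for i, p2 in enumerate(p2grid):
    val = 0.0
    if i > 0:
        val = val1[i] / p2
    if i < N - 1:
        val += val2[i]
    loop[i] = C * val

# Vectorized version
vec = np.zeros(N)
vec[1:] += val1[1:] / p2grid[1:]
vec[:-1] += val2[:-1]
vec *= C

print(np.allclose(loop, vec))  # True
```

The boundary handling carries over because the slices `[1:]` and `[:-1]` reproduce the `i > 0` and `i < N - 1` guards exactly.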
All in all, this seems to be a factor of 300-400 faster. You can do other optimizations, like

```python
dq = np.zeros_like(p2grid)
dq[1:] = np.diff(p2grid)
```

and removing the call to `copy`, but it should already be fast enough.
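The `np.diff` construction of `dq` matches the original list comprehension exactly; a quick check (my addition, on an arbitrary small grid):

```python
import numpy as np

p2grid = np.logspace(-10, 0, 5)

# Original construction from the question
dq_old = np.array([0] + [p2grid[i+1] - p2grid[i] for i in range(len(p2grid)-1)])

# Suggested replacement: np.diff computes p2grid[1:] - p2grid[:-1]
dq_new = np.zeros_like(p2grid)
dq_new[1:] = np.diff(p2grid)

print(np.array_equal(dq_old, dq_new))  # True
```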
Then you should focus on cleaning up the code. Spacing is important, as are appropriate variable names (`snake_case` and not single-letter names). The trick is to make comments redundant:

```python
num_grid_points = 1000  # Number of grid points
```

Note how the comment can now be removed.
Here's an attempt at a cleaner version. I've also added a couple more speed improvements, because I'm a hopeless addict:

```python
def solver(alpha=2.):
    uv_cutoff = 1
    num_grid_points = 1000
    max_iterations = 100

    # The grid on which the function is evaluated
    p2_grid = np.logspace(-10, 0, num_grid_points, base=10)
    dq = np.empty_like(p2_grid)
    dq[0], dq[1:] = 0, np.diff(p2_grid)

    b_grid = np.ones(num_grid_points)
    c = alpha / (4 * np.pi)

    for iteration_num in range(max_iterations):
        t2 = 3 * dq * b_grid / (p2_grid + b_grid*b_grid)
        t1 = t2 * p2_grid

        t1_cumsum = np.add.accumulate(t1)
        t2_cumsum = np.add.accumulate(t2[::-1])[::-1]
        t1_cumsum[0] = t2_cumsum[-1] = 0

        b_grid_new = c * (t1_cumsum / p2_grid + t2_cumsum)

        # Calculate relative error (take the maximum)
        max_error = np.max(np.abs((b_grid - b_grid_new) / b_grid_new))
        b_grid = b_grid_new

        if max_error < 10**-4:
            break

    print "Finished in", iteration_num + 1, "iterations, relative error:", max_error
    return p2_grid, b_grid / np.sqrt(uv_cutoff)
```
I'm not really sure how to rename these, though, because I'm not familiar with the maths.
- Micha (Mar 4, 2015): This is truly incredible. Did not know about `accumulate` before. Well, thank you.