I want to solve an equation where the left-hand side also appears under an integral on the right-hand side:
$$ B(p^2) = C\int_0^{p^2}f_1\left(B(q^2),q^2\right)\mathrm{d}q^2 + C\int_{p^2}^{\Lambda^2} f_2\left(B(q^2),q^2\right)\mathrm{d}q^2 $$
I have written the following code to solve this equation iteratively:
```python
import numpy as np
import time

def solver(alpha = 2.):
    L2 = 1         # UV Cut-off
    N = 1000       # Number of grid points
    maxIter = 100  # Maximum no. of iterations

    # The grid on which the function is evaluated
    p2grid = np.logspace(-10, 0, N, base=10)
    dq = np.array([0] + [p2grid[i+1]-p2grid[i] for i in range(len(p2grid)-1)])

    Bgrid = np.ones(N)
    Bgrid_new = np.empty(N)  # Buffer variable for new values of B
    C = alpha / (4*np.pi)

    for j in range(maxIter):
        for i, p2 in enumerate(p2grid):
            # If lower and upper limit of an integral are the same, set to 0
            if i > 0:
                n = i + 1
                int1 = np.add.reduce((p2grid[0:n] * 3*Bgrid[0:n] /\
                    (p2 * (p2grid[0:n] + Bgrid[0:n]**2))) * dq[0:n])
            else:
                int1 = 0
            if i < N - 1:
                int2 = np.add.reduce((3*Bgrid[i:] /
                    (p2grid[i:] + Bgrid[i:]**2)) * dq[i:])
            else:
                int2 = 0
            # Write new values to buffer variable
            Bgrid_new[i] = C * (int1 + int2)
        # Calculate relative error (take the maximum)
        maxError = np.max(np.abs((Bgrid - Bgrid_new)/Bgrid_new))
        # Copy values from buffer to working variable
        Bgrid = np.copy(Bgrid_new)
        # If the change in the last iteration was small enough, stop
        if (maxError < 10**-4):
            break

    print "Finished in", j+1, "iterations, relative error:", maxError
    return p2grid, Bgrid/np.sqrt(L2)

t0 = time.clock()
p2grid, Bgrid = solver()
print "Time:", time.clock() - t0, " seconds"
```
I am wondering if there are speedups possible:

- I am using `np.add.reduce` instead of `np.sum`, which is about 0.2 s faster on my system.
- I tried moving the code where `int1` and `int2` are calculated into a separate function; no improvement here.
- Using `[... for ... in enumerate(p2grid)]` instead of the `for` loop does not seem to be faster either.
- The profiler says almost all of the time is still spent in the `reduce` method.

I can't see how this code can be optimized further (as most of the time is spent in a NumPy library function), but I can't believe I have written the most efficient code either.

If there are no optimizations possible at the code level, is there anything else I should try or look into, like different interpreters or something like that?
- 200_success (Mar 5, 2015): The way you wrote your integrals doesn't quite make sense. What's the variable of integration? Is it p? If so, are you using p both as a dummy variable and a limit?
- Micha (Mar 5, 2015): Sorry, fixed it!
1 Answer
You don't need line continuation characters inside brackets.
After splitting up your `reduce` lines, `line_profiler` says most of your time is actually in

```python
(p2grid[0:n] * 3*Bgrid[0:n] / (p2 * (p2grid[0:n] + Bgrid[0:n]**2))) * dq[0:n]
```

and

```python
(3*Bgrid[i:] / (p2grid[i:] + Bgrid[i:]**2)) * dq[i:]
```
Extracting those outside of the loop and pulling a division out of the sum:

```python
t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
for i, p2 in enumerate(p2grid):
    # If lower and upper limit of an integral are the same, set to 0
    if i > 0:
        int1 = np.add.reduce(t1[:i+1]) / p2
    else:
        int1 = 0
    if i < N - 1:
        int2 = np.add.reduce(t2[i:])
    else:
        int2 = 0
    # Write new values to buffer variable
    Bgrid_new[i] = C * (int1 + int2)
```
gives a good speed boost.
You can also simplify the logic to:

```python
val = 0
if i > 0:
    val = np.add.reduce(t1[:i+1]) / p2
if i < N - 1:
    val += np.add.reduce(t2[i:])
Bgrid_new[i] = C * val
```
But then note that you have a sliding summation; you can do this beforehand with `numpy.add.accumulate`:
```python
t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
val1_accumulate = np.add.accumulate(t1)
val2_accumulate = np.add.accumulate(t2[::-1])[::-1]
for i, p2 in enumerate(p2grid):
    # If lower and upper limit of an integral are the same, set to 0
    val = 0
    if i > 0:
        val = val1_accumulate[i] / p2
    if i < N - 1:
        val += val2_accumulate[i]
    # Write new values to buffer variable
    Bgrid_new[i] = C * val
```
This is much faster.
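To see why the accumulated arrays are drop-in replacements for the repeated `reduce` calls, here is a quick check on a toy array (my own sketch; the array is just illustrative data, not the solver's `t1` or `t2`):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])  # stand-in for t1 or t2

# Prefix sums: acc[i] equals np.add.reduce(t[:i+1])
acc = np.add.accumulate(t)

# Suffix sums: rev_acc[i] equals np.add.reduce(t[i:])
rev_acc = np.add.accumulate(t[::-1])[::-1]

for i in range(len(t)):
    assert acc[i] == np.add.reduce(t[:i+1])
    assert rev_acc[i] == np.add.reduce(t[i:])

print(list(acc))      # values 1, 3, 6, 10
print(list(rev_acc))  # values 10, 9, 7, 4
```

Each `np.add.reduce(t[:i+1])` inside the loop is O(i), so the loop costs O(N²) additions per iteration; the two `accumulate` calls cost O(N) total.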
Now this loop can probably be vectorized:

```python
Bgrid_new = np.zeros_like(Bgrid)
Bgrid_new[+1:] += val1_accumulate[+1:] / p2grid[+1:]
Bgrid_new[:-1] += val2_accumulate[:-1]
Bgrid_new *= C
```
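As a sanity check (a sketch of mine, not part of the original answer), the vectorized update can be compared against the loop version on a small grid; the constants simply mirror the solver's defaults:

```python
import numpy as np

N = 8  # small grid just for the comparison
p2grid = np.logspace(-10, 0, N)
dq = np.zeros_like(p2grid)
dq[1:] = np.diff(p2grid)
Bgrid = np.ones(N)
C = 2.0 / (4 * np.pi)

t2 = dq * 3 * Bgrid / (p2grid + Bgrid**2)
t1 = t2 * p2grid
val1 = np.add.accumulate(t1)
val2 = np.add.accumulate(t2[::-1])[::-1]

# Loop version
loop = np.zeros(N)
for i, p2 in enumerate(p2grid):
    val = 0.0
    if i > 0:
        val = val1[i] / p2
    if i < N - 1:
        val += val2[i]
    loop[i] = C * val

# Vectorized version
vec = np.zeros(N)
vec[1:] += val1[1:] / p2grid[1:]
vec[:-1] += val2[:-1]
vec *= C

print(np.allclose(loop, vec))  # True
```

The boundary handling carries over because the slices `[1:]` and `[:-1]` reproduce the `i > 0` and `i < N - 1` guards exactly.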
All in all, this seems to be a factor of 300-400 faster. You can do other optimizations, like

```python
dq = np.zeros_like(p2grid)
dq[1:] = np.diff(p2grid)
```

and removing the call to `copy`, but it should already be fast enough.
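The `np.diff` construction of `dq` matches the original list comprehension exactly; a quick check (my addition, on an arbitrary small grid):

```python
import numpy as np

p2grid = np.logspace(-10, 0, 5)

# Original construction from the question
dq_old = np.array([0] + [p2grid[i+1] - p2grid[i] for i in range(len(p2grid)-1)])

# Suggested replacement: np.diff computes p2grid[1:] - p2grid[:-1]
dq_new = np.zeros_like(p2grid)
dq_new[1:] = np.diff(p2grid)

print(np.array_equal(dq_old, dq_new))  # True
```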
Then you should focus on cleaning up the code. Spacing is important, as are appropriate variable names (`snake_case` and not single-letter names). The trick is to make comments redundant:

```python
num_grid_points = 1000  # Number of grid points
```

Note how the comment can now be removed.
Here's an attempt at a cleaner version. I've also added a couple more speed improvements, because I'm a hopeless addict:

```python
def solver(alpha=2.):
    uv_cutoff = 1
    num_grid_points = 1000
    max_iterations = 100

    # The grid on which the function is evaluated
    p2_grid = np.logspace(-10, 0, num_grid_points, base=10)
    dq = np.empty_like(p2_grid)
    dq[0], dq[1:] = 0, np.diff(p2_grid)

    b_grid = np.ones(num_grid_points)
    c = alpha / (4 * np.pi)

    for iteration_num in range(max_iterations):
        t2 = 3 * dq * b_grid / (p2_grid + b_grid*b_grid)
        t1 = t2 * p2_grid

        t1_cumsum = np.add.accumulate(t1)
        t2_cumsum = np.add.accumulate(t2[::-1])[::-1]
        t1_cumsum[0] = t2_cumsum[-1] = 0

        b_grid_new = c * (t1_cumsum / p2_grid + t2_cumsum)

        # Calculate relative error (take the maximum)
        max_error = np.max(np.abs((b_grid - b_grid_new) / b_grid_new))
        b_grid = b_grid_new

        if max_error < 10**-4:
            break

    print "Finished in", iteration_num + 1, "iterations, relative error:", max_error
    return p2_grid, b_grid / np.sqrt(uv_cutoff)
```
I'm not really sure how to rename these, though, because I'm not familiar with the maths.
- Micha (Mar 4, 2015): This is truly incredible. Did not know about `accumulate` before. Well, thank you.