
I am teaching myself some coding, and as my first "big" project I tried implementing a Steepest Descent algorithm to minimize the Rosenbrock function:

$$f(x, y) = 100 (y - x^2)^2 + (1 - x)^2$$

The algorithm goes like this: we start with an initial guess \$x_0\$ (a vector) and update it using the formula

$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$

where the step length \$\alpha\$ is chosen so that it satisfies the Armijo condition. We keep repeating until we reach a point where both components of the gradient are less than 0.1 in absolute value.
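For reference, the Armijo (sufficient decrease) condition I am trying to satisfy is

$$f(x_k + \alpha p_k) \le f(x_k) + c\,\alpha\,\nabla f(x_k)^T p_k,$$

where \$p_k = -\nabla f(x_k)\$ is the step direction and \$c \in (0, 1)\$ is a small constant (0.01 in the code below).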

Could you please tell me any ways in which I could improve my algorithm? In particular, I'm looking to increase its speed. From the current starting point, it takes about 30 seconds to run on my computer (16 GB RAM, i7 processor).

Remark: The reason I keep using np.array([[1, 2, 3]]) for vectors is so that I can transpose and matrix-multiply them at will. I'm not sure if this is good practice.
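As a point of comparison, here is a minimal sketch (purely illustrative, not part of the program below) of how plain 1-D arrays handle the same products:

import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])

print(v @ w)           # dot product of two 1-D arrays is already a scalar: 11.0
print(np.outer(v, w))  # explicit outer product, if a 2-D result is ever needed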

# This program uses the Steepest Descent Method to
# minimize the Rosenbrock function
import numpy as np


# Define the Rosenbrock Function
def f(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return 100 * (y - x**2)**2 + (1 - x)**2


# Gradient of f
def gradient(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return np.array([[-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)]])


def main():
    # Define the starting guess
    x_k = np.array([[10, 5]])
    # Define counter for number of steps
    numSteps = 0
    # Keep iterating until both components of the gradient are less than 0.1 in absolute value
    while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
        numSteps = numSteps + 1
        # Step direction
        p_k = - gradient(x_k)
        gradTrans = - p_k.T
        # Now we use a backtracking algorithm to find a step length
        alpha = 1.0
        ratio = 0.8
        c = 0.01  # This is just a constant that is used in the algorithm
        # This loop selects an alpha which satisfies the Armijo condition
        while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
            alpha = ratio * alpha
        x_k = x_k + alpha * p_k
    print("The number of steps is: ", numSteps)
    print("The final step is:", x_k)
    print("The gradient is: ", gradient(x_k))


main()
asked May 24, 2020 at 22:13

2 Answers


This

# Define the Rosenbrock Function
def f(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return 100 * (y - x**2)**2 + (1 - x)**2

could be

def f_rosenbrock(xy):
    x, y = xy
    return 100 * (y - x**2)**2 + (1 - x)**2

This

# Gradient of f
def gradient(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return np.array([[-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)]])

could be

def df_rosenbrock(xy):
    x, y = xy
    return np.array([-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)])

It wouldn't cost much to turn main into a more general gradient descent function having the following signature:

def gradient_descent(f, df, x0):
    # Define the starting guess
    x_k = x0
    # ...

You could add the following condition so that this code won't run if imported as a module.

if __name__ == '__main__':
    # main()
    gradient_descent(f_rosenbrock, df_rosenbrock, np.array([10, 5]))

It'd probably be best to stick to either camelCase or snake_case for variable names; the latter is more popular in Python, e.g. num_steps instead of numSteps.

Don't evaluate the gradient so many times:

while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
    # ...
    p_k = - gradient(x_k)
    gradTrans = - p_k.T
    # ...
print("The gradient is: ", gradient(x_k))

could be

while True:
    g_k = df(x_k)
    if np.abs(g_k).max() < tol:
        break
    # ...
print("The gradient is: ", g_k)

We don't need gradTrans, nor p_k.
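That is because the steepest descent direction is simply \$p_k = -\nabla f(x_k)\$, so the Armijo term reduces to

$$\nabla f(x_k)^T p_k = -\lVert \nabla f(x_k) \rVert^2,$$

which is exactly what the cg = - c * (g_k**2).sum() line in the code below computes from g_k directly.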

This

# Now we use a backtracking algorithm to find a step length
alpha = 1.0
ratio = 0.8
c = 0.01  # This is just a constant that is used in the algorithm
# This loop selects an alpha which satisfies the Armijo condition
while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
    alpha = ratio * alpha
x_k = x_k + alpha * p_k

part is probably the worst offender with respect to performance: you don't have to recalculate all of these values on every pass. Some constants are also hardcoded, while they could easily become parameters.

Anyway, putting it all together we get something like the following. Feel free to add comments to it, but use docstrings whenever appropriate.

import numpy as np


def f_rosenbrock(xy):
    x, y = xy
    return 100 * (y - x**2)**2 + (1 - x)**2


def df_rosenbrock(xy):
    x, y = xy
    return np.array([-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)])


def gradient_descent(f, df, x0, tol=.1, alpha=1.0, ratio=.8, c=.01):
    x_k, num_steps, step_size = x0, 0, alpha
    while True:
        g_k = df(x_k)
        if np.abs(g_k).max() < tol:
            break
        num_steps += 1
        fx, cg = f(x_k), - c * (g_k**2).sum()
        while f(x_k - step_size * g_k) > fx + step_size * cg:
            step_size *= ratio
        x_k -= step_size * g_k
    return x_k, g_k, num_steps


if __name__ == '__main__':
    x, g, n = gradient_descent(
        f_rosenbrock, df_rosenbrock, np.array([10.0, 5.0])
    )
    print("The number of steps is: ", n)
    print("The final step is:", x)
    print("The gradient is: ", g)
answered May 25, 2020 at 20:44

Your biggest time waster appears to be this loop:

while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
    alpha = ratio * alpha

f(x_k), c, gradTrans, and p_k are all constant in the loop, so you can compute f(x_k) and c * (gradTrans @ p_k) before the loop and use these computed values in the test expression, instead of recomputing the same values over and over.

fxk = f(x_k)
offset = c * (gradTrans @ p_k)
while f(x_k + alpha * p_k) > fxk + (alpha * offset)[0, 0]:
    alpha = ratio * alpha

Doing so cuts the time roughly in half.

Similarly, gradient(x_k) is computed 3 times here:

while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
    ...
    p_k = - gradient(x_k)

Again, compute once and store the result.
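A minimal sketch of that change, keeping the original names and indexing (illustrative, not a full rewrite):

g_k = gradient(x_k)                # computed once before the loop
while abs(g_k[0, 0]) > 0.1 or abs(g_k[0, 1]) > 0.1:
    p_k = - g_k
    gradTrans = - p_k.T
    # ... backtracking line search and update of x_k as before ...
    g_k = gradient(x_k)            # recomputed once per iteration, reused everywhere
print("The gradient is: ", g_k)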


You should probably use vectors instead of matrices:

 x_k = np.array([10., 5.])

Which can be unpacked using tuple assignment:

def f(x_k):
    x, y = x_k
    return 100 * (y - x**2)**2 + (1 - x)**2

And using 10. and 5. in the above x_k initialization makes the arrays float64 instead of int32, which allows you to use in-place addition operators:

 x_k += alpha * p_k

instead of:

 x_k = x_k + alpha * p_k
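For what it's worth, a quick sketch of why the float literals matter (illustrative only): NumPy refuses to write a float result back into integer storage in place.

import numpy as np

x_int = np.array([10, 5])      # integer dtype
x_float = np.array([10., 5.])  # float64 dtype

x_float += 0.8 * x_float       # fine: float result stored into a float array
# x_int += 0.8 * x_int         # raises a casting error (UFuncTypeError in recent NumPy)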
answered May 25, 2020 at 20:37
  • Argh, you beat me to it. :( Commented May 25, 2020 at 20:45
  • Ya, but you wrote a better answer. Commented May 25, 2020 at 20:47
  • Hardly better, just longer. Commented May 25, 2020 at 20:58
