
I am teaching myself some coding, and as my first "big" project I tried implementing a Steepest Descent algorithm to minimize the Rosenbrock function:

$$f(x, y) = 100 (y - x^2)^2 + (1 - x)^2$$

The algorithm goes like this: we start with an initial guess \$x_0\$ (a vector) and update it using the formula

$$x_{k+1} = x_k - \alpha \nabla f(x_k),$$

where the step length \$\alpha\$ is chosen so that it satisfies the Armijo condition. We keep repeating until we reach a point where both components of the gradient are less than 0.1 in absolute value.
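For reference, the Armijo (sufficient decrease) condition I am trying to satisfy is

$$f(x_k + \alpha p_k) \le f(x_k) + c\,\alpha\,\nabla f(x_k)^T p_k,$$

where \$p_k = -\nabla f(x_k)\$ is the step direction and \$c \in (0, 1)\$ is a small constant (0.01 in the code below).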

Could you please tell me any ways in which I could improve my algorithm? In particular, I'm looking to increase its speed. From the current starting point, it takes about 30 seconds to run on my computer (16 GB RAM, i7 processor).

Remark: The reason I keep using np.array([[1, 2, 3]]) for vectors is so that I can transpose and matrix-multiply them at will. I'm not sure if this is good practice.
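As a point of comparison, here is a minimal sketch (purely illustrative, not part of the program below) of how plain 1-D arrays handle the same products:

import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])

print(v @ w)           # dot product of two 1-D arrays is already a scalar: 11.0
print(np.outer(v, w))  # explicit outer product, if a 2-D result is ever needed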

# This program uses the Steepest Descent Method to
# minimize the Rosenbrock function
import numpy as np


# Define the Rosenbrock Function
def f(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return 100 * (y - x**2)**2 + (1 - x)**2


# Gradient of f
def gradient(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return np.array([[-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)]])


def main():
    # Define the starting guess
    x_k = np.array([[10, 5]])
    # Define counter for number of steps
    numSteps = 0
    # Keep iterating until both components of the gradient are less than 0.1 in absolute value
    while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
        numSteps = numSteps + 1
        # Step direction
        p_k = - gradient(x_k)
        gradTrans = - p_k.T
        # Now we use a backtracking algorithm to find a step length
        alpha = 1.0
        ratio = 0.8
        c = 0.01  # This is just a constant that is used in the algorithm
        # This loop selects an alpha which satisfies the Armijo condition
        while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
            alpha = ratio * alpha
        x_k = x_k + alpha * p_k
    print("The number of steps is: ", numSteps)
    print("The final step is:", x_k)
    print("The gradient is: ", gradient(x_k))


main()
asked May 24, 2020 at 22:13

2 Answers


This

# Define the Rosenbrock Function
def f(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return 100 * (y - x**2)**2 + (1 - x)**2

could be

def f_rosenbrock(xy):
    x, y = xy
    return 100 * (y - x**2)**2 + (1 - x)**2

This

# Gradient of f
def gradient(x_k):
    x, y = x_k[0, 0], x_k[0, 1]
    return np.array([[-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)]])

could be

def df_rosenbrock(xy):
    x, y = xy
    return np.array([-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)])

It wouldn't cost much to turn main into a more general gradient descent function having the following signature:

def gradient_descent(f, df, x0):
    # Define the starting guess
    x_k = x0
    # ...

You could add the following condition so that this code won't run if imported as a module.

if __name__ == '__main__':
    # main()
    gradient_descent(f_rosenbrock, df_rosenbrock, np.array([10, 5]))

It'd probably be best to stick to either camelCase or snake_case for variable names; the latter is more popular in Python, e.g. num_steps instead of numSteps.

Don't evaluate the gradient so many times:

while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
    # ...
    p_k = - gradient(x_k)
    gradTrans = - p_k.T
    # ...
print("The gradient is: ", gradient(x_k))

could be

while True:
    g_k = df(x_k)
    if np.abs(g_k).max() < tol:
        break
    # ...
print("The gradient is: ", g_k)

We don't need gradTrans, nor p_k.
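That is because the steepest descent direction is simply \$p_k = -\nabla f(x_k)\$, so the Armijo term reduces to

$$\nabla f(x_k)^T p_k = -\lVert \nabla f(x_k) \rVert^2,$$

which is exactly what the cg = - c * (g_k**2).sum() line in the code below computes from g_k directly.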

This

# Now we use a backtracking algorithm to find a step length
alpha = 1.0
ratio = 0.8
c = 0.01  # This is just a constant that is used in the algorithm
# This loop selects an alpha which satisfies the Armijo condition
while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
    alpha = ratio * alpha
x_k = x_k + alpha * p_k

part is probably the worst offender with respect to performance: you don't have to recalculate all of these values on every pass. Some constants are also hardcoded, while they could easily become parameters.

Anyway, putting it all together we get something like the following. Feel free to add comments to it, but use docstrings whenever appropriate.

import numpy as np


def f_rosenbrock(xy):
    x, y = xy
    return 100 * (y - x**2)**2 + (1 - x)**2


def df_rosenbrock(xy):
    x, y = xy
    return np.array([-400*x*(y-x**2)-2*(1-x), 200*(y-x**2)])


def gradient_descent(f, df, x0, tol=.1, alpha=1.0, ratio=.8, c=.01):
    x_k, num_steps, step_size = x0, 0, alpha
    while True:
        g_k = df(x_k)
        if np.abs(g_k).max() < tol:
            break
        num_steps += 1
        fx, cg = f(x_k), - c * (g_k**2).sum()
        while f(x_k - step_size * g_k) > fx + step_size * cg:
            step_size *= ratio
        x_k -= step_size * g_k
    return x_k, g_k, num_steps


if __name__ == '__main__':
    x, g, n = gradient_descent(
        f_rosenbrock, df_rosenbrock, np.array([10.0, 5.0])
    )
    print("The number of steps is: ", n)
    print("The final step is:", x)
    print("The gradient is: ", g)
answered May 25, 2020 at 20:44

Your biggest time waster appears to be this loop:

while f(x_k + alpha * p_k) > f(x_k) + (alpha * c * (gradTrans @ p_k))[0, 0]:
    alpha = ratio * alpha

f(x_k), c, gradTrans, and p_k are all constant in the loop, so you can compute f(x_k) and c * (gradTrans @ p_k) before the loop and use these computed values in the test expression, instead of recomputing the same values over and over.

fxk = f(x_k)
offset = c * (gradTrans @ p_k)
while f(x_k + alpha * p_k) > fxk + (alpha * offset)[0, 0]:
    alpha = ratio * alpha

Doing so cuts the time roughly in half.

Similarly, gradient(x_k) is computed 3 times here:

while abs((gradient(x_k)[0, 0])) > 0.1 or abs((gradient(x_k))[0, 1]) > 0.1:
    ...
    p_k = - gradient(x_k)

Again, compute once and store the result.
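A minimal sketch of that change, keeping the original names and indexing (illustrative, not a full rewrite):

g_k = gradient(x_k)                # computed once before the loop
while abs(g_k[0, 0]) > 0.1 or abs(g_k[0, 1]) > 0.1:
    p_k = - g_k
    gradTrans = - p_k.T
    # ... backtracking line search and update of x_k as before ...
    g_k = gradient(x_k)            # recomputed once per iteration, reused everywhere
print("The gradient is: ", g_k)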


You should probably use vectors instead of matrices:

 x_k = np.array([10., 5.])

Which can be unpacked using tuple assignment:

def f(x_k):
    x, y = x_k
    return 100 * (y - x**2)**2 + (1 - x)**2

And using 10. and 5. in the above x_k initialization makes the arrays float64 instead of int32, which allows you to use in-place addition operators:

 x_k += alpha * p_k

instead of:

 x_k = x_k + alpha * p_k
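For what it's worth, a quick sketch of why the float literals matter (illustrative only): NumPy refuses to write a float result back into integer storage in place.

import numpy as np

x_int = np.array([10, 5])      # integer dtype
x_float = np.array([10., 5.])  # float64 dtype

x_float += 0.8 * x_float       # fine: float result stored into a float array
# x_int += 0.8 * x_int         # raises a casting error (UFuncTypeError in recent NumPy)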
answered May 25, 2020 at 20:37
  • Argh, you beat me to it. :( Commented May 25, 2020 at 20:45
  • Ya, but you wrote a better answer. Commented May 25, 2020 at 20:47
  • Hardly better, just longer. Commented May 25, 2020 at 20:58
