When writing Code Review answers, it is often necessary to measure how long the modified code takes versus how long the OP's code takes. I needed a nice way to visualize this as a function of the input, so I cobbled together the following script.
The plot_time function takes a single function (of a single variable, at the moment) and an iterable of inputs, performs multiple timings per input, and finally plots the results as an errorbar plot using matplotlib (although usually the error bars are too small to be seen). The plot_times function loops over an iterable of functions and adds some nice labels and a legend (containing the __name__ of each function tested).
```python
from functools import partial
import timeit

import numpy as np
from matplotlib import pyplot


def plot_time(func, inputs, repeats, n_tests):
    """
    Run timer and plot time complexity of `func` using the iterable `inputs`.

    Runs the function `n_tests` times for each of the `repeats` measurements.
    """
    x, y, yerr = [], [], []
    for i in inputs:
        timer = timeit.Timer(partial(func, i))
        t = timer.repeat(repeat=repeats, number=n_tests)
        x.append(i)
        y.append(np.mean(t))
        yerr.append(np.std(t) / np.sqrt(len(t)))
    pyplot.errorbar(x, y, yerr=yerr, fmt='-o', label=func.__name__)


def plot_times(functions, inputs, repeats=3, n_tests=1, file_name=""):
    """
    Run timer and plot time complexity of all `functions`,
    using the iterable `inputs`.

    Runs each function `n_tests` times for each of the `repeats` measurements.
    Adds a legend containing the labels added by `plot_time`.
    """
    for func in functions:
        plot_time(func, inputs, repeats, n_tests)
    pyplot.legend()
    pyplot.xlabel("Input")
    pyplot.ylabel("Time [s]")
    if not file_name:
        pyplot.show()
    else:
        pyplot.savefig(file_name)


if __name__ == "__main__":
    import math
    import time

    scale = 100.

    def o_n(n):
        time.sleep(n / scale)

    def o_n2(n):
        time.sleep(n**2 / scale)

    def o_log(n):
        time.sleep(math.log(n + 1) / scale)

    def o_nlog(n):
        time.sleep(n * math.log(n + 1) / scale)

    def o_exp(n):
        time.sleep((math.exp(n) - 1) / scale)

    plot_times([o_n, o_n2, o_log, o_nlog, o_exp],
               np.linspace(0, 1.1, num=10), repeats=3)
```
The figure can be saved either manually or by passing a file name.
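For instance, reusing the test functions defined below (the file name is just an example):

```python
plot_times([o_n, o_log], np.linspace(0, 1.1, num=10),
           repeats=3, file_name="complexity.png")
```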
The test code, which defines some functions for the most common complexity classes (\$\mathcal{O}(n), \mathcal{O}(n^2), \mathcal{O}(\log n), \mathcal{O}(n \log n), \mathcal{O}(\exp n)\$), produces this graphical output:
[Plot: scaling behavior of the basic complexity classes]
Any thoughts or recommendations are welcome. In particular, I don't think there is a nice way around (implicitly) using global state here: when generating a new graph, matplotlib automatically adds it to the current figure.
1 Answer
matplotlib adding stuff to the current figure happens because you are not using the object-oriented interface. It is slightly clunkier, but allows far more freedom.
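A minimal sketch of the difference (the variable names are just placeholders):

```python
import matplotlib.pyplot as plt

x, y = [0, 1, 2], [0, 1, 4]  # placeholder data

# State-machine style: plot() draws onto whatever the "current" axes happen to be.
plt.plot(x, y)

# Object-oriented style: every call targets an explicit Figure/Axes pair,
# so nothing depends on hidden global state.
fig, ax = plt.subplots()
ax.plot(x, y)
fig.savefig("plot.png")
```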
In my view, you are mixing several things up. I would separate the generation of the timings, the aggregation of these results, and the plotting. That way, if you want to change to another plotting library (bokeh, seaborn, ...), you only need to adapt or add one method instead of refactoring the whole thing.
imports
```python
from functools import partial
import timeit

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```
tying it together
```python
def plot_times(functions, inputs, repeats=3, n_tests=1):
    # Forward the actual arguments (an earlier draft hardcoded repeats=3, n_tests=1).
    timings = get_timings(functions, inputs, repeats=repeats, n_tests=n_tests)
    results = aggregate_results(timings)
    fig, ax = plot_results(results)
    return fig, ax, results
```
This should be self-explanatory: first you generate the timings, then you aggregate the results, and finally you plot the aggregations. This way, you can test each part individually. The function returns the actual generated figure, the plot axes, and the results DataFrame, so you can inspect them later on if needed. In your __main__ you can save or show the fig.
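A minimal sketch of such a __main__, reusing two of the complexity functions from the question (the file name is just an example):

```python
if __name__ == "__main__":
    import math
    import time

    scale = 100.

    def o_n(n):
        time.sleep(n / scale)

    def o_nlog(n):
        time.sleep(n * math.log(n + 1) / scale)

    fig, ax, results = plot_times([o_n, o_nlog],
                                  np.linspace(0, 1.1, num=10), repeats=3)
    fig.savefig("timings.png")  # or plt.show() to display interactively
    print(results.head())       # the raw numbers remain available for inspection
```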
get_timings
```python
def get_timings(functions, inputs, repeats, n_tests):
    for func in functions:
        result = pd.DataFrame(
            index=inputs, columns=range(repeats),
            data=(timeit.Timer(partial(func, i)).repeat(repeat=repeats, number=n_tests)
                  for i in inputs))
        yield func, result
```
The only change I made here is to save the individual timings of one function in a pandas.DataFrame for easy aggregation afterwards. The yield makes sure these timings are only calculated when they are needed.
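To see what one yielded frame looks like, a quick sketch (using math.sqrt purely as a stand-in function):

```python
import math

func, timing = next(get_timings([math.sqrt], inputs=[1, 100, 10000],
                                repeats=3, n_tests=1))
print(timing)  # rows: the inputs; columns 0..repeats-1: seconds per repeat
```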
aggregate_results
```python
def aggregate_results(timings):
    # Note: the `labels` keyword was renamed to `codes` in newer pandas versions.
    empty_multiindex = pd.MultiIndex(levels=[[], []], codes=[[], []],
                                     names=['func', 'result'])
    aggregated_results = pd.DataFrame(columns=empty_multiindex)

    for func, timing in timings:
        for measurement in timing:
            aggregated_results[func.__name__, measurement] = timing[measurement]
        aggregated_results[func.__name__, 'avg'] = timing.mean(axis=1)
        aggregated_results[func.__name__, 'yerr'] = timing.std(axis=1)
    return aggregated_results
```
This method builds one big DataFrame with a MultiIndex in the columns: the first level is the function name, the second level is the result type. Each individual timing is saved in this DataFrame, and the average and standard deviation are calculated.
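To illustrate the column layout, a small sketch (again with math.sqrt as a stand-in):

```python
import math

timings = get_timings([math.sqrt], inputs=[1, 100], repeats=2, n_tests=1)
results = aggregate_results(timings)
print(results.columns.tolist())
# [('sqrt', 0), ('sqrt', 1), ('sqrt', 'avg'), ('sqrt', 'yerr')]
```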
plot_results
```python
def plot_results(results):
    fig, ax = plt.subplots()
    x = results.index
    for func in results.columns.levels[0]:
        y = results[func, 'avg']
        yerr = results[func, 'yerr']
        ax.errorbar(x, y, yerr=yerr, fmt='-o', label=func)
    ax.set_xlabel('Input')
    ax.set_ylabel('Time [s]')
    ax.legend()
    return fig, ax
```
This simply generates an errorbar plot for every func, adds the labels, and draws the legend.
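Because the Figure and Axes objects are returned, you can keep customizing the plot after the fact; for example (a log scale is just one option):

```python
fig, ax = plot_results(results)
ax.set_yscale("log")  # e.g. a log scale keeps the faster-growing curves readable
fig.savefig("timings_log.png")
```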