When writing Code Review answers, it is often necessary to measure how long the modified code takes versus how long the OP's code takes. I needed a nice way to visualize this as a function of the input, so I cobbled together the following script.
The plot_time function takes a single function (of a single variable, at the moment) and an iterable of inputs, performs multiple timings per input, and finally plots the results as an errorbar plot using matplotlib (although usually the error bars are too small to be seen). The plot_times function loops over an iterable of functions and adds some nice labels and a legend (containing the __name__ of each function tested).
```python
from functools import partial
import timeit

import numpy as np
from matplotlib import pyplot


def plot_time(func, inputs, repeats, n_tests):
    """
    Run timer and plot time complexity of `func` using the iterable `inputs`.

    Runs the function `n_tests` times for each of the `repeats` measurements.
    """
    x, y, yerr = [], [], []
    for i in inputs:
        timer = timeit.Timer(partial(func, i))
        t = timer.repeat(repeat=repeats, number=n_tests)
        x.append(i)
        y.append(np.mean(t))
        yerr.append(np.std(t) / np.sqrt(len(t)))
    pyplot.errorbar(x, y, yerr=yerr, fmt='-o', label=func.__name__)


def plot_times(functions, inputs, repeats=3, n_tests=1, file_name=""):
    """
    Run timer and plot time complexity of all `functions`,
    using the iterable `inputs`.

    Runs each function `n_tests` times for each of the `repeats` measurements.
    Adds a legend containing the labels added by `plot_time`.
    """
    for func in functions:
        plot_time(func, inputs, repeats, n_tests)
    pyplot.legend()
    pyplot.xlabel("Input")
    pyplot.ylabel("Time [s]")
    if not file_name:
        pyplot.show()
    else:
        pyplot.savefig(file_name)


if __name__ == "__main__":
    import math
    import time

    scale = 100.

    def o_n(n):
        time.sleep(n / scale)

    def o_n2(n):
        time.sleep(n**2 / scale)

    def o_log(n):
        time.sleep(math.log(n + 1) / scale)

    def o_nlog(n):
        time.sleep(n * math.log(n + 1) / scale)

    def o_exp(n):
        time.sleep((math.exp(n) - 1) / scale)

    plot_times([o_n, o_n2, o_log, o_nlog, o_exp],
               np.linspace(0, 1.1, num=10), repeats=3)
```
The figure can be saved either manually or by passing a file name.
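For instance, reusing the test functions defined below (the file name is just an example):

```python
plot_times([o_n, o_log], np.linspace(0, 1.1, num=10),
           repeats=3, file_name="complexity.png")
```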
The test code, which defines some functions for the most common complexity classes (\$\mathcal{O}(n), \mathcal{O}(n^2), \mathcal{O}(\log n), \mathcal{O}(n \log n), \mathcal{O}(\exp n)\$), produces this graphical output:
[Plot: scaling behavior of the basic complexity classes]
Any thoughts or recommendations are welcome. In particular, I don't think there is a nice way around (implicitly) using global state here: when generating a new graph, matplotlib automatically adds it to the current figure.
1 Answer
matplotlib adding stuff to the current figure happens because you are not using the object-oriented interface. It is slightly clunkier, but allows far more freedom.
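A minimal sketch of the difference (the variable names are just placeholders):

```python
import matplotlib.pyplot as plt

x, y = [0, 1, 2], [0, 1, 4]  # placeholder data

# State-machine style: plot() draws onto whatever the "current" axes happen to be.
plt.plot(x, y)

# Object-oriented style: every call targets an explicit Figure/Axes pair,
# so nothing depends on hidden global state.
fig, ax = plt.subplots()
ax.plot(x, y)
fig.savefig("plot.png")
```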
In my view, you are mixing several things up. I would separate the generation of the timings, the aggregation of these results, and the plotting. That way, if you want to change to another plotting library (bokeh, seaborn, ...), you only need to adapt or add one method instead of refactoring the whole thing.
imports
```python
from functools import partial
import timeit

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```
tying it together
```python
def plot_times(functions, inputs, repeats=3, n_tests=1):
    # Forward the actual arguments (an earlier draft hardcoded repeats=3, n_tests=1).
    timings = get_timings(functions, inputs, repeats=repeats, n_tests=n_tests)
    results = aggregate_results(timings)
    fig, ax = plot_results(results)
    return fig, ax, results
```
This should be self-explanatory: first you generate the timings, then you aggregate the results, and finally you plot the aggregations. This way, you can test each part individually. The function returns the actual generated figure, the plot axes, and the results DataFrame, so you can inspect them later on if needed. In your __main__ you can save or show the fig.
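A minimal sketch of such a __main__, reusing two of the complexity functions from the question (the file name is just an example):

```python
if __name__ == "__main__":
    import math
    import time

    scale = 100.

    def o_n(n):
        time.sleep(n / scale)

    def o_nlog(n):
        time.sleep(n * math.log(n + 1) / scale)

    fig, ax, results = plot_times([o_n, o_nlog],
                                  np.linspace(0, 1.1, num=10), repeats=3)
    fig.savefig("timings.png")  # or plt.show() to display interactively
    print(results.head())       # the raw numbers remain available for inspection
```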
get_timings
```python
def get_timings(functions, inputs, repeats, n_tests):
    for func in functions:
        result = pd.DataFrame(
            index=inputs, columns=range(repeats),
            data=(timeit.Timer(partial(func, i)).repeat(repeat=repeats, number=n_tests)
                  for i in inputs))
        yield func, result
```
The only change I made here is to save the individual timings of one function in a pandas.DataFrame for easy aggregation afterwards. The yield makes sure these timings are only calculated when they are needed.
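To see what one yielded frame looks like, a quick sketch (using math.sqrt purely as a stand-in function):

```python
import math

func, timing = next(get_timings([math.sqrt], inputs=[1, 100, 10000],
                                repeats=3, n_tests=1))
print(timing)  # rows: the inputs; columns 0..repeats-1: seconds per repeat
```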
aggregate_results
```python
def aggregate_results(timings):
    # Note: the `labels` keyword was renamed to `codes` in newer pandas versions.
    empty_multiindex = pd.MultiIndex(levels=[[], []], codes=[[], []],
                                     names=['func', 'result'])
    aggregated_results = pd.DataFrame(columns=empty_multiindex)

    for func, timing in timings:
        for measurement in timing:
            aggregated_results[func.__name__, measurement] = timing[measurement]
        aggregated_results[func.__name__, 'avg'] = timing.mean(axis=1)
        aggregated_results[func.__name__, 'yerr'] = timing.std(axis=1)
    return aggregated_results
```
This method builds one big DataFrame with a MultiIndex in the columns: the first level is the function name, the second level is the result type. Each individual timing is saved in this DataFrame, and the average and standard deviation are calculated.
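To illustrate the column layout, a small sketch (again with math.sqrt as a stand-in):

```python
import math

timings = get_timings([math.sqrt], inputs=[1, 100], repeats=2, n_tests=1)
results = aggregate_results(timings)
print(results.columns.tolist())
# [('sqrt', 0), ('sqrt', 1), ('sqrt', 'avg'), ('sqrt', 'yerr')]
```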
plot_results
```python
def plot_results(results):
    fig, ax = plt.subplots()
    x = results.index
    for func in results.columns.levels[0]:
        y = results[func, 'avg']
        yerr = results[func, 'yerr']
        ax.errorbar(x, y, yerr=yerr, fmt='-o', label=func)
    ax.set_xlabel('Input')
    ax.set_ylabel('Time [s]')
    ax.legend()
    return fig, ax
```
This simply generates an errorbar plot for every func, adds the labels, and draws the legend.
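Because the Figure and Axes objects are returned, you can keep customizing the plot after the fact; for example (a log scale is just one option):

```python
fig, ax = plot_results(results)
ax.set_yscale("log")  # e.g. a log scale keeps the faster-growing curves readable
fig.savefig("timings_log.png")
```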