Visualize the runtimes of list algorithms

Question 1

I wrote a small script to visualize the runtimes of list algorithms.

The script works by creating a list of lists with sampled lengths from 1 to max_size. Next, I map the given algorithm over each list element and replaced the list with the resulting runtime. The runtimes of each list are plotted against the length of each list to generate a runtime plot.

enter image description here

My questions are:

Does my code deviate from best practices?
Could my code get a performance increase by using co-routines or some other type of concurrency (like asyncio)?

from time import time
from random import randrange 
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from matplotlib import pyplot as plt
def func_timer(func, *params, verbose=False):
 """Takes in a function and some parameters to the function and returns the execution time"""
 start = time()
 func(*params)
 t = time() - start
 if verbose:
 print('function {func_name} took {time} seconds to complete given parameters'.format(
 func_name=func.__name__, time=t))
 return t
def list_generator(max_size, num_samples, n=10, upper_num=100):
 """Generates random integer lists with (sampled) lengths from range 1 to max_size.
 The difference between lengths (length spacing) of each list is delta, where delta is 
 defined as the floor division of max_size and num_samples.
 max_size: int
 The max length of a generated list
 num_samples: int
 The number of lists to generate
 returns:
 lst_of_rand_lsts: list of lists
 """
 def get_rand_list(n):
 """Returns a list of random numbers."""
 return [randrange(upper_num) for x in range(n)] 
 assert max_size > num_samples
 delta = max_size // num_samples # uses the floor function to get difference between each sample.
 lst_lengths = [delta*x for x in range(1, num_samples + 1)] # Shift range by 1 to avoid making an empy list.
 return (get_rand_list(x) for x in lst_lengths)
def runtime_generator(func, lists):
 """Maps func over each list in a list of lists and replaces each list element with the function runtime."""
 partial_func_timer = partial(func_timer, func)
 with ThreadPoolExecutor(max_workers=5) as executor:
 res = executor.map(partial_func_timer, lists)
def rtviz(func, *args, max_size=1000, num_samples=500, viz=True, verbose=True):
 """Takes in a function that receives an iterable as input.
 Returns a plot of the runtimes over iterables of random integers of increasing length.
 func: a function that acts on an iterable"""
 def subplot_generator(lst_lengths, func_times_lst):
 plt.plot(lst_lengths, func_times_lst, 'bo')
 plt.xlabel('length of random lists')
 plt.ylabel('function runtime (sec)')
 plt.title('Runtime of function {}'.format(func.__name__))
 return
 lst_of_lsts = list(list_generator(max_size, num_samples))
 lsts_len = [len(elem) for elem in lst_of_lsts]
 start = time()
 runtime_gen = runtime_generator(func, lst_of_lsts)
 t = time() - start
 if verbose == True:
 print('%s took %0.3fms.' % (func.__name__, t*1000.))
 subplot_generator(lsts_len, list(runtime_gen))
 if viz == True:
 plt.show()
 return
if __name__ == "__main__":
 rtviz(sorted)

Question 2

An even more vital question than does it deviate from best practices, is whether it actually does what you intend for it to do! Does it accurately time exection of the function in you are testing?

I believe not due to the following reasons:

Using time one a single run is at best inaccurate. Any code when run only once is kind of a random indicator as to how long time it'll take to do on the average. The shorter time it takes to execute, the more repeats you should have. You might also run into issues related to which current platform you are on, and imprecision in the time.time() call, as the documentation only promises a precision of a second.
You're setting up the time testing method within the actual time taking. You're timing on the outside of the runtime_generator, and do functools.partial and ThreadPoolExecutor within that function. In the best situations this only skew all of your data, in a more likely situation your method disappears in the time to set up the test run. This effect will be worse the faster your tested function are.

Lets say that test setup takes 100 ms, and your method 1 ms for 1 item, and 10 ms for 1000 ms. Your total time will vary from 101 ms to 110 ms, so you'll likely conclude that it is almost constant time, as 101 ms ~ 110 ms, but the real situation is that is has ten-doubled, 1 ms ~ 10 ms.

Best practice in Python, seems to use the timeit module for timing executions. This module can setup your runs, and execute them multiple times and eliminate some of these caveats.

Although you really gotta keep your head straight even when using this module, as timing execution depends on a lot of sub parameters like: operating system, python interpreter, other load on testing platform, memory issues, code issues, sub effects of memoisations or other programming paradigm, and list goes on.

Question 3

You shouldn't use if verbose == True, just use if verbose as it's neater looking and means the same thing.

Also if you need to make a list based on running the function for each element then you should use map instead of a list comprehension. map takes a function and a list as parameters, returning a new list. So for example, you could change this line:

lsts_len = [len(elem) for elem in lst_of_lsts]

to this:

lsts_len = map(len, lst_of_lsts)

It's faster and more readable.

Why have list_generator return a generator when you immediately turn it into a list?

Also there's no reason to have return at the end of a function like in rtviz. When the end of a function is reached, Python will return None whether or not you have an explicit return function, so you may as well leave it out.

holroy holroy 11.7k1 gold badge27 silver badges59 bronze badges · Accepted Answer · 2015-11-10 18:47:08Z

An even more vital question than does it deviate from best practices, is whether it actually does what you intend for it to do! Does it accurately time exection of the function in you are testing?

I believe not due to the following reasons:

Using time one a single run is at best inaccurate. Any code when run only once is kind of a random indicator as to how long time it'll take to do on the average. The shorter time it takes to execute, the more repeats you should have. You might also run into issues related to which current platform you are on, and imprecision in the time.time() call, as the documentation only promises a precision of a second.
You're setting up the time testing method within the actual time taking. You're timing on the outside of the runtime_generator, and do functools.partial and ThreadPoolExecutor within that function. In the best situations this only skew all of your data, in a more likely situation your method disappears in the time to set up the test run. This effect will be worse the faster your tested function are.

Lets say that test setup takes 100 ms, and your method 1 ms for 1 item, and 10 ms for 1000 ms. Your total time will vary from 101 ms to 110 ms, so you'll likely conclude that it is almost constant time, as 101 ms ~ 110 ms, but the real situation is that is has ten-doubled, 1 ms ~ 10 ms.

Best practice in Python, seems to use the timeit module for timing executions. This module can setup your runs, and execute them multiple times and eliminate some of these caveats.

Although you really gotta keep your head straight even when using this module, as timing execution depends on a lot of sub parameters like: operating system, python interpreter, other load on testing platform, memory issues, code issues, sub effects of memoisations or other programming paradigm, and list goes on.

Stack Exchange Network

Visualize the runtimes of list algorithms

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Visualize the runtimes of list algorithms

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions