The purpose of this code is to simulate epidemics across a population. There are 625 (pop) individuals at random locations. The epidemic parameters are the infectious period (inf_period), trans (the transmissibility of the disease, essentially its virulence), susc (the susceptibility of each individual to the disease), and eps (epsilon, the probability of an individual becoming infectious spontaneously rather than through contact with an infectious person). The argument 'reps' is the number of times to simulate one set of epidemic parameters, that is, one set of [susc, trans, inf_period, eps].
In this example, there are 24 possible combinations of parameter values, and we want 400 reps per combination, so 24*400 = 9600 runs. Those values cannot change. To make this code faster, how can the number of loops be reduced (I've heard those are slow)?
This has many loops and if statements, and running the full version will take roughly 2.5 days. How can it be made more efficient in terms of time? I know that may be vague, so if there's a way I can clarify, please let me know! I should also mention I have access to a GPU.
import numpy as np
from scipy import spatial
import json

def fun(susc, trans, inf_period, eps, reps, pop):
        epi_list = []
        count_list = []
        new_susc = []
        new_trans = []
        new_inf_period = []
        new_eps = []
        count = 0
        epi_file = "file1.json"
        count_file = "file2.json"
        with open(epi_file, 'w') as f, open(count_file, 'w') as h:
                for i in range(len(trans)):
                        for j in inf_period:
                                for k in eps:
                                        should_restart = True
                                        while should_restart:
                                                should_restart = False
                                                broken = False
                                                count_2 = 0
                                                for rep in reps:
                                                        failcount = 0
                                                        g1 = external_function_call(pop, susc, trans[i], j, k, full_mat)
                                                        while(len(g1.keys()) < 10 or np.max(g1.values()) < 10):
                                                                failcount += 1
                                                                if failcount > 50:
                                                                        trans[i] += 1
                                                                        broken = True
                                                                        break
                                                                g1 = external_function_call(pop, susc, trans[i], j, k, full_mat) #run again with new i, rep times
                                                        if not broken:
                                                                g2 = inf_per_count_time(g1)
                                                                count += 1
                                                                epi_list.append(g1) #if the first epi in the reps works, but the subsequent ones do not, still writes. Bad!
                                                                count_list.append(g2)
                                                                new_susc.append(susc)
                                                                new_trans.append(trans[i])
                                                                new_inf_period.append(j)
                                                                new_eps.append(k)
                                                        else: #start from rep
                                                                should_restart = True
                                                                if rep > 0: #if we've already written an epidemic using this set of parameters
                                                                        for i in range(rep-1, -1, -1):
                                                                                del epi_list[i]
                                                                                del count_list[i]
                                                                                del new_susc[i]
                                                                                del new_trans[i]
                                                                                del new_inf_period[i]
                                                                                del new_eps[i]
                                                                        count -= 1
                                                                break
                                                else:
                                                        break
        paras = np.array([np.asarray(new_susc), np.asarray(new_trans), np.asarray(new_inf_period), np.asarray(new_eps)]).T
        print 'number of parameter rows', paras[:,0].shape
        with open('parameters.txt', 'w') as newfile1:
                np.savetxt(newfile1, paras, fmt = ['%f', '%f', '%f', '%f'])
        print count

if __name__ == "__main__":
        pop = 625
        susc = 0.3
        trans = [1.5, 2.5, 3]
        inf_period = [2, 3]
        eps = [0, 0.01, 0.02, 0.05]
        reps = np.arange(400)
        fun(susc, trans, inf_period, eps, reps, pop)
2 Answers
78 characters of indentation at its deepest: this code is unreadable. We can't easily match the core of the code with the definition of the parameters.
To improve that, you can:
- use 4 spaces per indentation level instead of 8, as recommended by PEP 8;
- use `itertools.product` to iterate over all the combinations of parameters in one single loop instead of three;
- remove unused variable declarations, such as your `open`s;
- use the `break ... else` construct, which can be applied to any loop; this will save you the `broken` flag;
- use slice deletion rather than deleting items one by one in a `for` loop (plus it will be more efficient).
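As a quick sketch of that last point (illustrative list contents, not the actual epidemic data), slice deletion removes the whole range in one statement:

```python
# Deleting one element at a time, walking backwards so the
# remaining indices do not shift under us:
items = list(range(10))
for i in range(2, -1, -1):
    del items[i]

# The same effect with a single slice deletion:
other = list(range(10))
del other[0:3]

assert items == other == [3, 4, 5, 6, 7, 8, 9]
```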
This leads to the more readable version:
import numpy as np
from scipy import spatial
import json
import itertools

def fun(susc, trans, inf_period, eps, reps, pop):
    epi_list = []
    count_list = []
    new_susc = []
    new_trans = []
    new_inf_period = []
    new_eps = []
    count = 0
    for i, j, k in itertools.product(range(len(trans)), inf_period, eps):
        should_restart = True
        while should_restart:
            should_restart = False
            for rep in reps:
                failcount = 0
                g1 = external_function_call(pop, susc, trans[i], j, k, full_mat)
                while(len(g1.keys()) < 10 or np.max(g1.values()) < 10):
                    failcount += 1
                    if failcount > 50:
                        trans[i] += 1
                        break
                    g1 = external_function_call(pop, susc, trans[i], j, k, full_mat) #run again with new i, rep times
                else:
                    g2 = inf_per_count_time(g1)
                    count += 1
                    epi_list.append(g1) #if the first epi in the reps works, but the subsequent ones do not, still writes. Bad!
                    count_list.append(g2)
                    new_susc.append(susc)
                    new_trans.append(trans[i])
                    new_inf_period.append(j)
                    new_eps.append(k)
                    continue
                # Cleanup because we failed too many times
                should_restart = True # restart from rep
                deletion_range = slice(0, rep, 1)
                del epi_list[deletion_range]
                del count_list[deletion_range]
                del new_susc[deletion_range]
                del new_trans[deletion_range]
                del new_inf_period[deletion_range]
                del new_eps[deletion_range]
                if rep > 0: #if we've already written an epidemic using this set of parameters
                    count -= 1
                break
    paras = np.array([np.asarray(new_susc), np.asarray(new_trans), np.asarray(new_inf_period), np.asarray(new_eps)]).T
    print 'number of parameter rows', paras[:,0].shape
    with open('parameters.txt', 'w') as newfile1:
        np.savetxt(newfile1, paras, fmt = ['%f', '%f', '%f', '%f'])
    print count

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    reps = np.arange(400)
    fun(susc, trans, inf_period, eps, reps, pop)
Now we can start thinking a bit about the code.
First off, you don't need to write the call to `external_function_call` twice, especially with the same set of parameters. It is more idiomatic to use `while True: <call>; if <condition>: break` rather than `<call>; while <condition>: <call>`. This also lets you handle the successful case within that `if`, rather than with your `broken` flag.
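A sketch of the two shapes, with a hypothetical `attempt()` and `acceptable()` standing in for the real call and the size checks:

```python
import random

def attempt():
    # stand-in for external_function_call
    return random.randint(0, 100)

def acceptable(result):
    # stand-in for len(g1) >= 10 and max(g1.values()) >= 10
    return result >= 50

# Original shape: the call is written twice.
result = attempt()
while not acceptable(result):
    result = attempt()

# More idiomatic shape: a single call site, and the success
# case can be handled right inside the if before breaking.
while True:
    result = attempt()
    if acceptable(result):
        break

assert acceptable(result)
```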
In this test, you can also take the `len` of `g1` directly; it's equivalent to using `len(g1.keys())`. And since `g1` seems to be a regular Python dictionary, there is no need to involve `numpy` there: Python already has a `max` builtin.
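A minimal check of both equivalences on a plain dict:

```python
g1 = {0: 12, 1: 25, 2: 7}

# len() on a dict counts the keys, so len(g1.keys()) is redundant:
assert len(g1) == len(g1.keys()) == 3

# The max builtin handles the values directly, no numpy required:
assert max(g1.values()) == 25
```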
The fail count could also be better handled with a `for` loop and a named constant:
import numpy as np
from scipy import spatial
import json
import itertools

MAX_FAILED_ATTEMPS = 50

def fun(susc, trans, inf_period, eps, reps, pop):
    epi_list = []
    count_list = []
    new_susc = []
    new_trans = []
    new_inf_period = []
    new_eps = []
    count = 0
    for i, j, k in itertools.product(range(len(trans)), inf_period, eps):
        should_restart = True
        while should_restart:
            should_restart = False
            for rep in reps:
                for _ in range(MAX_FAILED_ATTEMPS):
                    g1 = external_function_call(pop, susc, trans[i], j, k, full_mat)
                    if len(g1) >= 10 and max(g1.values()) >= 10:
                        g2 = inf_per_count_time(g1)
                        count += 1
                        epi_list.append(g1) #if the first epi in the reps works, but the subsequent ones do not, still writes. Bad!
                        count_list.append(g2)
                        new_susc.append(susc)
                        new_trans.append(trans[i])
                        new_inf_period.append(j)
                        new_eps.append(k)
                        break
                else:
                    trans[i] += 1
                    # Cleanup because we failed too many times
                    should_restart = True # restart from rep
                    deletion_range = slice(0, rep, 1)
                    del epi_list[deletion_range]
                    del count_list[deletion_range]
                    del new_susc[deletion_range]
                    del new_trans[deletion_range]
                    del new_inf_period[deletion_range]
                    del new_eps[deletion_range]
                    if rep > 0: #if we've already written an epidemic using this set of parameters
                        count -= 1
                    break
    paras = np.array([np.asarray(new_susc), np.asarray(new_trans), np.asarray(new_inf_period), np.asarray(new_eps)]).T
    print 'number of parameter rows', paras[:,0].shape
    with open('parameters.txt', 'w') as newfile1:
        np.savetxt(newfile1, paras, fmt = ['%f', '%f', '%f', '%f'])
    print count

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    reps = np.arange(400)
    fun(susc, trans, inf_period, eps, reps, pop)
Now, looking at this rewrite and the comment that was associated with the second call to `external_function_call`, it seems unlikely that this loop is doing any good. No parameters are updated between the various calls, so if the call fails once, it will fail 50 times, slowing down the whole thing unnecessarily. Unless you meant to `trans[i] += 1` before each new call; or unless `external_function_call` relies on some form of randomness.
Another thing that bothers me in your code is how fragile the handling of the number of repetitions (`reps`) is. You seem to rely on it always starting at 0. But as it is written, I can pass any range, like `range(5000, 5800, 2)`, to get 400 repetitions, and that is not necessarily something that will start at 0.
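For instance (made-up bounds), here is a range of the right length whose items are useless as list indices, which is exactly what the cleanup code uses `rep` for:

```python
reps = range(5000, 5800, 2)

assert len(reps) == 400  # 400 repetitions as requested...
assert reps[0] == 5000   # ...but rep starts at 5000, not 0
```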
Most importantly, you could have had some combinations of parameters that ran for each `rep`, say the first 2, so you would already have 800 results in your arrays. But all of a sudden, the third set of parameters fails 50 times at `rep = 40`. So you are deleting elements 39 down to 0 in your arrays... Wait, what? Why? These are results of previous sets of parameters, they are deemed valid; why on earth should we delete them and keep the 40 last results that we know should be restarted from `rep = 0`?
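A small sketch of the two deletion directions, with made-up strings standing in for stored results:

```python
# Three results from previous parameter sets, then two from the
# current (failing) set:
results = ['old1', 'old2', 'old3', 'cur1', 'cur2']
rep = 2  # repetitions already stored for the current set

# What the del-from-index-0 cleanup does: drops the oldest,
# perfectly valid entries.
front = results[:]
del front[0:rep]
assert front == ['old3', 'cur1', 'cur2']

# What the cleanup should do: drop the current set's entries.
back = results[:]
del back[-rep:]
assert back == ['old1', 'old2', 'old3']
```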
In the same vein, I don't understand why you `count -= 1` if `rep` is over 0, instead of `count -= rep` every time.
import numpy as np
from scipy import spatial
import json
import itertools

MAX_FAILED_ATTEMPS = 50

def fun(susc, trans, inf_period, eps, repetitions, pop):
    epi_list = []
    count_list = []
    new_susc = []
    new_trans = []
    new_inf_period = []
    new_eps = []
    count = 0
    for i, j, k in itertools.product(range(len(trans)), inf_period, eps):
        while True:
            for rep in range(repetitions):
                for _ in range(MAX_FAILED_ATTEMPS):
                    g1 = external_function_call(pop, susc, trans[i], j, k, full_mat)
                    if len(g1) >= 10 and max(g1.values()) >= 10:
                        g2 = inf_per_count_time(g1)
                        count += 1
                        epi_list.append(g1) #if the first epi in the reps works, but the subsequent ones do not, still writes. Bad!
                        count_list.append(g2)
                        new_susc.append(susc)
                        new_trans.append(trans[i])
                        new_inf_period.append(j)
                        new_eps.append(k)
                        break
                else:
                    trans[i] += 1
                    # Cleanup because we failed too many times
                    # (guard needed: lst[-0:] would delete the whole list when rep == 0)
                    if rep > 0: #if we've already written an epidemic using this set of parameters
                        del epi_list[-rep:]
                        del count_list[-rep:]
                        del new_susc[-rep:]
                        del new_trans[-rep:]
                        del new_inf_period[-rep:]
                        del new_eps[-rep:]
                        count -= 1
                    break
            else:
                break # do not restart if we made it through the whole repetitions
    paras = np.array([np.asarray(new_susc), np.asarray(new_trans), np.asarray(new_inf_period), np.asarray(new_eps)]).T
    print 'number of parameter rows', paras[:,0].shape
    with open('parameters.txt', 'w') as newfile1:
        np.savetxt(newfile1, paras, fmt = ['%f', '%f', '%f', '%f'])
    print count

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    fun(susc, trans, inf_period, eps, 400, pop)
And one last note: I am not entirely sure that modifying `trans[i]` in place is a good idea, as it affects not only this set of parameters but also every further combination using this particular value. Instead, I would only increment a local copy.
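A minimal illustration of the difference (toy loop bodies, not the real simulation):

```python
trans = [1.5, 2.5, 3]

# Mutating the list in place: the bumped value leaks into every
# later combination that would have reused 1.5.
for i in range(len(trans)):
    if trans[i] == 1.5:  # simulate a failing parameter set
        trans[i] += 1
assert trans == [2.5, 2.5, 3]

# Incrementing a local copy: the parameter grid stays intact.
trans = [1.5, 2.5, 3]
for transmissibility in trans:
    if transmissibility == 1.5:
        transmissibility += 1  # rebinds the local name only
assert trans == [1.5, 2.5, 3]
```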
Oh, and do something about these meaningless one-letter variable names:
import numpy as np
from scipy import spatial
import json
import itertools

MAX_FAILED_ATTEMPS = 50

def fun(susc, trans, inf_period, eps, repetitions, pop):
    epi_list = []
    count_list = []
    new_susc = []
    new_trans = []
    new_inf_period = []
    new_eps = []
    count = 0
    parameters_product = itertools.product(trans, inf_period, eps)
    for transmissibility, infectious_period, epsilon in parameters_product:
        while True:
            for rep in range(repetitions):
                for _ in range(MAX_FAILED_ATTEMPS):
                    g1 = external_function_call(
                        pop, susc, transmissibility,
                        infectious_period, epsilon, full_mat)
                    if len(g1) >= 10 and max(g1.values()) >= 10:
                        g2 = inf_per_count_time(g1)
                        count += 1
                        epi_list.append(g1)
                        count_list.append(g2)
                        new_susc.append(susc)
                        new_trans.append(transmissibility)
                        new_inf_period.append(infectious_period)
                        new_eps.append(epsilon)
                        break
                else:
                    transmissibility += 1
                    # Cleanup because we failed too many times
                    # (guard needed: lst[-0:] would delete the whole list when rep == 0)
                    if rep > 0:
                        # if we've already written an epidemic
                        # using this set of parameters
                        del epi_list[-rep:]
                        del count_list[-rep:]
                        del new_susc[-rep:]
                        del new_trans[-rep:]
                        del new_inf_period[-rep:]
                        del new_eps[-rep:]
                        count -= 1
                    break
            else:
                # do not restart if we made it through the whole repetitions
                break
    paras = np.array([
        np.asarray(new_susc),
        np.asarray(new_trans),
        np.asarray(new_inf_period),
        np.asarray(new_eps)
    ]).T
    print 'number of parameter rows', paras[:,0].shape
    with open('parameters.txt', 'w') as newfile1:
        np.savetxt(newfile1, paras, fmt = ['%f', '%f', '%f', '%f'])
    print count

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    fun(susc, trans, inf_period, eps, 400, pop)
Comments:
- Very nice answer, +1 :) If you'd not thought of it, you can merge all the lists into one epidemic list. I personally would use a named tuple as the value. But it allows you to change `paras` to `[[e.susc, e.trans, e.inf_period, e.eps] for e in epidemics]` and change the print to `print '...', len(paras)`. (Peilonrayz, Dec 16, 2016 at 10:31)
- @Peilonrayz Yes, sure. I also thought of somehow using generators to avoid all these `del`s. But I wanted to center it around broad readability improvements and thoughts on possibly buggy behaviours. (301_Moved_Permanently, Dec 16, 2016 at 10:54)
- Great answer @MathiasEttinger! Thanks. The only issue is that I need to save `count_list` for each epidemic (the counts of infectious individuals for each time step). Right now, for some reason, `count_list` starts from `[]` for each set of reps. How can I fix that? (StatsSorceress, Dec 16, 2016 at 15:35)
- @StatsSorceress Did you check you're not hitting `MAX_FAILED_ATTEMPS` each time? (301_Moved_Permanently, Dec 16, 2016 at 17:17)
I already addressed readability and behaviour concerns in another answer, but I wanted to also address the performance aspect and point out some good practices.
The first thing to do when considering performance is to stop making assumptions and run the program through a profiler. Python makes this easy; all you need is to run your script from the command line using:
python -m cProfile name_of_the_script.py
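If you prefer to profile from inside Python rather than from the command line, the same information is available through `cProfile` and `pstats` (the `work` function here is a hypothetical stand-in for the simulation):

```python
import cProfile
import io
import pstats

def work():
    # stand-in for the expensive simulation
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time to surface the expensive call chains first
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
assert 'function calls' in report
```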
So using the last version of my other answer and filling the blanks, i.e. adding code that will simulate very simple behaviour of the missing functions:
import random

full_mat = 'something'

def external_function_call(*args):
    return {key: random.randint(0, 20) for key in range(random.randint(0, 20))}

def inf_per_count_time(g):
    return min(g.values())
I get the following timings. Trimmed for the sake of readability:
number of parameter rows (9600,)
9600
815595 function calls (815474 primitive calls) in 0.770 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
<snip>
1 0.006 0.006 0.771 0.771 virus.py:1(<module>)
18388 0.100 0.000 0.504 0.000 virus.py:10(<dictcomp>)
9600 0.006 0.000 0.015 0.000 virus.py:13(inf_per_count_time)
1 0.044 0.044 0.714 0.714 virus.py:17(fun)
18388 0.022 0.000 0.578 0.000 virus.py:9(external_function_call)
<snip>
0.770 seconds... Far less than the announced 2.5 days of computation. What is also interesting to note is that the one call to `fun` takes 0.714 seconds and, within that call, the 18388 calls to `external_function_call` take 0.578 seconds.
This yields two conclusions: the extremely simplified version of `external_function_call` is called an average of 2 times per repetition; and it is taking nearly all the time of the computation, leaving little more than 100 ms for the loops and the writing to the file.
So, if you're looking for performance, you will have to:
- optimize `external_function_call` for speed;
- reduce the number of calls made to this function; this may mean using better input parameters.
Now, the fact that `fun` has almost no influence on the total running time of the program does not mean we should use inefficient constructs.
The first thing to consider is to use a single result list and store all 6 results at once. Doing that is pretty easy using `tuple`s, but Python provides a thin layer on top of them to make access to each element easier: `collections.namedtuple`s.
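A quick sketch of that namedtuple (same fields as in the rewrite below) to show the named access it buys:

```python
from collections import namedtuple

EpidemyStatistics = namedtuple(
    'EpidemyStatistics',
    'data count susceptibility transmissibility infectious epsilon')

stats = EpidemyStatistics(
    data={0: 12}, count=3, susceptibility=0.3,
    transmissibility=1.5, infectious=2, epsilon=0.01)

# Readable attribute access...
assert stats.transmissibility == 1.5
# ...while still behaving like a plain tuple.
assert stats[3] == 1.5
```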
Using them, your code can become:
from collections import namedtuple
import itertools
import csv

MAX_FAILED_ATTEMPS = 50

EpidemyStatistics = namedtuple(
    'EpidemyStatistics',
    'data count susceptibility transmissibility infectious epsilon')

def fun(susc, trans, inf_period, eps, repetitions, pop):
    epidemies = []
    count = 0
    parameters_product = itertools.product(trans, inf_period, eps)
    for transmissibility, infectious_period, epsilon in parameters_product:
        while True:
            for rep in range(repetitions):
                for _ in range(MAX_FAILED_ATTEMPS):
                    g1 = external_function_call(
                        pop, susc, transmissibility,
                        infectious_period, epsilon, full_mat)
                    if len(g1) >= 10 and max(g1.values()) >= 10:
                        g2 = inf_per_count_time(g1)
                        count += 1
                        epidemies.append(EpidemyStatistics(
                            g1, g2, susc, transmissibility,
                            infectious_period, epsilon))
                        break
                else:
                    transmissibility += 1
                    # Cleanup because we failed too many times
                    # (guard needed: lst[-0:] would delete the whole list when rep == 0)
                    if rep > 0:
                        # if we've already written an epidemic
                        # using this set of parameters
                        del epidemies[-rep:]
                        count -= 1
                    break
            else:
                # do not restart if we made it through the whole repetitions
                break
    parameters = [
        (e.susceptibility, e.transmissibility, e.infectious, e.epsilon)
        for e in epidemies
    ]
    print 'number of parameter rows', len(parameters)
    with open('parameters.txt', 'w') as newfile1:
        writer = csv.writer(newfile1, delimiter=' ')
        writer.writerows(parameters)
    print count

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    fun(susc, trans, inf_period, eps, 400, pop)
Another improvement would be to avoid the need for `del` altogether, i.e. avoid storing elements that you will remove afterwards. One way to do it is to use a helper function that temporarily stores the results for each repetition and returns them, unless one repetition fails too many times:
from collections import namedtuple
import itertools
import csv

MAX_FAILED_ATTEMPS = 50

EpidemyStatistics = namedtuple(
    'EpidemyStatistics',
    'data count susceptibility transmissibility infectious epsilon')

def perform_repetitions(
        amount, population, susceptibility,
        transmissibility, infectious_period, epsilon):
    repetitions = []
    for _ in range(amount):
        for _ in range(MAX_FAILED_ATTEMPS):
            g1 = external_function_call(
                population, susceptibility, transmissibility,
                infectious_period, epsilon, full_mat)
            if len(g1) >= 10 and max(g1.values()) >= 10:
                g2 = inf_per_count_time(g1)
                repetitions.append(EpidemyStatistics(
                    g1, g2, susceptibility, transmissibility,
                    infectious_period, epsilon))
                break
        else:
            return
    return repetitions

def fun(susc, trans, inf_period, eps, repetitions, pop):
    epidemies = []
    parameters_product = itertools.product(trans, inf_period, eps)
    for transmissibility, infectious_period, epsilon in parameters_product:
        while True:
            statistics = perform_repetitions(
                repetitions, pop, susc, transmissibility,
                infectious_period, epsilon)
            if statistics is not None:
                epidemies.extend(statistics)
                break
            # Failed attempt, try with worse conditions
            transmissibility += 1
    print 'number of parameter rows', len(epidemies)
    with open('parameters.txt', 'w') as newfile1:
        writer = csv.writer(newfile1, delimiter=' ')
        for parameters in epidemies:
            writer.writerow((
                parameters.susceptibility,
                parameters.transmissibility,
                parameters.infectious,
                parameters.epsilon
            ))

if __name__ == "__main__":
    pop = 625
    susc = 0.3
    trans = [1.5, 2.5, 3]
    inf_period = [2, 3]
    eps = [0, 0.01, 0.02, 0.05]
    fun(susc, trans, inf_period, eps, 400, pop)
With these changes, performance of the loops is around 20% better, with nearly the same amount of time spent in `external_function_call`:
number of parameter rows 9600
777541 function calls in 0.692 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
<snip>
1 0.006 0.006 0.692 0.692 virus.py:1(<module>)
18506 0.027 0.000 0.566 0.000 virus.py:11(external_function_call)
18506 0.095 0.000 0.488 0.000 virus.py:12(<dictcomp>)
9600 0.007 0.000 0.016 0.000 virus.py:15(inf_per_count_time)
24 0.041 0.002 0.651 0.027 virus.py:24(perform_repetitions)
1 0.010 0.010 0.680 0.680 virus.py:45(fun)
<snip>
The last thing I'd like to talk about is how frozen this script feels: it's not friendly to modify when we want to change the simulation parameters. Better to allow the user to provide them on the command line, with sensible defaults if need be. `argparse` can greatly help there.
I also have the feeling that the susceptibility of the individuals to the disease would be a good candidate for parameter tweaking, so I would provide it as a list (even if this list contains only one value by default) and integrate it into the `itertools.product`:
from collections import namedtuple
import itertools
import csv
import argparse

MAX_FAILED_ATTEMPS = 50

EpidemyStatistics = namedtuple(
    'EpidemyStatistics',
    'data count susceptibility transmissibility infectious epsilon')

def perform_repetitions(
        amount, population, susceptibility,
        transmissibility, infectious_period, epsilon):
    repetitions = []
    for _ in range(amount):
        for _ in range(MAX_FAILED_ATTEMPS):
            g1 = external_function_call(
                population, susceptibility, transmissibility,
                infectious_period, epsilon, full_mat)
            if len(g1) >= 10 and max(g1.values()) >= 10:
                g2 = inf_per_count_time(g1)
                repetitions.append(EpidemyStatistics(
                    g1, g2, susceptibility, transmissibility,
                    infectious_period, epsilon))
                break
        else:
            return
    return repetitions

def simulation(
        susceptibilities, transmissibilities, infectious_periods,
        epsilons, repetitions, population):
    epidemies = []
    parameters = itertools.product(
        susceptibilities, transmissibilities,
        infectious_periods, epsilons)
    for susceptibility, transmissibility, infectious_period, epsilon in parameters:
        while True:
            statistics = perform_repetitions(
                repetitions, population, susceptibility,
                transmissibility, infectious_period, epsilon)
            if statistics is not None:
                epidemies.extend(statistics)
                break
            # Failed attempt, try with worse conditions
            transmissibility += 1
    print 'number of parameter rows', len(epidemies)
    with open('parameters.txt', 'w') as newfile1:
        writer = csv.writer(newfile1, delimiter=' ')
        for parameters in epidemies:
            writer.writerow((
                parameters.susceptibility,
                parameters.transmissibility,
                parameters.infectious,
                parameters.epsilon
            ))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Some infos here')
    parser.add_argument('population', type=int)
    parser.add_argument('-s', '--susceptibility', type=float, nargs='+', default=[0.3])
    parser.add_argument('-t', '--transmissibility', type=float, nargs='+', default=[1.5, 2.5, 3])
    parser.add_argument('-i', '--infectious-period', type=int, nargs='+', default=[2, 3])
    parser.add_argument('-e', '--epsilon', type=float, nargs='+', default=[0, 0.01, 0.02, 0.05])
    parser.add_argument('-r', '--repetitions', type=int, default=400)
    args = parser.parse_args()
    simulation(
        args.susceptibility,
        args.transmissibility,
        args.infectious_period,
        args.epsilon,
        args.repetitions,
        args.population)
Usage being
python script.py 625
or
python script.py 625 -r 200 -t 1.25 1.5 1.75 2 2.25 -i 4 5 6
Comment on the question: Without knowing what `external_function_call` or `inf_per_count_time` are, it's a bit pointless for the performance aspect. You also `open` two files (`f` and `h`) that you seem to never use in your loops; is that on purpose?