
There's a function in one of the scripts I've been working on that I really want to improve.

Disclaimer: honestly, I'm not very good at programming, but I've been learning day by day and slowly making improvements. Also, the code assumes you have run:

import sys
import warnings
warnings.simplefilter("ignore")
import time
import random
import numpy as np
from numpy import diff
import scipy.integrate as integrate
import scipy
import matplotlib.pyplot as plt
from IPython.display import clear_output
kjup = 0.0009543  # Jupiter mass in solar masses
plt.rcParams.update({'font.size': 26, 'font.family' : 'Bitstream Vera Sans',
 'font.weight' : 'normal'})
plt.rcParams['lines.linewidth'] = 3

Description: this function is called Table_plus_Analytical because its purpose is to create a simple discretization of candidate parameter sets that are then evaluated in a differential-equation solver called LSODA. Below I describe every sub-function involved, step by step, and then append the whole script in case you don't want the verbose walkthrough.
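(For context only: the surviving candidates are meant to be handed to LSODA later. A minimal sketch of calling that solver through SciPy's solve_ivp, with a placeholder right-hand side rather than my actual system:)

from scipy.integrate import solve_ivp

def rhs(t, y):              # placeholder ODE, not the real equations
    return -0.5 * y

sol = solve_ivp(rhs, t_span=(0.0, 10.0), y0=[1.0], method='LSODA')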

  1. First step: from the inputs, create an array of combinations of every variable, received as min and max values (plus a mean, in the case of 3 of the variables). This is an array of N combinations, where N = k**j, k being the number of elements per variable (I intend to raise k as much as possible without increasing the time per run) and j the number of variables I consider. An example:
def Table_plus_Analytical(M, m1, m2,        # each: [mean, lower ERROR, upper ERROR]
                          alpha,            # [lowest VAL, highest VAL]
                          ta1, ta2,
                          te1, te2,
                          running_time=5/60,                 # in minutes
                          mass_array_generation='arange',    # 'random normal', 'random uniform', 'arange'
                          params_array_generation='arange',  # 'random uniform', 'arange'
                          f1_res=-1.190, f2_res=0.428,
                          q=1, p=3):

This will return

4 elements in each parameter will be generated
1.32% of combinations satisfy the conditions; from 65536 possibilities, now there are 871 possibilities.
Conditions:
(0.03 < calculated_Delta_eq) & (calculated_Delta_eq<0.13)& (0.042/5 < calculated_e1eq) & (calculated_e1eq < 0.042/2) & (calculated_dta >= 1)
'\nwith np.printoptions(precision=4):\n print(result)\n'

and when uncommenting the print option:

>>> result = Table_plus_Analytical(M=[0.823, 0.041, 0.041], m1=[0.978, 0.06, 0.06], m2=[0.369, 0.1, 0.08],
...                                alpha=[0.6, 1.3],
...                                ta1=[1e3, 1e5], ta2=[1e2, 1e5],
...                                te1=[1e1, 1e3], te2=[1e0, 1e2],
...                                running_time=20/60,
...                                mass_array_generation='random normal',    # random normal, random uniform, arange
...                                params_array_generation='random uniform', # random uniform, arange
...                                f1_res=-1.190, f2_res=1.688,
...                                q=1, p=3)
>>> with np.printoptions(precision=4):
...     print(result)
(0.003 < calculated_Delta_eq) & (calculated_Delta_eq < 0.15) & (calculated_dta >= 1) & (calculated_ta1te1<=2500) & (calculated_ta2te2<=2500)
[[0.7894 1.055 0.2461 ... 0.0092 0.0374 3.8377]
 [0.7894 1.055 0.2461 ... 0.0085 0.0346 3.8377]
 [0.7894 1.055 0.2461 ... 0.0095 0.0386 3.8377]
 ...
 [0.8726 1.067 0.4093 ... 0.0169 0.0264 5.0594]
 [0.8726 1.067 0.4093 ... 0.0171 0.0266 5.0594]
 [0.8726 1.067 0.4093 ... 0.0169 0.0264 5.0594]]

So far, so good.
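To make the N = k**j counting concrete, here is a tiny standalone demo (hypothetical values) of the Cartesian-product step the function builds internally with np.meshgrid:

a = np.array([0.8, 0.9, 1.0, 1.1])   # k = 4 samples of one parameter
b = np.array([0.3, 0.4, 0.5, 0.6])   # k = 4 samples of another
combos = np.stack(np.meshgrid(a, b), axis=-1).reshape(-1, 2)
print(len(combos))   # 16 = 4**2; with j = 8 parameters this becomes 4**8 = 65536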

  2. The above task is accomplished using two different functions: a) Calculate_N_elements_per_array(M,m1,m2,alpha,ta1,ta2,te1,te2,running_time); b) Make_arrage(N, args, mass_array_generation, params_array_generation).
  • Calculate_N_elements_per_array tries to estimate the number N of elements that will define the length of every variable. What I mean is: if I want to evaluate the behavior of the function "y(x) = m*x + n", I need to tell the code to use N samples for the variable x. Why is this useful? Because here I don't have 1 variable but 8 parameters that I'm interested in varying. The arrays are then generated using this number, and I can tweak it manually via running_time, adjusting aprox_slope if I add more Calc_ functions or change machines (see the worked example just after these bullets).

  • Make_arrage generates the arrays, using N, args and the generation options.
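For instance, this is roughly the arithmetic Calculate_N_elements_per_array performs (the numbers here are just an illustration):

running_time_s = 60 * (20 / 60)  # minutes -> seconds
aprox_slope = 0.01               # seconds per million evaluated combinations
dim = 8                          # number of varied parameters
# total combinations = N**dim, so solve N = (t / slope)**(1 / dim)
N = int((running_time_s / aprox_slope) ** (1 / dim))  # 2000**(1/8) -> 2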

  3. Then c) Calculate_fs(q, alpha_array); d) Calc_Delta_eq and the other Calc_... functions; and e) Calculate_Filter_Possibilites are used to calculate quantities from the generated combinations, dropping the ones I don't like. The dropping is done at plug_in = plug_in[(0.034 < calculated_Delta_eq) & (calculated_Delta_eq < 0.037) & (calculated_dta >= 1)] #drop rows you don't need.
  • Calculate_fs solves an integration problem using scipy; Calc_Delta_eq and friends do simpler and quicker math. Since the intention is to improve the code's efficiency, my question is: how can I modify the calculation of fs to make it faster? One proposal is to precompute a big table of f(q, alpha), discretizing this tiny problem: I know q can be q=1 or q=2 in this context; it won't take many more values. On the other hand, alpha runs from 0.1 to 10, and I'd like to test with as much resolution as I can. This could be computed once and stored, maybe as a dict or a .npy file, to be read whenever an f(q, alpha) is needed (see the sketch just after this list). The point is that the code will be evaluated millions of times, and evaluating f(alpha) for the same alpha thousands of times is not doing any good.
  4. I guess the last part of the code is easy to follow, so I'll leave you with the script. I would really like to read any suggestions you'd care to add. Please consider that I'm learning, but I really want to make this code better to improve the quality of the results I get (more combinations in less time is really good for me).

  5. Extra question: an astronomer colleague told me that Google provides a free service to run code. Is this real, and where can I test this particular problem?
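Here is the kind of precomputed-table approach I have in mind for point 3 (just a sketch: the grid size and the use of linear interpolation are assumptions, lookup_fs is a name I made up, and it assumes Calculate_fs is pulled out to module scope):

# Build a fine grid once per q and save it; later runs only interpolate.
alpha_grid = np.linspace(0.1, 10.0, 10_000)
f1_grid, f2_grid = Calculate_fs(1, alpha_grid)   # expensive integrals, done once
np.save('fs_q1.npy', np.vstack([alpha_grid, f1_grid, f2_grid]))

fs_table = np.load('fs_q1.npy')

def lookup_fs(alpha_values, table=fs_table):
    grid, f1g, f2g = table
    return np.interp(alpha_values, grid, f1g), np.interp(alpha_values, grid, f2g)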


The whole script

def Table_plus_Analytical(M, m1, m2,        # each: [mean, lower ERROR, upper ERROR]
                          alpha,            # [lowest VAL, highest VAL]
                          ta1, ta2,
                          te1, te2,
                          running_time=5/60,                 # in minutes
                          mass_array_generation='arange',    # 'random normal', 'random uniform', 'arange'
                          params_array_generation='arange',  # 'random uniform', 'arange'
                          f1_res=-1.190, f2_res=0.428,
                          q=1, p=3):
    start_time = time.time()

    def Calculate_N_elements_per_array(M, m1, m2, alpha, ta1, ta2, te1, te2, running_time):
        # Count every parameter that will vary (those given as multi-element lists).
        iterated_parameters_count, parameters = 0, [M, m1, m2, alpha, ta1, ta2, te1, te2]
        for par in parameters:
            if isinstance(par, list) and len(par) > 1:
                iterated_parameters_count += 1
        dim = float(iterated_parameters_count)
        # Estimate a linear response: time in seconds = aprox_slope * values (in millions),
        # so N = (t/m)**(1/dim) elements per parameter gives N**dim total values.
        aprox_slope = .01                 # seconds / million of values
        running_time = 60 * running_time  # from minutes to seconds
        N = (running_time / aprox_slope) ** (1 / dim)
        print(str(int(N)) + ' elements in each parameter will be generated')
        return int(N)

    def Make_arrage(N, args, mass_array_generation, params_array_generation):
        masses = args[:3]
        elseargs = args[3:]

        # Convert masses to np.arrays when given as [mean, lower error, upper error];
        # scalars and single-element lists pass through untouched.
        if mass_array_generation == 'random normal':
            mtemp = []
            for mass in masses:
                if isinstance(mass, list) and len(mass) > 1:
                    mu = mass[0]
                    lo = mu - mass[1]
                    hi = mu + mass[2]
                    std = abs(hi - lo) / 2
                    mass = np.random.normal(mu, std, size=N)
                mtemp.append(mass)
            masses = mtemp
        if mass_array_generation == 'random uniform':
            mtemp = []
            for mass in masses:
                if isinstance(mass, list) and len(mass) > 1:
                    mu = mass[0]
                    lo = mu - mass[1]
                    hi = mu + mass[2]
                    mass = np.random.uniform(lo, hi, size=N)
                mtemp.append(mass)
            masses = mtemp
        if mass_array_generation == 'arange':
            mtemp = []
            for mass in masses:
                if isinstance(mass, list) and len(mass) > 1:
                    mu = mass[0]
                    lo = mu - mass[1]
                    hi = mu + mass[2]
                    mass = np.arange(lo, hi, (hi - lo) / N)
                mtemp.append(mass)
            masses = mtemp

        if params_array_generation == 'random uniform':
            elseargs = [np.random.uniform(arg[0], arg[1], size=N) for arg in elseargs]
        if params_array_generation == 'arange':
            elseargs = [np.arange(arg[0], arg[1], abs(arg[0] - arg[1]) / N) for arg in elseargs]

        # Now masses are np.arrays and floats joined in one list.
        array_args = masses + elseargs
        return array_args

    #################

    q = 1  # !#! hard-coded: overrides whatever q was passed in

    def Calculate_fs(q, alpha_array):
        # "alph" is used because "alpha" already names the [min, max] limits.
        def db_1_2(psi, j, alph):
            return np.cos(j * psi) / (1 - 2 * alph * np.cos(psi) + alph ** 2) ** (1. / 2)

        def db_3_2(psi, j, alph):
            return np.cos(j * psi) / (1 - 2 * alph * np.cos(psi) + alph ** 2) ** (3. / 2)

        def b_1_2(j, alph):
            return (1. / np.pi) * integrate.quad(db_1_2, 0., 2 * np.pi, args=(j, alph))[0]

        def b_3_2(j, alph):
            return (1. / np.pi) * integrate.quad(db_3_2, 0., 2 * np.pi, args=(j, alph))[0]

        def Calc_f1(q, alph):
            return (-1. / 2) * (2 * (q + 1) * b_1_2(q + 1., alph)
                                + (alph / 2) * (b_3_2(q + 2., alph) + b_3_2(q, alph))
                                - (alph ** 2) * b_3_2(q + 1., alph))

        def Calc_f2(q, alph):
            f2_result = (1. / 2) * ((2. * q + 1.) * b_1_2(q, alph)
                                    + (alph / 2.) * (b_3_2(q + 1., alph) + b_3_2(q - 1., alph))
                                    - (alph ** 2) * b_3_2(q, alph))
            if q != 1.:
                return f2_result
            return f2_result - 2 * alph

        f1_array = []
        f2_array = []
        for alpha_val in alpha_array:
            f1_array.append(Calc_f1(q, alpha_val))
            f2_array.append(Calc_f2(q, alpha_val))
        return np.array(f1_array), np.array(f2_array)

    def Calc_Delta_eq(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2):

        def Calc_A_factor(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2):
            fact_1 = 3 / (q ** 2. * te1)
            fact_2 = (q / (q + 1)) * (m1 / alpha / m2) + 1
            fact_3 = (alpha * m2 / Ms) ** 2.
            fact_4 = ((q + 1) * f1 ** 2.
                      + (q ** 2. / (q + 1)) * (m1 / alpha / m2) * f2 ** 2. * (te1 / te2))
            return fact_1 * fact_2 * fact_3 * fact_4

        def Calc_B_factor(ta1, ta2):
            return (3 / (2 * ta1)) * (1 - (ta1 / ta2))

        A = Calc_A_factor(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2)
        B = Calc_B_factor(ta1, ta2)
        Delta_eq = (-1 * A / B) ** (1. / 2.)
        return Delta_eq

    def Calc_e1_eq(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2):
        Delta = Calc_Delta_eq(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2)
        return alpha * (m2 / Ms) * (abs(f1) / (Delta * q))

    def Calc_e2_eq(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2):
        Delta = Calc_Delta_eq(Ms, m1, m2, alpha, ta1, ta2, te1, te2, q, f1, f2)
        return (m1 / Ms) * (abs(f2) / ((q + 1) * Delta))

    def Calc_dta(ta1, ta2):
        return ta1 / ta2

    def Calc_taitei(ta, te):
        return ta / te

    #################

    def Calculate_Filter_Possibilites(data, f1_res=f1_res, f2_res=f2_res, q=q, p=p):
        plug_in = np.stack(np.meshgrid(*data), axis=-1).reshape(-1, len(data))  # Cartesian product

        old_len = len(plug_in)

        calculated_f1, calculated_f2 = Calculate_fs(q, plug_in[:, 3])

        q = q * np.ones(old_len)
        p = p * np.ones(old_len)

        # Append extra columns: q, f1, f2, p.
        plug_in = np.column_stack([plug_in, q[:, None], calculated_f1[:, None],
                                   calculated_f2[:, None], p[:, None]])

        # Columns: 0=M, 1=m1, 2=m2 (multiplied by kjup: Jupiter -> solar masses),
        # 3=alpha, 4=ta1, 5=ta2, 6=te1, 7=te2, 8=q, 9=f1, 10=f2.
        args = (plug_in[:, 0], plug_in[:, 1] * kjup, plug_in[:, 2] * kjup,
                plug_in[:, 3], plug_in[:, 4], plug_in[:, 5], plug_in[:, 6],
                plug_in[:, 7], plug_in[:, 8], abs(plug_in[:, 9]), abs(plug_in[:, 10]))
        calculated_Delta_eq = Calc_Delta_eq(*args)
        calculated_e1eq = Calc_e1_eq(*args)
        calculated_e2eq = Calc_e2_eq(*args)
        calculated_dta = Calc_dta(plug_in[:, 4], plug_in[:, 5])
        calculated_ta1te1 = Calc_taitei(plug_in[:, 4], plug_in[:, 6])
        calculated_ta2te2 = Calc_taitei(plug_in[:, 5], plug_in[:, 7])

        plug_in = np.column_stack([plug_in, calculated_Delta_eq[:, None],
                                   calculated_e1eq[:, None], calculated_e2eq[:, None],
                                   calculated_dta[:, None], calculated_ta1te1[:, None],
                                   calculated_ta2te2[:, None]])

        # Free on Delta, constrained on the eccentricity of planet 1
        # (which constrains Delta and e2 implicitly). Drop rows you don't need:
        plug_in = plug_in[(0.034 < calculated_Delta_eq)
                          & (calculated_Delta_eq < 0.037)
                          & (calculated_dta >= 1)]
        # & (0.01 < calculated_e1eq) & (calculated_e1eq < 0.07)
        # & (0.04 < calculated_e2eq) & (calculated_e2eq < 0.17)
        new_len = len(plug_in)

        print(str((new_len / old_len) * 100)[:4]
              + '% of combinations satisfy the conditions; from ' + str(old_len)
              + ' possibilities, now there are ' + str(new_len) + ' possibilities.')
        # NOTE: this hard-coded string does not match the mask actually applied above.
        print('Conditions:\n(0.03 < calculated_Delta_eq) & (calculated_Delta_eq<0.13)'
              '& (0.042/5 < calculated_e1eq) & (calculated_e1eq < 0.042/2)'
              ' & (calculated_dta >= 1)')

        return plug_in

    N = Calculate_N_elements_per_array(M, m1, m2, alpha, ta1, ta2, te1, te2, running_time)
    if N < 1:
        N = 1

    array_args = Make_arrage(N, [M, m1, m2, alpha, ta1, ta2, te1, te2],
                             mass_array_generation, params_array_generation)
    Table_filtered_possibilities = Calculate_Filter_Possibilites(array_args)

    return Table_filtered_possibilities
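One direction I've been considering for the fs bottleneck (an untested sketch, and it assumes scipy.integrate.quad_vec is available in your SciPy version, 1.4+): let quad_vec integrate over psi once for the whole alpha array instead of looping:

from scipy.integrate import quad_vec

def b_vec(j, alphas, exponent):
    # Same integrand as b_1_2/b_3_2, but evaluated for every alpha at once.
    integrand = lambda psi: np.cos(j * psi) / (1 - 2 * alphas * np.cos(psi) + alphas ** 2) ** exponent
    return quad_vec(integrand, 0.0, 2 * np.pi)[0] / np.pi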
asked Dec 15, 2021 at 2:38
  • Is this a shorter version of your other question? codereview.stackexchange.com/q/271019/232484 — Commented Dec 15, 2021 at 4:08
  • @Teepeemm Thanks very much for the comment. Is this a shorter question? Not really. Here I'm pursuing an improvement in the way I calculate and append coefficients and combinations of parameters. My other question assumes that this is not a problem (the time-management issue is taken as fixed) and asks for general reviews and comments on the way I generate plots and write code. — Commented Dec 15, 2021 at 12:25

1 Answer

There's a lot of code here; I won't have time today for a full review, but maybe some of what I have to say about just the one function Calculate_fs can be illustrative in general.

  • Use clearer names. I understand that in a science context this can be hard; "q" may be the actual name of a parameter in a well-known formula. Unfortunately, "q" probably means lots of different things in different contexts. Try to cram some extra context into the name. You can use a comment to explain what a variable is, but only as a last resort!
  • I usually like the idea of placing stuff in the narrowest scope applicable, but that's not idiomatic in Python. Put everything in the module scope, or in the broadest scope possible; when a reader sees db_1_2 declared inside Calculate_fs they'll expect that it relies on stuff that's only available inside that function. Moving things to a broader scope will also make smaller pieces easier to test! (And it will make good names even more important.)
  • You're setting q in the scope of Table_plus_Analytical just before defining Calculate_fs. Why? If it's needed there, try to make it clear why. If not, move it to where it is needed (or possibly up to the beginning of Table_plus_Analytical).
  • Obviously you have a lot of math. You've tried to make it dense enough to read by keeping names/literals short, and omitting some white-space, but that's not a great strategy. Instead, let your math be fluffy, but remove some of the redundancy. You can also leverage some Python syntax more to help keep your lines structured.
from functools import partial

def db(psi, j, alpha, *, exponent):  # decibels? database?
    return np.cos(j * psi) / (  # opinions vary about white-space/parens in formulas
        (1 - (2 * alpha * np.cos(psi)) + (alpha ** 2))
        ** exponent
    )

def b(j, alpha, *, exponent):  # What is this?
    return (1.0 / np.pi) * integrate.quad(
        partial(db, exponent=exponent), 0.0, 2 * np.pi, args=(j, alpha)
    )[0]

def Calc_f(q, alpha, *, offset):  # offset should have a better name; IDK what's going on.
    offset_q = q + offset
    pre_result = (
        # 2*offset_q reproduces 2*(q+1) in f1; the extra (1 - offset) restores (2*q+1) in f2.
        ((2.0 * offset_q + (1 - offset)) * b(offset_q, alpha, exponent=0.5))
        + ((alpha / 2) * (b(offset_q + 1.0, alpha, exponent=1.5)
                          + b(offset_q - 1.0, alpha, exponent=1.5)))
        - ((alpha ** 2) * b(offset_q, alpha, exponent=1.5))
    )
    return pre_result * (0.5 - offset)  # moved `-2*alph` outside this function

def Calculate_fs(q, alpha_array):
    f1_array = []
    f2_array = []
    shift_f2 = -2.0 if q == 1.0 else 0
    for alpha_val in alpha_array:
        f1_array.append(Calc_f(q, alpha_val, offset=1.0))
        f2_array.append(Calc_f(q, alpha_val, offset=0.0) + (alpha_val * shift_f2))
    return np.array(f1_array), np.array(f2_array)

As for efficiency:
You're using a for loop to generate what I assume is a reasonably large list. You mentioned there may be ways of flattening this out into a series of operations on large arrays; that's probably a good idea. See extensive discussion here. In the meantime, it looks like your input alpha_array is already a numpy array, so you can at least skip the step of building lists. Here's an np.fromiter version; I don't know if that's better or worse than an np.vectorize version for you.

def Calculate_fs(q, alpha_array):
    shift_f2 = -2.0 if q == 1.0 else 0
    count = len(alpha_array)
    f1 = np.fromiter((Calc_f(q, alpha_val, offset=1.0)
                      for alpha_val in alpha_array),
                     dtype=float,  # ? I think a numpy-native type would be best.
                     count=count)
    f2 = np.fromiter((Calc_f(q, alpha_val, offset=0.0) + (alpha_val * shift_f2)
                      for alpha_val in alpha_array),
                     dtype=float,  # ? I think a numpy-native type would be best.
                     count=count)
    return f1, f2
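Along the same lines (a sketch; I haven't run it against your full pipeline): the Cartesian product repeats each alpha value many times, so you could do the expensive quadrature only on the unique values and broadcast the results back:

def Calculate_fs_deduped(q, alpha_array):
    uniq, inverse = np.unique(alpha_array, return_inverse=True)
    f1_u, f2_u = Calculate_fs(q, uniq)   # quadrature runs len(uniq) times, not len(alpha_array)
    return f1_u[inverse], f2_u[inverse]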
answered Dec 15, 2021 at 16:51
  • Great review! I would also recommend importing the mathematical functions directly: reading cos is easier than np.cos. Using annotated type hints would also help: Quantity = Annotated[float, "information about what quantity denotes"]. Similarly, docstrings and occasional comments would greatly increase readability. There is also the problem that the code does not follow PEP 8, which makes it harder to read; for example, Calc_f vs calc_f. — Commented Dec 15, 2021 at 18:11
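For illustration, the Annotated alias the comment describes might look like this (the names are hypothetical, not from the original code):

from typing import Annotated

MassSolar = Annotated[float, "stellar mass in solar masses"]   # hypothetical alias

def calc_dta(ta1: Annotated[float, "migration timescale ta1"],
             ta2: Annotated[float, "migration timescale ta2"]) -> float:
    return ta1 / ta2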
