There's a function in one of the scripts I've been working on that I really want to improve.
Disclaimer: honestly, I'm not very good at programming, but I've been learning day by day and slowly getting better. Also, the code assumes you ran:
import sys
import warnings
warnings.simplefilter("ignore")
import time
import random
import numpy as np
from numpy import diff
import scipy.integrate as integrate
import scipy
import matplotlib.pyplot as plt
from IPython.display import clear_output
kjup = 0.0009543  # Jupiter-to-solar mass ratio; `global` is a no-op at module scope
plt.rcParams.update({'font.size': 26, 'font.family' : 'Bitstream Vera Sans',
'font.weight' : 'normal'})
plt.rcParams['lines.linewidth'] = 3
Description: this function is called Table_plus_Analytical because its purpose is to create a simple discretization of candidate parameter sets, which will then be evaluated in a differential-equation solver (LSODA). For this reason I describe every sub-function involved, step by step, and then append the whole script in case you don't want the verbose version.
- First step: from the inputs, create an array of combinations of every variable, received as min and max (plus a mean value in the case of 3 of the variables). This is an array with N combinations, calculated as N = k**j, where k is the number of elements per variable (I intend to raise k as much as possible without increasing the time to complete one run), and j is the number of variables I consider. For example, k = 4 and j = 8 gives 4**8 = 65536 combinations, as in the run shown below. An example:
def Table_plus_Analytical(M,m1,m2, #mean, lower ERROR, upper ERROR
alpha, #lowest VAL, highest VAL
ta1,ta2,
te1,te2,
running_time=5/60, #in minutes
mass_array_generation='arange', #random normal, random uniform, arange
params_array_generation='arange', #random uniform, arange
f1_res = -1.190,f2_res = 0.428,
q = 1,p = 3
):
Calling it prints, for example:
4 elements in each parameter will be generated
1.32% of combinations satisfy the conditions; from 65536 possibilities, now there are 871 possibilities.
Conditions:
(0.03 < calculated_Delta_eq) & (calculated_Delta_eq<0.13)& (0.042/5 < calculated_e1eq) & (calculated_e1eq < 0.042/2) & (calculated_dta >= 1)
and when printing the result:
>>> result = Table_plus_Analytical(M = [0.823,0.041,0.041], m1 = [0.978,0.06,0.06],m2 = [0.369,0.1,0.08],
alpha = [0.6,1.3],
ta1 = [1e3,1e5],ta2 = [1e2,1e5],
te1 = [1e1,1e3],te2 = [1e0,1e2],
running_time=20/60,
mass_array_generation='random normal', #random normal, random uniform, arange
params_array_generation='random uniform', #random uniform, arange
f1_res = -1.190, f2_res = 1.688,
q = 1,p = 3
)
>>> with np.printoptions(precision=4):
...     print(result)
(0.003 < calculated_Delta_eq) & (calculated_Delta_eq < 0.15) & (calculated_dta >= 1) & (calculated_ta1te1<=2500) & (calculated_ta2te2<=2500)
[[0.7894 1.055 0.2461 ... 0.0092 0.0374 3.8377]
[0.7894 1.055 0.2461 ... 0.0085 0.0346 3.8377]
[0.7894 1.055 0.2461 ... 0.0095 0.0386 3.8377]
...
[0.8726 1.067 0.4093 ... 0.0169 0.0264 5.0594]
[0.8726 1.067 0.4093 ... 0.0171 0.0266 5.0594]
[0.8726 1.067 0.4093 ... 0.0169 0.0264 5.0594]]
So far so good
- The above task is handled by two functions: a) Calculate_N_elements_per_array(M,m1,m2,alpha,ta1,ta2,te1,te2,running_time); b) Make_arrange(N, args, mass_array_generation, params_array_generation). Calculate_N_elements_per_array tries to estimate the number N of elements that defines the length of every variable's array. By this I mean: if I want to evaluate the behavior of the function y(x) = m*x + n, I need to tell the code to use N samples for the variable x. Why is this useful? Because here I don't have 1 variable but 8 parameters that I'm interested in varying. The results are then generated using this number, and I can tweak it manually via running_time, adjusting aprox_slope if I add more Calc_ functions or change machines. Make_arrange generates the arrays, using N, args, and the generation options.
- Second step: c) Calculate_fs(q, alpha_array); d) Calc_Delta_eq and the other Calc_... functions; e) Calculate_Filter_Possibilites. These calculate quantities from the generated combinations and drop the ones I don't like. The dropping happens at plug_in = plug_in[(0.034 < calculated_Delta_eq) & (calculated_Delta_eq < 0.037) & (calculated_dta >= 1)] #drop rows you don't need.
Calculate_fs solves an integration problem with scipy. Calc_Delta_eq and friends do simpler, quicker math. Since the intention is to improve the code's efficiency, my question: how can I modify the calculation of the fs to make it faster? One proposal is to precompute a big table of f(q, alpha), discretizing this small problem: I know q can only be q=1 or q=2 in this context; it won't take more values. On the other hand, alpha moves from 0.1 to 10, and I'd like as much resolution as I can get. This could be stored, maybe, as a dict or a .npy file to be read whenever an f(q, alpha) is needed. The code will be evaluated millions of times, and evaluating f(alpha) for the same alpha thousands of times is not doing any good. Roughly, I imagine something like this:
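(An untested sketch of that proposal; it assumes Calculate_fs is moved out to module scope, and the grid resolution and file names are just placeholders.)
alpha_grid = np.linspace(0.1, 10.0, 20000)  # resolution to taste; alpha moves from 0.1 to 10
for q_val in (1, 2):  # q only takes these two values in this context
    f1_tab, f2_tab = Calculate_fs(q_val, alpha_grid)  # slow, but it runs only once
    np.save('f1_q%d.npy' % q_val, f1_tab)
    np.save('f2_q%d.npy' % q_val, f2_tab)

# later, in the hot path: interpolate instead of integrating
f1_tab, f2_tab = np.load('f1_q1.npy'), np.load('f2_q1.npy')  # pick the table for the current q

def fs_from_table(alpha_values):
    # linear interpolation on the precomputed grid; alpha_values is an array
    return (np.interp(alpha_values, alpha_grid, f1_tab),
            np.interp(alpha_values, alpha_grid, f2_tab))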
I guess the last part of the code is easy to follow. I leave you with the script. I would really like to read any suggestions you'd care to add. Please consider that I'm learning, but I really want to make this code better to improve the quality of the results I'm getting (more combinations in less time is really good for me).
Extra question: a fellow astronomer told me that Google provides a free service to run code. Is this real, and where can I test this particular problem?
The whole script
def Table_plus_Analytical(M,m1,m2, #mean, lower ERROR, upper ERROR
alpha, #lowest VAL, highest VAL
ta1,ta2,
te1,te2,
running_time=5/60, #in minutes
mass_array_generation='arange', #random normal, random uniform, arange
params_array_generation='arange', #random uniform, arange
f1_res = -1.190,f2_res = 0.428,
q = 1,p = 3
):
start_time = time.time()
def Calculate_N_elements_per_array(M,m1,m2,alpha,ta1,ta2,te1,te2,running_time):
#parameters in a list; counting every parameter that will vary
iterated_parameters_count, parameters= 0, [M,m1,m2,alpha,ta1,ta2,te1,te2]
for par in parameters:
            if isinstance(par, list) and len(par) > 1:
iterated_parameters_count +=1
dim = float(iterated_parameters_count)
#estimating a linear response
aprox_slope = .01 #seconds/million of values
#Then time in seconds = aprox_slope * values (in millions)
running_time = 60*running_time #from minutes to seconds
num_of_values_per_arrange = (running_time/aprox_slope)**(1/dim)
N = num_of_values_per_arrange
#(N) = (t/m)**(1/dim) if Ntotal = (N)
print(str(int(N))+' elements in each parameter will be generated')
return int(N)
    def Make_arrange(N, args, mass_array_generation, params_array_generation):
masses = args[:3]
elseargs = args[3:]
if mass_array_generation == 'random normal':
############### Convert masses to np.arrays if desirable (when mass is list)
mtemp = []
for mass in masses:
if isinstance(mass,int) or (isinstance(mass,list) and len(mass)==1):
                    pass  # scalar or single-value list: leave unchanged
if (isinstance(mass,list) and len(mass)>1):
mu = mass[0]
lo = mu - mass[1]
hi = mu + mass[2]
std = abs(hi-lo)/2
mass = np.random.normal(mu, std, size=N)
mtemp.append(mass)
masses = mtemp
###############
if mass_array_generation == 'random uniform':
############### Convert masses to np.arrays if desirable (when mass is list)
mtemp = []
for mass in masses:
if isinstance(mass,int) or (isinstance(mass,list) and len(mass)==1):
                    pass  # scalar or single-value list: leave unchanged
if (isinstance(mass,list) and len(mass)>1):
mu = mass[0]
lo = mu - mass[1]
hi = mu + mass[2]
std = abs(hi-lo)/2
mass = np.random.uniform(lo, hi, size=N)
mtemp.append(mass)
masses = mtemp
if mass_array_generation == 'arange':
############### Convert masses to np.arrays if desirable (when mass is list)
mtemp = []
for mass in masses:
if isinstance(mass,int) or (isinstance(mass,list) and len(mass)==1):
                    pass  # scalar or single-value list: leave unchanged
if (isinstance(mass,list) and len(mass)>1):
mu = mass[0]
lo = mu - mass[1]
hi = mu + mass[2]
std = abs(hi-lo)/2
mass = np.arange(lo, hi, (hi-lo)/N)
mtemp.append(mass)
masses = mtemp
###############
if params_array_generation == 'random uniform':
elseargs = [np.random.uniform(arg[0],arg[1],size=N) for arg in elseargs]
if params_array_generation == 'arange':
elseargs = [np.arange(arg[0], arg[1], abs(arg[0]-arg[1])/N) for arg in elseargs]
#Now masses are np.arrays and floats joint in a list.
array_args = masses+elseargs
return array_args
#################
q=1 #!#!
def Calculate_fs(q, alpha_array):
def db_1_2(psi, j, alph): #alpha is replaced with "alph" because it's already a variable for alpha limits
return (np.cos(j*psi))/((1-2*alph*np.cos(psi) + alph**2)**(1./2))
def db_3_2(psi, j, alph):
return (np.cos(j*psi))/((1-2*alph*np.cos(psi) + alph**2)**(3./2))
def b_1_2(j, alph):
return (1./np.pi) * integrate.quad(db_1_2, 0., 2*np.pi, args=(j, alph))[0]
def b_3_2(j, alph):
return (1./np.pi) * integrate.quad(db_3_2, 0., 2*np.pi, args=(j, alph))[0]
def Calc_f1(q, alph):
f1_result = (-1./2) * (2*(q+1)*b_1_2(q+1.,alph)
+ (alph/2) * ( b_3_2(q+2., alph) + b_3_2(q, alph)) - (alph**2) * b_3_2(q+1., alph))
return f1_result
def Calc_f2(q, alph):
f2_result = (1./2) * ((2.*q+1.) * b_1_2(q,alph)
+ (alph/2.) *(b_3_2(q+1., alph) + b_3_2(q-1., alph)) - (alph**2) * b_3_2(q, alph))
if q!= 1.:
return f2_result
else:
return f2_result - 2*alph
f1_array=[]
f2_array=[]
for alpha_val in alpha_array:
f1_array.append(Calc_f1(q, alpha_val))
f2_array.append(Calc_f2(q, alpha_val))
return np.array(f1_array), np.array(f2_array)
def Calc_Delta_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2):
def Calc_A_factor(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2):
fact_1 = (3/(q**2. * te1))
fact_2 = ( (q/(q+1)) * (m1/alpha/m2) + 1 )
fact_3 = (alpha*m2/Ms)**2.
fact_4 = ( (q+1) * f1**2. + (q**2. / (q+1) ) * (m1/alpha/m2) * f2**2. * (te1 / te2) )
A = fact_1 * fact_2 * fact_3 * fact_4
return A
def Calc_B_factor(ta1,ta2):
fact_1 = 3/(2* ta1)
fact_2 = (1 - (ta1/ta2))
B = fact_1*fact_2
return B
A = Calc_A_factor(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
B = Calc_B_factor(ta1,ta2)
Delta_eq = ( -1 * A / B)**(1./2.)
return Delta_eq
def Calc_e1_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2):
Delta = Calc_Delta_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
e1= alpha*(m2/Ms) * (abs(f1)/(Delta *q))
return e1
def Calc_e2_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2):
Delta = Calc_Delta_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
e2 = (m1/Ms) * (abs(f2)/((q+1)*Delta))
return e2
def Calc_dta(ta1,ta2):
return ta1/ta2
def Calc_taitei(ta,te):
return ta/te
#################
def Calculate_Filter_Possibilites(data, f1_res=f1_res, f2_res=f2_res, q=q, p=p):
plug_in = np.stack(np.meshgrid(*data), axis=-1).reshape(-1, len(data)) #Cartesian product
"""
calculated_Delta_eq=Calc_Delta_eq(Ms,m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
calculated_e1eq = Calc_e1_eq(m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
calculated_e2eq = Calc_e2_eq(m1,m2,alpha,ta1,ta2,te1,te2,q,f1,f2)
"""
len_plugin = len(plug_in)
old_len= len(plug_in)
calculated_f1, calculated_f2 = Calculate_fs(q,plug_in[:,3])
q =q *np.ones(len_plugin)
p =p *np.ones(len_plugin)
plug_in = np.column_stack([plug_in,q[:,None], calculated_f1[:,None],calculated_f2[:,None],p[:,None]]) #append extra columns
calculated_Delta_eq=Calc_Delta_eq(plug_in[:,0],plug_in[:,1]*kjup,plug_in[:,2]*kjup,plug_in[:,3],plug_in[:,4],plug_in[:,5],plug_in[:,6],plug_in[:,7],plug_in[:,8],abs(plug_in[:,9]),abs(plug_in[:,10]))
calculated_e1eq = Calc_e1_eq(plug_in[:,0],plug_in[:,1]*kjup,plug_in[:,2]*kjup,plug_in[:,3],plug_in[:,4],plug_in[:,5],plug_in[:,6],plug_in[:,7],plug_in[:,8],abs(plug_in[:,9]),abs(plug_in[:,10]))
calculated_e2eq = Calc_e2_eq(plug_in[:,0],plug_in[:,1]*kjup,plug_in[:,2]*kjup,plug_in[:,3],plug_in[:,4],plug_in[:,5],plug_in[:,6],plug_in[:,7],plug_in[:,8],abs(plug_in[:,9]),abs(plug_in[:,10]))
calculated_dta = Calc_dta(plug_in[:,4],plug_in[:,5])
#,calculated_Delta_eq[:,None], calculated_e1eq[:,None], calculated_e2eq[:,None], calculated_dta[:,None], calc_mult[:,None]
calculated_ta1te1 = Calc_taitei(plug_in[:,4],plug_in[:,6])
calculated_ta2te2 = Calc_taitei(plug_in[:,5],plug_in[:,7])
plug_in = np.column_stack([plug_in, calculated_Delta_eq[:,None], calculated_e1eq[:,None], calculated_e2eq[:,None], calculated_dta[:,None], calculated_ta1te1[:,None], calculated_ta2te2[:,None]])
#Free on Delta, constrained corrected on eccentricity of Planet 1 (that is constraining Delta and e2 implicitly)
plug_in = plug_in[(0.034 < calculated_Delta_eq) & (calculated_Delta_eq<0.037)& (calculated_dta >= 1)] #drop rows you don't need
#& (0.01 < calculated_e1eq) & (calculated_e1eq < 0.07) & (0.04 < calculated_e2eq) & (calculated_e2eq < 0.17)
new_len = len(plug_in)
print(str((new_len/old_len)*100)[:4]+'% of combinations satisfy the conditions; from '+str(old_len)+' possibilities, now there are '+str(new_len)+' possibilities.')
print('Conditions:\n(0.03 < calculated_Delta_eq) & (calculated_Delta_eq<0.13)& (0.042/5 < calculated_e1eq) & (calculated_e1eq < 0.042/2) & (calculated_dta >= 1)')
return plug_in
N = Calculate_N_elements_per_array(M,m1,m2,alpha,ta1,ta2,te1,te2,running_time)
if N<1:
N=1
    array_args = Make_arrange(N, [M,m1,m2,alpha,ta1,ta2,te1,te2], mass_array_generation, params_array_generation)
Table_filtered_possibilities = Calculate_Filter_Possibilites(array_args)
return Table_filtered_possibilities
- Teepeemm (Dec 15, 2021): Is this a shorter version of your other question? codereview.stackexchange.com/q/271019/232484
- nuwe (Dec 15, 2021): @Teepeemm Thanks very much for the comment. Is this a shorter question? Not really. Here I'm pursuing an improvement in the way I calculate and append coefficients and combinations of parameters. My other question assumes this is not a problem (the time-management issue is taken as fixed) and asks for general reviews of and comments on the way I generate plots and write code.
1 Answer
There's a lot of code here; I won't have time today for a full review, but maybe some of what I have to say about just the one function Calculate_fs
can be illustrative in general.
- Use clearer names. I understand that in a science context this can be hard; "q" may be the actual name of a parameter in a well-known formula. Unfortunately, "q" probably means lots of different things in different contexts. Try to cram some extra context into the name. You can use a comment to explain what a variable is, but only as a last resort!
- I usually like the idea of placing stuff in the narrowest scope applicable, but that's not idiomatic in Python. Put everything in the module scope, or in the broadest scope possible; when a reader sees db_1_2 declared inside Calculate_fs, they'll expect that it relies on stuff that's only available inside that function. Moving things to a broader scope will also make smaller pieces easier to test! (And it will make good names even more important.)
- You're setting q in the scope of Table_plus_Analytical just before defining Calculate_fs. Why? If it's needed there, try to make it clear why. If not, move it to where it is needed (or possibly up to the beginning of Table_plus_Analytical).
- Obviously you have a lot of math. You've tried to make it dense enough to read by keeping names/literals short and omitting some white-space, but that's not a great strategy. Instead, let your math be fluffy, but remove some of the redundancy. You can also leverage some Python syntax to help keep your lines structured:
from functools import partial
def db(psi, j, alpha, *, exponent): # decibels? database?
return np.cos(j * psi)/( # opinions vary about white-space/parens in formula.
(1 - (2 * alpha * np.cos(psi)) + (alpha ** 2))
** exponent
)
def b(j, alpha, *, exponent): # What is this?
return (1.0 / np.pi) * integrate.quad(
partial(db, exponent=exponent), 0.0, 2 * np.pi, args=(j, alpha)
)[0]
def Calc_f(q, alpha, *, offset): # offset should have a better name; IDK what's going on.
    offset_q = q + offset
    pre_result = (
        ((2.0 * q + 1.0 + offset) * b(offset_q, alpha, exponent=0.5)) # 2(q+1) when offset=1, (2q+1) when offset=0
        + ((alpha / 2) * (b(offset_q + 1.0, alpha, exponent=1.5)
                          + b(offset_q - 1.0, alpha, exponent=1.5)))
        - ((alpha ** 2) * b(offset_q, alpha, exponent=1.5))
    )
    return pre_result * (0.5 - offset) # moved `-2*alph` outside this function
def Calculate_fs(q, alpha_array):
f1_array=[]
f2_array=[]
shift_f2 = -2.0 if q == 1.0 else 0
for alpha_val in alpha_array:
f1_array.append(Calc_f(q, alpha_val, offset=1.0))
f2_array.append(Calc_f(q, alpha_val, offset=0.0) + (alpha_val * shift_f2))
return np.array(f1_array), np.array(f2_array)
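If you adopt something like this, it's cheap insurance to sanity-check the refactor against your originals (a sketch; it assumes the original Calc_f1/Calc_f2 and the new Calc_f are both in scope):
# Hypothetical spot-check: the refactor should agree with the original functions.
for alpha in (0.3, 0.7, 1.5):
    assert abs(Calc_f(1, alpha, offset=1.0) - Calc_f1(1, alpha)) < 1e-9
    # Calc_f2 already includes the -2*alpha shift when q == 1:
    assert abs(Calc_f(1, alpha, offset=0.0) - 2.0 * alpha - Calc_f2(1, alpha)) < 1e-9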
As for efficiency:
You're using a for loop to generate what I assume is a reasonably large list. You mentioned there may be ways of flattening this out into a series of operations on large arrays; that's probably a good idea (there is extensive discussion of this kind of vectorization elsewhere). In the meantime, it looks like your input alpha_array is already a numpy array, so you can at least skip the step of building lists. Here's a fromiter version; I don't know if that's better or worse than a vectorize version for you.
def Calculate_fs(q, alpha_array):
    shift_f2 = -2.0 if q == 1.0 else 0
    count = len(alpha_array)
    f1 = np.fromiter((Calc_f(q, alpha_val, offset=1.0)
                      for alpha_val in alpha_array),
                     dtype=float,  # ? I think numpy-native type would be best.
                     count=count)
    f2 = np.fromiter((Calc_f(q, alpha_val, offset=0.0) + (alpha_val * shift_f2)
                      for alpha_val in alpha_array),
                     dtype=float,  # ? I think numpy-native type would be best.
                     count=count)
    return f1, f2
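One more thought on the repeated-alpha point you raised: after your Cartesian product, each alpha value appears in plug_in[:,3] many times, so you can integrate once per distinct value and broadcast the results back. A sketch, reusing Calculate_fs from above:
def Calculate_fs_deduplicated(q, alpha_array):
    # Integrate only once per distinct alpha, then expand back to all rows.
    unique_alphas, inverse = np.unique(alpha_array, return_inverse=True)
    f1_unique, f2_unique = Calculate_fs(q, unique_alphas)
    return f1_unique[inverse], f2_unique[inverse]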
- N3buchadnezzar (Dec 15, 2021): Great review! I would also recommend importing the mathematical functions directly: reading cos is easier than np.cos. Using annotated type hints would also help: Quantity = Annotated[float, "information about what the quantity denotes"]. Similarly, docstrings and occasional comments would greatly increase readability. There is also the problem that the code does not follow PEP 8, which makes it harder to read, for example Calc_f vs calc_f.