I wrote the following code. A user from Stack Overflow recommended me to write my code here in order to be reviewed for optimization.
How could I optimize it?
Could I have this code in a professional, better form with increased performance?
Explanation: This code calculate multiplication of three matrix in c()
function. Then, using Omo
as old and Omn
as new into c()
calculate L()
. Finally, defining two if
statement accept L
s which are bigger than 1 and R3
which is random number. This code is MCMC fitting data. Here I fit omn
and Mn
.
import numpy as np
import emcee
import matplotlib.pyplot as plt
from math import *
from scipy.integrate import quad
from scipy.integrate import odeint
O_m=chi=None
xx=np.array([0.01,0.012])
yy=np.array([32.95388698,33.87900347])
Cov=[[137,168],[28155,-2217]] # this is a covariance matrix for simplification I chose the 2*2 one
temp=1e5
z0=0
M=2
Omo = 0.3
Odo=0.7
H0=70
def ant(z, Om, Od):
return 1/sqrt(((1+z)**2)*(1+Om*z)-z*(2+z)*Od)
def dl(n, Om, Od, M):
Od=1-Om
q=quad(ant,0,xx[n],args=(Om,Od))[0]
h=5*log10((1+xx[n])*q)
fn=(yy[n]-M-h)
return fn
def c(Om, Od, M):
f_list = []
for i in range(2): # the value '2' reflects matrix size
f_list.append(dl(i,Om,Od,M))
rdag=[f_list]
rmat=[[f] for f in f_list]
a=np.dot(rdag,Cov)
b=np.dot(a,rmat)
Matrix=np.linalg.det(b)*0.000001
return Matrix
N=2000
with open('txtfile.txt', 'w') as f:
for i in range (1,N):
R1=np.random.uniform(0,1)
R2=np.random.uniform(0,1)
R3=np.random.uniform(0,1)
R4=np.random.uniform(0,1)
def delta(r1, r2):
sig=0.04
d=sig*(np.sqrt(-2*np.log(r1))*np.cos(np.radians(r2)))
return d
Omn=Omo+delta(R1, R2)
Odn=1-Omn
Mn=M+delta(R3,R4)
R3=np.random.uniform(0,1)
def L():
l=np.exp(-0.5*(c(Omn,Odn,Mn)-c(Omo,Odo,M)))
return l
if L()>1:
O_m=Omn
chi=c(Omn,Odn,Mn)
elif L()>R3:
O_m=Omn
chi=c(Omn, Odn, Mn)
f.write("{0}\t{1}\n".format(chi, O_m))
print("Minimum of chi squre is")
if chi<temp:
temp=chi
chimin=temp
print(chimin)
print(input("Press any key to exit... "))
I appreciate your help and your attention
-
\$\begingroup\$ I guess the file write operation can occur after all computations are done, so the user knows the result on the screen, before the file is written out. This would require significant RAM, to collect the output. \$\endgroup\$Gürkan Çetin– Gürkan Çetin2018年03月19日 15:04:34 +00:00Commented Mar 19, 2018 at 15:04
-
3\$\begingroup\$ Can people voting to close as off-topic explain themselves? I have trouble understanding what is pseudocode or hypothetical code. If the post seems unclear by lack of description, there is the "unclear what you're asking" or "too broad" close reason... no need for "off-topic". \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2018年03月19日 15:58:01 +00:00Commented Mar 19, 2018 at 15:58
-
2\$\begingroup\$ I added the explanation you started writing from the other question. Note that you can use the edit function under the question to update it. \$\endgroup\$Graipher– Graipher2018年03月19日 16:04:04 +00:00Commented Mar 19, 2018 at 16:04
1 Answer 1
import *
Since you only need sqrt
and log10
from math
, I would change this:
from math import *
to:
from math import sqrt, log10
PEP8
- Use descriptive method names. I have no idea what
dl
orc
mean - Valiable (and argument) names should be descriptive. It is very hard to determine what
Omo
is. - use lower_case for arguments and variable names
- Spacing between operators
unused arguments
def dl(n, Om, Od, M):
Od = 1 - Om
Why pass in Od
if you overwrite it immediately afterwards?
In fact, Od
is only used in ant
, so why not just calculate it there and omit it for the rest?
global variables
dl
uses xx
while it's not in the argument list. It also only uses n
to index xx
, why not pass in the relevant value from xx
as argument immediately?
Using this global variable makes it harder to use this method for other values
inner functions
Functions delta
and L
gets defined all 2000 iterations of the for-loop. This is unnecessary
L
uses global state, and is calculated multiple times , why not just do
l = np.exp(-0.5 * (c(Omn, Odn, Mn) - c(Omo, Odo, M)))
making it a variable instead of a function?
Seperate the functions
Even in a simple script, you can seperate the production of the results from the presentation of the input
use if __name__ == '__main'__:
so your program can be run a script, but also imported as a module
R3
Why generate a new random number with the same name? Either it's the same number, so no need to generate it, or it's a different number, so give it a new name
Besides that, if L()
is larger than a number between 0 and 1, it will be larger than 1, so if .. elif
is unnecessary
Magic numbers
Where come the 0.000001 in c
and the 0.04 in delta
come from? If these a constants, better name them so
avoid repetition
The way your program is setup, Cov
is converted to a numpy array a lot, and c(Omn, Odn, Mn) and c(Omo, Odo, M)
are calculated over and over without any change in arguments, better to do those things once
f_list
instead of appending, I would use a list comprehension.
One step further, since you need it as a np.array
anyway, why not use np.fromiter
and a generator expression?
Why are you taking the determinant from a two-dimensional array, instead of doing the calculations on a one-dimensional array?
All in all, I would end up with something like this:
import numpy as np
import emcee
import matplotlib.pyplot as plt
from math import sqrt, log10
from scipy.integrate import quad
from scipy.integrate import odeint
from io import StringIO
def ant(z, Om):
Od = 1 - Om
return 1 / sqrt(((1 + z) ** 2) * (1 + Om * z) - z * (2 + z) * Od)
def dl(x, y, Om, M):
q = quad(ant, 0, x, args=(Om,))[0]
h = 5 * log10((1 + x) * q)
return y - M - h
def c(xx, yy, cov, Om, M):
f_list = np.fromiter(
(dl(x, y, Om, M) for x, y in zip(xx, yy)),
dtype=float,
count=len(xx),
)
a = np.dot(f_list, cov)
b = np.dot(a, f_list.T)
return b * 0.000001
def delta(r1, r2):
sig = 0.04
return sig * (np.sqrt(-2 * np.log(r1)) * np.cos(np.radians(r2)))
def calculation(xx, yy, cov, M, Omo, N):
c0 = c(xx, yy, cov, Omo, M)
for i in range(1, N):
R1 = np.random.uniform(0, 1)
R2 = np.random.uniform(0, 1)
R3 = np.random.uniform(0, 1)
R4 = np.random.uniform(0, 1)
R5 = np.random.uniform(0, 1)
Omn = Omo + delta(R1, R2)
Mn = M + delta(R3, R4)
c_ = c(xx, yy, cov, Omn, Mn)
l = np.exp(-0.5 * (c_ - c0))
if l > R5:
O_m = Omn
chi = c_
yield chi, O_m
def main(filehandle):
temp = 1e5
z0 = 0
M = 2
Omo = 0.3
H0 = 70
N = 2000
xx = np.array([0.01,0.012])
yy = np.array([32.95388698,33.87900347])
cov = np.array([[137,168],[28155,-2217]])
for chi, O_m in calculation(xx, yy, cov, M, Omo, N):
filehandle.write("{0}\t{1}\n".format(chi, O_m))
if chi < temp:
temp = chi
chimin = temp
return chimin
if __name__ == '__main__':
# with StringIO() as filehandle:
with open('txtfile.txt', 'w') as filehandle:
chimin = main(filehandle)
# print(filehandle.getvalue())
print("Minimum of chi squre is")
print(chimin)
print(input("Press any key to exit... "))
I don't know whether this will spectacularly speed up your algorithm, but it will help, especially the more simple c
. For this simple input, it needed about 33% less time on my machine
Larger gains will be made by vectorizing more, but for that I would need to have a better mathematical understanding of what's going on