Optimization of algorithm performance

Question 1

I wrote the following code. A user from Stack Overflow recommended me to write my code here in order to be reviewed for optimization.

How could I optimize it?
Could I have this code in a professional, better form with increased performance?

Explanation: This code calculate multiplication of three matrix in c() function. Then, using Omo as old and Omn as new into c() calculate L(). Finally, defining two if statement accept Ls which are bigger than 1 and R3 which is random number. This code is MCMC fitting data. Here I fit omn and Mn.

import numpy as np
import emcee
import matplotlib.pyplot as plt
from math import *
from scipy.integrate import quad
from scipy.integrate import odeint
O_m=chi=None
xx=np.array([0.01,0.012]) 
yy=np.array([32.95388698,33.87900347])
Cov=[[137,168],[28155,-2217]] # this is a covariance matrix for simplification I chose the 2*2 one
temp=1e5
z0=0
M=2
Omo = 0.3
Odo=0.7
H0=70
def ant(z, Om, Od):
 return 1/sqrt(((1+z)**2)*(1+Om*z)-z*(2+z)*Od)
def dl(n, Om, Od, M):
 Od=1-Om
 q=quad(ant,0,xx[n],args=(Om,Od))[0]
 h=5*log10((1+xx[n])*q)
 fn=(yy[n]-M-h) 
 return fn
def c(Om, Od, M):
 f_list = []
 for i in range(2): # the value '2' reflects matrix size
 f_list.append(dl(i,Om,Od,M))
 rdag=[f_list]
 rmat=[[f] for f in f_list]
 a=np.dot(rdag,Cov)
 b=np.dot(a,rmat)
 Matrix=np.linalg.det(b)*0.000001
 return Matrix
N=2000
with open('txtfile.txt', 'w') as f:
 for i in range (1,N):
 R1=np.random.uniform(0,1)
 R2=np.random.uniform(0,1)
 R3=np.random.uniform(0,1)
 R4=np.random.uniform(0,1)
 def delta(r1, r2):
 sig=0.04
 d=sig*(np.sqrt(-2*np.log(r1))*np.cos(np.radians(r2)))
 return d
 Omn=Omo+delta(R1, R2)
 Odn=1-Omn
 Mn=M+delta(R3,R4)
 R3=np.random.uniform(0,1)
 def L():
 l=np.exp(-0.5*(c(Omn,Odn,Mn)-c(Omo,Odo,M)))
 return l
 if L()>1:
 O_m=Omn
 chi=c(Omn,Odn,Mn)
 elif L()>R3:
 O_m=Omn
 chi=c(Omn, Odn, Mn)
 f.write("{0}\t{1}\n".format(chi, O_m))
 print("Minimum of chi squre is")
 if chi<temp:
 temp=chi
 chimin=temp
 print(chimin)
 print(input("Press any key to exit... "))

I appreciate your help and your attention

Question 2

I guess the file write operation can occur after all computations are done, so the user knows the result on the screen, before the file is written out. This would require significant RAM, to collect the output.

Question 3

Can people voting to close as off-topic explain themselves? I have trouble understanding what is pseudocode or hypothetical code. If the post seems unclear by lack of description, there is the "unclear what you're asking" or "too broad" close reason... no need for "off-topic".

Question 4

I added the explanation you started writing from the other question. Note that you can use the edit function under the question to update it.

Question 5

import *

Since you only need sqrt and log10 from math, I would change this:

from math import *

to:

from math import sqrt, log10

PEP8

Use descriptive method names. I have no idea what dl or c mean
Valiable (and argument) names should be descriptive. It is very hard to determine what Omo is.
use lower_case for arguments and variable names
Spacing between operators

unused arguments

def dl(n, Om, Od, M):
 Od = 1 - Om

Why pass in Od if you overwrite it immediately afterwards?

In fact, Od is only used in ant, so why not just calculate it there and omit it for the rest?

global variables

dl uses xx while it's not in the argument list. It also only uses n to index xx, why not pass in the relevant value from xx as argument immediately?

Using this global variable makes it harder to use this method for other values

inner functions

Functions delta and L gets defined all 2000 iterations of the for-loop. This is unnecessary

L uses global state, and is calculated multiple times , why not just do

l = np.exp(-0.5 * (c(Omn, Odn, Mn) - c(Omo, Odo, M)))

making it a variable instead of a function?

Seperate the functions

Even in a simple script, you can seperate the production of the results from the presentation of the input

use `if name == 'main':`

so your program can be run a script, but also imported as a module

R3

Why generate a new random number with the same name? Either it's the same number, so no need to generate it, or it's a different number, so give it a new name

Besides that, if L() is larger than a number between 0 and 1, it will be larger than 1, so if .. elif is unnecessary

Magic numbers

Where come the 0.000001 in c and the 0.04 in delta come from? If these a constants, better name them so

avoid repetition

The way your program is setup, Cov is converted to a numpy array a lot, and c(Omn, Odn, Mn) and c(Omo, Odo, M) are calculated over and over without any change in arguments, better to do those things once

f_list

instead of appending, I would use a list comprehension.

One step further, since you need it as a np.array anyway, why not use np.fromiter and a generator expression? Why are you taking the determinant from a two-dimensional array, instead of doing the calculations on a one-dimensional array?

All in all, I would end up with something like this:

import numpy as np
import emcee
import matplotlib.pyplot as plt
from math import sqrt, log10
from scipy.integrate import quad
from scipy.integrate import odeint
from io import StringIO
def ant(z, Om):
 Od = 1 - Om
 return 1 / sqrt(((1 + z) ** 2) * (1 + Om * z) - z * (2 + z) * Od)
def dl(x, y, Om, M):
 q = quad(ant, 0, x, args=(Om,))[0]
 h = 5 * log10((1 + x) * q)
 return y - M - h
def c(xx, yy, cov, Om, M):
 f_list = np.fromiter(
 (dl(x, y, Om, M) for x, y in zip(xx, yy)), 
 dtype=float,
 count=len(xx),
 )
 a = np.dot(f_list, cov)
 b = np.dot(a, f_list.T)
 return b * 0.000001
def delta(r1, r2):
 sig = 0.04
 return sig * (np.sqrt(-2 * np.log(r1)) * np.cos(np.radians(r2)))
 
def calculation(xx, yy, cov, M, Omo, N):
 c0 = c(xx, yy, cov, Omo, M)
 for i in range(1, N):
 R1 = np.random.uniform(0, 1)
 R2 = np.random.uniform(0, 1)
 R3 = np.random.uniform(0, 1)
 R4 = np.random.uniform(0, 1)
 R5 = np.random.uniform(0, 1)
 Omn = Omo + delta(R1, R2)
 Mn = M + delta(R3, R4)
 c_ = c(xx, yy, cov, Omn, Mn)
 l = np.exp(-0.5 * (c_ - c0))
 if l > R5:
 O_m = Omn
 chi = c_
 yield chi, O_m
 
 
def main(filehandle):
 temp = 1e5
 z0 = 0
 M = 2
 Omo = 0.3
 H0 = 70
 N = 2000
 xx = np.array([0.01,0.012]) 
 yy = np.array([32.95388698,33.87900347])
 cov = np.array([[137,168],[28155,-2217]])
 
 
 
 for chi, O_m in calculation(xx, yy, cov, M, Omo, N):
 filehandle.write("{0}\t{1}\n".format(chi, O_m))
 
 if chi < temp:
 temp = chi
 chimin = temp
 return chimin
 
if __name__ == '__main__':
 # with StringIO() as filehandle:
 with open('txtfile.txt', 'w') as filehandle:
 chimin = main(filehandle)
 # print(filehandle.getvalue())
 print("Minimum of chi squre is")
 print(chimin)
 print(input("Press any key to exit... "))

I don't know whether this will spectacularly speed up your algorithm, but it will help, especially the more simple c. For this simple input, it needed about 33% less time on my machine

Larger gains will be made by vectorizing more, but for that I would need to have a better mathematical understanding of what's going on

Maarten Fabré Maarten Fabré 9,3901 gold badge15 silver badges27 bronze badges · Answer 1 · 2018-03-19 17:50:00Z

import *

Since you only need sqrt and log10 from math, I would change this:

from math import *

to:

from math import sqrt, log10

PEP8

Use descriptive method names. I have no idea what dl or c mean
Valiable (and argument) names should be descriptive. It is very hard to determine what Omo is.
use lower_case for arguments and variable names
Spacing between operators

unused arguments

def dl(n, Om, Od, M):
 Od = 1 - Om

Why pass in Od if you overwrite it immediately afterwards?

In fact, Od is only used in ant, so why not just calculate it there and omit it for the rest?

global variables

dl uses xx while it's not in the argument list. It also only uses n to index xx, why not pass in the relevant value from xx as argument immediately?

Using this global variable makes it harder to use this method for other values

inner functions

Functions delta and L gets defined all 2000 iterations of the for-loop. This is unnecessary

L uses global state, and is calculated multiple times , why not just do

l = np.exp(-0.5 * (c(Omn, Odn, Mn) - c(Omo, Odo, M)))

making it a variable instead of a function?

Seperate the functions

Even in a simple script, you can seperate the production of the results from the presentation of the input

use `if name == 'main':`

so your program can be run a script, but also imported as a module

R3

Why generate a new random number with the same name? Either it's the same number, so no need to generate it, or it's a different number, so give it a new name

Besides that, if L() is larger than a number between 0 and 1, it will be larger than 1, so if .. elif is unnecessary

Magic numbers

Where come the 0.000001 in c and the 0.04 in delta come from? If these a constants, better name them so

avoid repetition

The way your program is setup, Cov is converted to a numpy array a lot, and c(Omn, Odn, Mn) and c(Omo, Odo, M) are calculated over and over without any change in arguments, better to do those things once

f_list

instead of appending, I would use a list comprehension.

One step further, since you need it as a np.array anyway, why not use np.fromiter and a generator expression? Why are you taking the determinant from a two-dimensional array, instead of doing the calculations on a one-dimensional array?

All in all, I would end up with something like this:

import numpy as np
import emcee
import matplotlib.pyplot as plt
from math import sqrt, log10
from scipy.integrate import quad
from scipy.integrate import odeint
from io import StringIO
def ant(z, Om):
 Od = 1 - Om
 return 1 / sqrt(((1 + z) ** 2) * (1 + Om * z) - z * (2 + z) * Od)
def dl(x, y, Om, M):
 q = quad(ant, 0, x, args=(Om,))[0]
 h = 5 * log10((1 + x) * q)
 return y - M - h
def c(xx, yy, cov, Om, M):
 f_list = np.fromiter(
 (dl(x, y, Om, M) for x, y in zip(xx, yy)), 
 dtype=float,
 count=len(xx),
 )
 a = np.dot(f_list, cov)
 b = np.dot(a, f_list.T)
 return b * 0.000001
def delta(r1, r2):
 sig = 0.04
 return sig * (np.sqrt(-2 * np.log(r1)) * np.cos(np.radians(r2)))
 
def calculation(xx, yy, cov, M, Omo, N):
 c0 = c(xx, yy, cov, Omo, M)
 for i in range(1, N):
 R1 = np.random.uniform(0, 1)
 R2 = np.random.uniform(0, 1)
 R3 = np.random.uniform(0, 1)
 R4 = np.random.uniform(0, 1)
 R5 = np.random.uniform(0, 1)
 Omn = Omo + delta(R1, R2)
 Mn = M + delta(R3, R4)
 c_ = c(xx, yy, cov, Omn, Mn)
 l = np.exp(-0.5 * (c_ - c0))
 if l > R5:
 O_m = Omn
 chi = c_
 yield chi, O_m
 
 
def main(filehandle):
 temp = 1e5
 z0 = 0
 M = 2
 Omo = 0.3
 H0 = 70
 N = 2000
 xx = np.array([0.01,0.012]) 
 yy = np.array([32.95388698,33.87900347])
 cov = np.array([[137,168],[28155,-2217]])
 
 
 
 for chi, O_m in calculation(xx, yy, cov, M, Omo, N):
 filehandle.write("{0}\t{1}\n".format(chi, O_m))
 
 if chi < temp:
 temp = chi
 chimin = temp
 return chimin
 
if __name__ == '__main__':
 # with StringIO() as filehandle:
 with open('txtfile.txt', 'w') as filehandle:
 chimin = main(filehandle)
 # print(filehandle.getvalue())
 print("Minimum of chi squre is")
 print(chimin)
 print(input("Press any key to exit... "))

I don't know whether this will spectacularly speed up your algorithm, but it will help, especially the more simple c. For this simple input, it needed about 33% less time on my machine

Larger gains will be made by vectorizing more, but for that I would need to have a better mathematical understanding of what's going on

Stack Exchange Network

Optimization of algorithm performance

1 Answer 1

import *

PEP8

unused arguments

global variables

inner functions

Seperate the functions

use `if name == 'main':`

R3

Magic numbers

avoid repetition

f_list

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

1 Answer 1

import *

unused arguments

global variables

inner functions

Seperate the functions

use if __name__ == '__main'__:

R3

Magic numbers

avoid repetition

f_list

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

use `if name == 'main':`