How do I speed up this simulation program

Question 1

This is a program to solve a differential equation numerically using Euler method. As of now, it is very slow, and I need to run 10000 Monte Carlo simulations.

The differential equation is called stochastic Landau-Lifschitz-Gilbert equation with Spin Transfer Torque effect, basically a differential equation having vector cross products.

I cannot use scipy because there is random noise added in each simulation step (ASFAIK) scipy cannot do that. And I also need to normalize the resultant vector in each step.

import numpy as np
from matplotlib import pyplot as plt
class Mtj:
 def __init__(self, step_size, t_max):
 #Physical constants
 self.mu0 = 4 * np.pi * 1e-7
 self.gamma = 2.2127e5
 self.kb = 1.38e-23
 self.e = 1.6e-19
 self.me = 9.1e-31
 self.hbar = 1.054e-34
 self.alpha = 0.05
 self.gamma_eff = self.gamma / (1 + self.alpha**2)
 #MTJ Dimensions
 self.D = 50e-9
 self.tf = 1.1e-9
 self.tox = 1.4e-9
 self.AMTJ = np.pi * (self.D**2) / 4
 self.Vf = self.AMTJ * self.tf
 #Demag tensors
 self.Nx = (np.pi * self.tf) / (4 * self.D)
 self.Ny = self.Nx
 self.Nz = 1 - (2 * self.Nx)
 #Mag field constants
 self.HEX = -(5 / 1e4) / self.mu0
 self.Hex = np.array([0, self.HEX, 0])
 self.HPMA = None
 self.HVCMA = None
 self.HDEMAG = None
 self.temperature = None
 #Magnetic Parameters
 self.Ms0 = 0.786e6
 self.Ki0 = 5.274e-4
 self.Beta0 = 97.3e-15
 #Electrical
 self.VMTJ = -0.2
 self.Istt = None
 self.Jstt = None
 self.Hstt = None
 #Temperature dependent parameters
 self.Msx = 1.73
 self.Tc = 875
 self.Ms = None 
 self.gammai = 2.18
 self.Ki = None 
 self.gammab = 2.83
 self.Beta = None
 self.P0 = 0.58
 self.asp = 2e-5
 self.P = None 
 #MTJ resistance
 self.phi = 0.4
 self.RA = 5e3
 self.F = 33141
 self.Rp = (self.tox / (self.F * self.phi**0.5 * self.AMTJ)) * np.exp((1.025 * self.tox * (2 * self.e * self.me * self.phi)**0.5) / self.hbar)
 self.Vh = 0.5
 self.TMR = 1.0 
 #simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def Set_temp_params(self, temperature):
 self.temperature = temperature
 self.Ms = self.Ms0 * (1 - (self.temperature / self.Tc)**self.Msx)
 self.Ki = self.Ki0 * (self.Ms / self.Ms0)**self.gammai
 self.Beta = self.Beta0 * (self.Ms / self.Ms0)**self.gammab
 self.P = self.P0 * (1 - self.asp * self.temperature**1.5 )
 self.Hstt = (self.hbar * self.P) / (2 * self.e * self.mu0 * self.Ms * self.tf) * self.gamma
 def Set_mag_fields(self):
 self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
 self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)
 self.HDEMAG = -self.Ms * np.array([self.Nx, self.Ny, self.Nz])
 def calculate_Rmtj(self):
 self.Rmtj = (self.Rp * (1 + (self.VMTJ/self.Vh)**2 + self.TMR)) \
 / (1 + (self.VMTJ**2/self.Vh**2) + self.TMR * (0.5 * (1 + self.m[2])))
 def llg(self):
 self.calculate_Rmtj()
 self.Istt = self.VMTJ / self.Rmtj
 self.Jstt = self.Istt / self.AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * self.kb * self.temperature * self.alpha) / ( self.mu0 * self.Ms * self.gamma * self.Vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
 Hdemag = self.HDEMAG * self.m
 Heff = Hpma + Hdemag + self.Hex + Hvcma + Hthermal
 dmdt = -self.gamma_eff * np.cross(self.m, Heff) - self.alpha * self.gamma_eff * np.cross(self.m, np.cross(self.m, Heff)) - \
 self.Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt
 def run(self):
 for i in range(0,self.num_steps):
 dmdt = self.llg()
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()
 
mtj = Mtj(1e-12, 10e-9)
mtj.Set_temp_params(450)
mtj.Set_mag_fields()
mtj.run()
mtj.plot()

I know Matlab would be faster, but python is convenient. Please give me suggestion if there is any way to speed up my code.

Question 2

Have you tried using the Pypy JIT to run your code? It might speed things up.

Question 3

I don't see anything stochastic here.

Question 4

@vnp, you're looking for that np.random.randn(3) call. 8^)

Question 5

Like J_H mentioned in the answer, @numba.jit is a good start, but if you really want performance, write it in C or C++. That's the depressing truth about python performance: there isn't any. The good news is your code is mainly numeric calculations so translating it should be fairly straightforward.

Question 6

numpy is for the mass processing of data, there is an overhead for type and range checking. Atomic calls make it slow, using numpy.sin for a single argument is slower than using math.sin. Thus it might be helpful to generate all the random numbers that are used at once instead of only 3 in every loop.

Question 7

This was very difficult for a quantitative analyst to examine because the separation between variables and constants was not distinct. It has already been observed in another answer that your naming convention is odd. Attaching constants to the class is a bad choice and makes it more difficult to understand what is updated on the instance and what is not.

Specifically indicating what are constants.

The first thing I did was rename all of your constants so that I try to observe any optimisations in functions:

import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max):
 # Mag field constants
 self.HPMA = None
 self.HVCMA = None
 self.HDEMAG = None
 self.temperature = None
 # Electrical
 self.Istt = None
 self.Jstt = None
 self.Hstt = None
 # Temperature dependent parameters
 self.Ms = None
 self.Ki = None
 self.Beta = None
 self.P = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def Set_temp_params(self, temperature):
 self.temperature = temperature
 self.Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.Ki = C_Ki0 * (self.Ms / C_Ms0) ** C_gammai
 self.Beta = C_Beta0 * (self.Ms / C_Ms0) ** C_gammab
 self.P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.Hstt = (C_hbar * self.P) / (2 * C_e * C_mu0 * self.Ms * C_tf) * C_gamma
 def Set_mag_fields(self):
 self.HPMA = (2 * self.Ki) / (C_mu0 * self.Ms * C_tf)
 self.HVCMA = - (2 * self.Beta * abs(C_VMTJ)) / (C_mu0 * self.Ms * C_tox * C_tf)
 self.HDEMAG = -self.Ms * np.array([C_Nx, C_Ny, C_Nz])
 def calculate_Rmtj(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 def llg(self):
 self.calculate_Rmtj()
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
 Hdemag = self.HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg()
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()

Mixing initialisation and execution

Secondly you have a separate process for initialisating step sizes, which you do on the instance and a separate method for setting temperature parameters. You should set the temperature parameters at initialisation of the class.

Then I relabelled temperature names as constants internal to the class so that it was further obvious what was updated during a run.

class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Mag field constants
 self.HPMA = None
 self.HVCMA = None
 self.HDEMAG = None
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def Set_mag_fields(self):
 self.HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 def calculate_Rmtj(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 def llg(self):
 self.calculate_Rmtj()
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
 Hdemag = self.HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt

After doing this I could observe that set_mag_fields was just setting constants as well. So this was restructured into the initialisation.

class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])

Standardising routines

I removed calculate_Rmtj. A better name for it would have been set_Rmtj but it was a one line function out of context with the rest of your code style.

Unfortunately I have to get back to work now.. But with the increased visibility it might be easier to look for further optimisations and any possible vectorisations. I am sure this can be improved further for performance gains.

Full Updated code

import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def llg(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
 Hdemag = self.C_HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg()
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()
mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()

Actually you can now observe that these lines:

sigmath = np.random.randn(3)
sigmath = sigmath / np.linalg.norm(sigmath)
Hthermal = np.sqrt(
 (2 * C_kb * self.temperature * C_alpha) /
 (C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)
) * sigmath

contain purely constants and random variables. You can extract this outside the class and create one large vectorised array rather than calculating it repeatedly in each loop.

Profiling

np.cross accounts for 60% of the profiling speed.

enter image description here

It runs 50,000 times. You can observe that one of the equations does not need to be calculated twice: np.cross(self.m, Heff) can be precalculated and substitued in, which will reduce to 40,000 times.

You may also find that your triple vector cross products are more rapidly solved by the alternative formulation. But for this you would have to experiment.

enter image description here

Edited with suggestions

I took the constants and random variables to initialisation and also reduced the cross product computations to 40,000. Speed improved:

enter image description here

I also used the alternative formulation for the triple cross product as follows:

 m_dot_m = np.dot(self.m, self.m)
 m_dot_Heff = np.dot(self.m, Heff)
 m_dot_r = np.dot(self.m, self.m_r)
 m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
 m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
 cross_m_Heff = np.cross(self.m, Heff)
 dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * \
 m_x_m_x_Heff - \
 self.C_Hstt * self.Jstt * m_x_m_x_r

This was very effective, notice the time for llg

enter image description here

The real gains

You mentioned that you need to do this 10,000 times. You cannot vectorize or parallelize the above because the operations need to be performed in sequence.

However you can, and should, vectorize the run of 10,000 of these simulations. For example the array magnetization has dimension (num_steps, 3), but you aim to create a 3D tensor which is of dimension (num_steps, 3, 10000).

That is why I reformulated the equation using dot products, so that later you can vectorize all of these formulae to obtain the speed from numpy's C implementation.

Thus far


import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 sigmath = np.random.randn(3, self.num_steps)
 sigmath = sigmath / np.linalg.norm(sigmath, axis=0)
 self.C_Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 def llg(self, i):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
 Hdemag = self.C_HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + self.C_Hthermal[:, i]
 m_dot_m = np.dot(self.m, self.m)
 m_dot_Heff = np.dot(self.m, Heff)
 m_dot_r = np.dot(self.m, self.m_r)
 m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
 m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
 cross_m_Heff = np.cross(self.m, Heff)
 dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * \
 m_x_m_x_Heff - \
 self.C_Hstt * self.Jstt * m_x_m_x_r
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg(i)
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()
mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()

Question 8

Takes less than one second to run on my end. A significant improvement in terms of speed.

Question 9

Thanks! Actually using explcit cross product made it even faster, I precalculated the random number array, now it runs is about 0.5 seconds. Could you tell me how you got that screenshot of benchmark statistics?

Question 10

I used the profiling tool in the professional version of Pycharm, which uses the cProfile profiler.

Question 11

Choosing a "this is a constant" prefix is a design choice, that’s cool. But please prefer lowercase "c_" over the proposed "C_", out of respect for PEP-8. (I gave the OP a pass on most naming details, due to very helpful adherence to math conventions which are a bit different from python conventions.) Yeah, yeah, I know, manifest constants are typically upcased. But here, mixed case sometimes makes an identifier more readable, more familiar.

Question 12

Pep8 module level constants are usually all caps. The only reason I didnt use all caps is it was easier to stick with the OP original naming for comparison and when editing to avoid errors. I prefer the capital C because it is more obvious and it better suited the task presented here and made the answer clearer to me and, in my opinion, to most.

Question 13

I'm not an expert but let me raise very common tips that make a calculation faster and that you might be already aware of.

Decreasing the number of function calls

You call llg() 10000 times in the loop, which then calls calculate_Rmtj() in it. Calling a function is slow. Did you try expanding llg() and calculate_Rmtj() directly in the loop? (yes that will make your code messy). Also, you call randn(), cross() and norm() in the loop. Some of them can be slow. It looks like randn() can be called before the loop, i.e., building (10000, 3) array by randn(10000, 3) and then you can access elements in the loop.

Using a JIT compiler

If you want to make your function work faster, a JIT compiler might be the thing. numba is a very popular package for this purpose. By using it, you can for example make llg() run a lot faster (by orders of magnitude). But you need to modify your code.

Using local variables

Using local variables are generally faster way. I'm not very sure about this but accessing self.something from a function scope over and over looks not economical. You can try to pass them to a function as arguments.

Question 14

physical constants

Thank you for using MKS, it's all very clear.

 self.e = 1.6e-19
 self.me = 9.1e-31

Number of coulombs is correct to three decimal places. But it's just too easy to specify it to five, in keeping with the γ sig figs: 1.6022 Similarly 9.1094 for mass, which had just two sig figs.

I can see some small attraction to always having self.this and self.that, but still, choosing to allocate new storage for these on each object instance is an odd modeling choice. Consider putting such constants at top-level of this or another module. Consider relying on a pypi package which already offers them, such as astropy or scipy:

from scipy.constants import e, m_e, mu_0

When we examine a particular "constant", this (modern) OP code has the curious artifact of relying on the value of mu_0 that was used from 1948 through 2018. The redefinition of 2019 altered its value to the slightly different one we see in the scipy package. This was due to ditching the platinum kilogram prototype, which regrettably was starting to exhibit discrepancies of ~ 50 μg.

tl;dr: These details are boring and not part of the core contribution of this work; outsource them so you needn't worry about them.

naming

These are distinct identifiers:

 self.HEX = -(5 / 1e4) / self.mu0
 self.Hex = np.array([0, self.HEX, 0])

But I do wish you'd pick a slightly different name for that first one. Sometimes colleagues discuss code over the telephone. Try to choose names that will sound distinct when recited aloud, or when recited mentally during quiet code reading.

 def Set_temp_params(self, temperature):
 ...
 def Set_mag_fields(self):

Pep-8 asked you nicely to please use a lowercase set_ prefix for such methods.

cite a reference

 def llg(self):

The Review Context made it clear enough that this is the Landau-Lifschitz-Gilbert equation, and I thank you for that. But the source code really needs a """docstring""" or at least a # comment to that effect.

Citing some author would also shed a great deal of light on F, Rmtj, and the various other expressions. As written this code is on the opaque side. I do appreciate that it is trying hard to conform to math notation conventions.

profiling

This submission would have benefited from the inclusion of profiling measurements.

speed

We pick a random vector in 3-space, good.

 def llg(self):
 ...
 sigmath = np.random.randn(3)

But then we run() it \10ドル^4\$ times. There's an opportunity to roll all the random numbers at once here, and let numpy / BLAS worry about looping.

Or if you wish to conserve memory, pick some compromise, perhaps by rolling 300 numbers at a time and doing a hundred runs through this.

If you don't go that route, consider adding type annotation which mypy lints clean on, and then giving this method numba's @njit decorator.

You're computing nine temp vars, each of them changing slightly on every iteration. It's not clear to me how big the partial derivative is on each of them. If some of them are fairly stable, changing by some tiny epsilon with every iteration, consider hoisting them out of this function, that is, hoisting them out of the loop. So if we do ten loop unrolls, we compute just once at top of unroll a smallish change which is 10x bigger than what OP code computes, rather than computing order-of-magnitude smaller values ten times. Then we still need to loop back \10ドル^3\$ times, an improvement over \10ドル^4\$ times.

Question 15

Thank you I will try to incorporate them. Another thing I noticed is that the code became about 5-6x faster when I used an excplicit cross product instead of the numpy builtin one. Would you happen to know why?

Question 16

Is there a way I can put my new code somehwere and we can continue this discussion? Should I post the new code in this question itself?

Question 17

When you take the tour, it describes a couple of possible outcomes. For example, you might choose to post a self Answer, or create a related new Question.

Question 18

 self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
 self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)

With interpreted languages, I'm never sure how much common subexpression elimination to expect.

 def Set_temp_params(self, temperature):
 self.temperature = temperature
 temperature_coefficient = 1 - (self.temperature / self.Tc) ** self.Msx
 self.Ms = self.Ms0 * temperature_coefficient 
 self.Ki = self.Ki0 * temperature_coefficient ** self.gamma_i
 self.Beta = self.Beta0 * temperature_coefficient ** self.gamma_b
 ...
 def Set_mag_fields(self):
 fudge = 2 / (self.mu0 * self.Ms * self.tf)
 self.HPMA = self.Ki * fudge
 self.HVCMA = -(self.Beta * abs(self.VMTJ)) / self.tox * fudge
 ...

should be algebraically equivalent, and the expression for HVCMA isn't overly long any more.
(I noticed another instance of self.mu0 * self.Ms in llg().)

Question 19

"I'm never sure how much common subexpression elimination" last I checked python doesn't have it, period. Which makes sense when you consider all the types are statically unknown so operators can be arbitrary functions.

Question 20

I am not very qualified to comment on your algorithm, so this is naive review from the POV of developer, but here is some generic advice that can be used in any situation where you want to optimize slow code, or identify bottlenecks.

I often use the codetiming package for this purpose.

So I have applied decorators on each function like:

@Timer(name="llg", text="Ran in {:0.8f} seconds", logger=logger.debug)
def llg(self):
 ....

And then this code to produce a breakdown by function:

logger.info("Timing summary")
logger.info("--------------")
total_time = sum(Timer.timers.values())
logger.info(f"Total time: {total_time:.8f}")
logger.info("name time cost ")
logger.info("------------------------------ ---------- ------")
for k, v in Timer.timers.items():
 logger.info(f"{k:30} {v:.8f} {(v / total_time):.2%}")

Result:

2024年05月15日 18:30:44,361 - INFO - L163 - Timing summary
2024年05月15日 18:30:44,361 - INFO - L164 - --------------
2024年05月15日 18:30:44,361 - INFO - L166 - Total time: 7.78605968
2024年05月15日 18:30:44,361 - INFO - L167 - name time cost 
2024年05月15日 18:30:44,361 - INFO - L168 - ------------------------------ ---------- ------
2024年05月15日 18:30:44,361 - INFO - L170 - __init__ 0.00012388 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - Set_temp_params 0.00000414 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - Set_mag_fields 0.00001162 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - calculate_Rmtj 0.02407876 0.31%
2024年05月15日 18:30:44,361 - INFO - L170 - llg 3.64657636 46.83%
2024年05月15日 18:30:44,361 - INFO - L170 - run 4.11526491 52.85%

This is a quick and convenient way to pinpoint slow code. And then you can dig further by wrapping selected blocks of code like this:

with Timer(name="some code", text="Ran in {:0.8f} seconds", logger=logger.info):
 # do something
 ...

Check for example the block in function llg that has np.cross...

The second strategy is that you can cache the results of expensive operations. The docs have a basic example about it:

@cache
def factorial(n):
 return n * factorial(n-1) if n else 1
>>> factorial(10) # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5) # just looks up cached value result
120
>>> factorial(12) # makes two new recursive calls, the other 10 are cached
479001600

As long as the arguments are "hashable" you should be able to apply that logic to any purpose. Some variables do not change over the course of program execution, so if you find yourself repeatedly calculating the same values over and over, this "trick" will save a lot of precious (CPU) time.

You will quite likely have to refactor your code to benefit from this approach. You would need to pass arguments to your functions rather than rely on self.whatever. I find that approach bug-prone, because it's not immediately clear where and how the values are instantiated. Or you could use setters and getters.

But this is not my code and I have my own way of doing things so...

In functions like Set_mag_fields where you set three class attributes:

def Set_mag_fields(self):
 self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
 self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)
 self.HDEMAG = -self.Ms * np.array([self.Nx, self.Ny, self.Nz])

You could return a namedtuple instead, if that makes sense.

Misc: as per PEP8 constants and magic numbers should be uppercase.

The class could perfectly be written as dataclass.

Attack68 Attack68 6333 silver badges8 bronze badges · Answer 1 · 2024-05-16 14:28:57Z

This was very difficult for a quantitative analyst to examine because the separation between variables and constants was not distinct. It has already been observed in another answer that your naming convention is odd. Attaching constants to the class is a bad choice and makes it more difficult to understand what is updated on the instance and what is not.

Specifically indicating what are constants.

The first thing I did was rename all of your constants so that I try to observe any optimisations in functions:

import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max):
 # Mag field constants
 self.HPMA = None
 self.HVCMA = None
 self.HDEMAG = None
 self.temperature = None
 # Electrical
 self.Istt = None
 self.Jstt = None
 self.Hstt = None
 # Temperature dependent parameters
 self.Ms = None
 self.Ki = None
 self.Beta = None
 self.P = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def Set_temp_params(self, temperature):
 self.temperature = temperature
 self.Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.Ki = C_Ki0 * (self.Ms / C_Ms0) ** C_gammai
 self.Beta = C_Beta0 * (self.Ms / C_Ms0) ** C_gammab
 self.P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.Hstt = (C_hbar * self.P) / (2 * C_e * C_mu0 * self.Ms * C_tf) * C_gamma
 def Set_mag_fields(self):
 self.HPMA = (2 * self.Ki) / (C_mu0 * self.Ms * C_tf)
 self.HVCMA = - (2 * self.Beta * abs(C_VMTJ)) / (C_mu0 * self.Ms * C_tox * C_tf)
 self.HDEMAG = -self.Ms * np.array([C_Nx, C_Ny, C_Nz])
 def calculate_Rmtj(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 def llg(self):
 self.calculate_Rmtj()
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
 Hdemag = self.HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg()
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()

Mixing initialisation and execution

Secondly you have a separate process for initialisating step sizes, which you do on the instance and a separate method for setting temperature parameters. You should set the temperature parameters at initialisation of the class.

Then I relabelled temperature names as constants internal to the class so that it was further obvious what was updated during a run.

class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Mag field constants
 self.HPMA = None
 self.HVCMA = None
 self.HDEMAG = None
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def Set_mag_fields(self):
 self.HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 def calculate_Rmtj(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 def llg(self):
 self.calculate_Rmtj()
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
 Hdemag = self.HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt

After doing this I could observe that set_mag_fields was just setting constants as well. So this was restructured into the initialisation.

class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])

Standardising routines

I removed calculate_Rmtj. A better name for it would have been set_Rmtj but it was a one line function out of context with the rest of your code style.

Unfortunately I have to get back to work now.. But with the increased visibility it might be easier to look for further optimisations and any possible vectorisations. I am sure this can be improved further for performance gains.

Full Updated code

import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 def llg(self):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 sigmath = np.random.randn(3)
 sigmath = sigmath / np.linalg.norm(sigmath)
 Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
 Hdemag = self.C_HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
 dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m,
 np.cross(self.m,
 Heff)) - \
 self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg()
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()
mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()

Actually you can now observe that these lines:

sigmath = np.random.randn(3)
sigmath = sigmath / np.linalg.norm(sigmath)
Hthermal = np.sqrt(
 (2 * C_kb * self.temperature * C_alpha) /
 (C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)
) * sigmath

contain purely constants and random variables. You can extract this outside the class and create one large vectorised array rather than calculating it repeatedly in each loop.

Profiling

np.cross accounts for 60% of the profiling speed.

enter image description here

It runs 50,000 times. You can observe that one of the equations does not need to be calculated twice: np.cross(self.m, Heff) can be precalculated and substitued in, which will reduce to 40,000 times.

You may also find that your triple vector cross products are more rapidly solved by the alternative formulation. But for this you would have to experiment.

enter image description here

Edited with suggestions

I took the constants and random variables to initialisation and also reduced the cross product computations to 40,000. Speed improved:

enter image description here

I also used the alternative formulation for the triple cross product as follows:

 m_dot_m = np.dot(self.m, self.m)
 m_dot_Heff = np.dot(self.m, Heff)
 m_dot_r = np.dot(self.m, self.m_r)
 m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
 m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
 cross_m_Heff = np.cross(self.m, Heff)
 dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * \
 m_x_m_x_Heff - \
 self.C_Hstt * self.Jstt * m_x_m_x_r

This was very effective, notice the time for llg

enter image description here

The real gains

You mentioned that you need to do this 10,000 times. You cannot vectorize or parallelize the above because the operations need to be performed in sequence.

However you can, and should, vectorize the run of 10,000 of these simulations. For example the array magnetization has dimension (num_steps, 3), but you aim to create a 3D tensor which is of dimension (num_steps, 3, 10000).

That is why I reformulated the equation using dot products, so that later you can vectorize all of these formulae to obtain the speed from numpy's C implementation.

Thus far


import numpy as np
from matplotlib import pyplot as plt
# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)
# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf
# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)
# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None
# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15
# Electrical
C_VMTJ = -0.2
# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5
# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
 (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0
class Mtj:
 def __init__(self, step_size, t_max, temperature):
 # Temperature
 self.temperature = temperature
 self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
 self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
 self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
 self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
 self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
 # Mag field constants
 self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
 self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
 self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
 # Electrical
 self.Istt = None
 self.Jstt = None
 # simulation initialization
 self.step_size = step_size
 self.t_max = t_max
 self.num_steps = round(self.t_max / self.step_size)
 self.m_initial = np.array([0, 0, 1])
 self.magnetization = np.zeros((self.num_steps, 3))
 self.time = np.zeros(self.num_steps)
 self.m = self.m_initial
 self.m_r = np.array([0, 0, 1])
 sigmath = np.random.randn(3, self.num_steps)
 sigmath = sigmath / np.linalg.norm(sigmath, axis=0)
 self.C_Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
 C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
 def llg(self, i):
 self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
 / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
 self.Istt = C_VMTJ / self.Rmtj
 self.Jstt = self.Istt / C_AMTJ
 Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
 Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
 Hdemag = self.C_HDEMAG * self.m
 Heff = Hpma + Hdemag + C_Hex + Hvcma + self.C_Hthermal[:, i]
 m_dot_m = np.dot(self.m, self.m)
 m_dot_Heff = np.dot(self.m, Heff)
 m_dot_r = np.dot(self.m, self.m_r)
 m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
 m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
 cross_m_Heff = np.cross(self.m, Heff)
 dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * \
 m_x_m_x_Heff - \
 self.C_Hstt * self.Jstt * m_x_m_x_r
 return dmdt
 def run(self):
 for i in range(0, self.num_steps):
 dmdt = self.llg(i)
 self.m = self.m + self.step_size * dmdt
 self.m /= np.linalg.norm(self.m)
 self.magnetization[i] = self.m
 self.time[i] = i * self.step_size
 def plot(self):
 fig, ax = plt.subplots()
 ax.plot(self.time, self.magnetization[:, 0], label='Mx')
 ax.plot(self.time, self.magnetization[:, 1], label='My')
 ax.plot(self.time, self.magnetization[:, 2], label='Mz')
 ax.set_xlabel('Time (s)')
 ax.set_ylabel('Magnetization')
 ax.legend()
 plt.show()
mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()

Takes less than one second to run on my end. A significant improvement in terms of speed.
Thanks! Actually using explcit cross product made it even faster, I precalculated the random number array, now it runs is about 0.5 seconds. Could you tell me how you got that screenshot of benchmark statistics?
I used the profiling tool in the professional version of Pycharm, which uses the cProfile profiler.
Choosing a "this is a constant" prefix is a design choice, that’s cool. But please prefer lowercase "c_" over the proposed "C_", out of respect for PEP-8. (I gave the OP a pass on most naming details, due to very helpful adherence to math conventions which are a bit different from python conventions.) Yeah, yeah, I know, manifest constants are typically upcased. But here, mixed case sometimes makes an identifier more readable, more familiar.
Pep8 module level constants are usually all caps. The only reason I didnt use all caps is it was easier to stick with the OP original naming for comparison and when editing to avoid errors. I prefer the capital C because it is more obvious and it better suited the task presented here and made the answer clearer to me and, in my opinion, to most.

atommy atommy 411 bronze badge · Answer 2 · 2024-05-16 08:39:22Z

I'm not an expert but let me raise very common tips that make a calculation faster and that you might be already aware of.

Decreasing the number of function calls

You call llg() 10000 times in the loop, which then calls calculate_Rmtj() in it. Calling a function is slow. Did you try expanding llg() and calculate_Rmtj() directly in the loop? (yes that will make your code messy). Also, you call randn(), cross() and norm() in the loop. Some of them can be slow. It looks like randn() can be called before the loop, i.e., building (10000, 3) array by randn(10000, 3) and then you can access elements in the loop.

Using a JIT compiler

If you want to make your function work faster, a JIT compiler might be the thing. numba is a very popular package for this purpose. By using it, you can for example make llg() run a lot faster (by orders of magnitude). But you need to modify your code.

Using local variables

Using local variables are generally faster way. I'm not very sure about this but accessing self.something from a function scope over and over looks not economical. You can try to pass them to a function as arguments.

J_H J_H 41.7k3 gold badges38 silver badges157 bronze badges · Answer 3 · 2024-05-15 18:39:42Z

physical constants

Thank you for using MKS, it's all very clear.

 self.e = 1.6e-19
 self.me = 9.1e-31

Number of coulombs is correct to three decimal places. But it's just too easy to specify it to five, in keeping with the γ sig figs: 1.6022 Similarly 9.1094 for mass, which had just two sig figs.

I can see some small attraction to always having self.this and self.that, but still, choosing to allocate new storage for these on each object instance is an odd modeling choice. Consider putting such constants at top-level of this or another module. Consider relying on a pypi package which already offers them, such as astropy or scipy:

from scipy.constants import e, m_e, mu_0

When we examine a particular "constant", this (modern) OP code has the curious artifact of relying on the value of mu_0 that was used from 1948 through 2018. The redefinition of 2019 altered its value to the slightly different one we see in the scipy package. This was due to ditching the platinum kilogram prototype, which regrettably was starting to exhibit discrepancies of ~ 50 μg.

tl;dr: These details are boring and not part of the core contribution of this work; outsource them so you needn't worry about them.

naming

These are distinct identifiers:

 self.HEX = -(5 / 1e4) / self.mu0
 self.Hex = np.array([0, self.HEX, 0])

But I do wish you'd pick a slightly different name for that first one. Sometimes colleagues discuss code over the telephone. Try to choose names that will sound distinct when recited aloud, or when recited mentally during quiet code reading.

 def Set_temp_params(self, temperature):
 ...
 def Set_mag_fields(self):

Pep-8 asked you nicely to please use a lowercase set_ prefix for such methods.

cite a reference

 def llg(self):

The Review Context made it clear enough that this is the Landau-Lifschitz-Gilbert equation, and I thank you for that. But the source code really needs a """docstring""" or at least a # comment to that effect.

Citing some author would also shed a great deal of light on F, Rmtj, and the various other expressions. As written this code is on the opaque side. I do appreciate that it is trying hard to conform to math notation conventions.

profiling

This submission would have benefited from the inclusion of profiling measurements.

speed

We pick a random vector in 3-space, good.

 def llg(self):
 ...
 sigmath = np.random.randn(3)

But then we run() it \10ドル^4\$ times. There's an opportunity to roll all the random numbers at once here, and let numpy / BLAS worry about looping.

Or if you wish to conserve memory, pick some compromise, perhaps by rolling 300 numbers at a time and doing a hundred runs through this.

If you don't go that route, consider adding type annotation which mypy lints clean on, and then giving this method numba's @njit decorator.

You're computing nine temp vars, each of them changing slightly on every iteration. It's not clear to me how big the partial derivative is on each of them. If some of them are fairly stable, changing by some tiny epsilon with every iteration, consider hoisting them out of this function, that is, hoisting them out of the loop. So if we do ten loop unrolls, we compute just once at top of unroll a smallish change which is 10x bigger than what OP code computes, rather than computing order-of-magnitude smaller values ten times. Then we still need to loop back \10ドル^3\$ times, an improvement over \10ドル^4\$ times.

Thank you I will try to incorporate them. Another thing I noticed is that the code became about 5-6x faster when I used an excplicit cross product instead of the numpy builtin one. Would you happen to know why?
Is there a way I can put my new code somehwere and we can continue this discussion? Should I post the new code in this question itself?
When you take the tour, it describes a couple of possible outcomes. For example, you might choose to post a self Answer, or create a related new Question.

greybeard greybeard 7,4313 gold badges21 silver badges55 bronze badges · Answer 4 · 2024-05-16 04:56:24Z

 self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
 self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)

With interpreted languages, I'm never sure how much common subexpression elimination to expect.

 def Set_temp_params(self, temperature):
 self.temperature = temperature
 temperature_coefficient = 1 - (self.temperature / self.Tc) ** self.Msx
 self.Ms = self.Ms0 * temperature_coefficient 
 self.Ki = self.Ki0 * temperature_coefficient ** self.gamma_i
 self.Beta = self.Beta0 * temperature_coefficient ** self.gamma_b
 ...
 def Set_mag_fields(self):
 fudge = 2 / (self.mu0 * self.Ms * self.tf)
 self.HPMA = self.Ki * fudge
 self.HVCMA = -(self.Beta * abs(self.VMTJ)) / self.tox * fudge
 ...

should be algebraically equivalent, and the expression for HVCMA isn't overly long any more.
(I noticed another instance of self.mu0 * self.Ms in llg().)

"I'm never sure how much common subexpression elimination" last I checked python doesn't have it, period. Which makes sense when you consider all the types are statically unknown so operators can be arbitrary functions.

Kate Kate 8,1089 silver badges21 bronze badges · Answer 5 · 2024-05-15 19:08:28Z

I am not very qualified to comment on your algorithm, so this is naive review from the POV of developer, but here is some generic advice that can be used in any situation where you want to optimize slow code, or identify bottlenecks.

I often use the codetiming package for this purpose.

So I have applied decorators on each function like:

@Timer(name="llg", text="Ran in {:0.8f} seconds", logger=logger.debug)
def llg(self):
 ....

And then this code to produce a breakdown by function:

logger.info("Timing summary")
logger.info("--------------")
total_time = sum(Timer.timers.values())
logger.info(f"Total time: {total_time:.8f}")
logger.info("name time cost ")
logger.info("------------------------------ ---------- ------")
for k, v in Timer.timers.items():
 logger.info(f"{k:30} {v:.8f} {(v / total_time):.2%}")

Result:

2024年05月15日 18:30:44,361 - INFO - L163 - Timing summary
2024年05月15日 18:30:44,361 - INFO - L164 - --------------
2024年05月15日 18:30:44,361 - INFO - L166 - Total time: 7.78605968
2024年05月15日 18:30:44,361 - INFO - L167 - name time cost 
2024年05月15日 18:30:44,361 - INFO - L168 - ------------------------------ ---------- ------
2024年05月15日 18:30:44,361 - INFO - L170 - __init__ 0.00012388 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - Set_temp_params 0.00000414 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - Set_mag_fields 0.00001162 0.00%
2024年05月15日 18:30:44,361 - INFO - L170 - calculate_Rmtj 0.02407876 0.31%
2024年05月15日 18:30:44,361 - INFO - L170 - llg 3.64657636 46.83%
2024年05月15日 18:30:44,361 - INFO - L170 - run 4.11526491 52.85%

This is a quick and convenient way to pinpoint slow code. And then you can dig further by wrapping selected blocks of code like this:

with Timer(name="some code", text="Ran in {:0.8f} seconds", logger=logger.info):
 # do something
 ...

Check for example the block in function llg that has np.cross...

The second strategy is that you can cache the results of expensive operations. The docs have a basic example about it:

@cache
def factorial(n):
 return n * factorial(n-1) if n else 1
>>> factorial(10) # no previously cached result, makes 11 recursive calls
3628800
>>> factorial(5) # just looks up cached value result
120
>>> factorial(12) # makes two new recursive calls, the other 10 are cached
479001600

As long as the arguments are "hashable" you should be able to apply that logic to any purpose. Some variables do not change over the course of program execution, so if you find yourself repeatedly calculating the same values over and over, this "trick" will save a lot of precious (CPU) time.

You will quite likely have to refactor your code to benefit from this approach. You would need to pass arguments to your functions rather than rely on self.whatever. I find that approach bug-prone, because it's not immediately clear where and how the values are instantiated. Or you could use setters and getters.

But this is not my code and I have my own way of doing things so...

In functions like Set_mag_fields where you set three class attributes:

def Set_mag_fields(self):
 self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
 self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)
 self.HDEMAG = -self.Ms * np.array([self.Nx, self.Ny, self.Nz])

You could return a namedtuple instead, if that makes sense.

Misc: as per PEP8 constants and magic numbers should be uppercase.

The class could perfectly be written as dataclass.

Stack Exchange Network

How do I speed up this simulation program

5 Answers 5

Specifically indicating what are constants.

Mixing initialisation and execution

Standardising routines

Full Updated code

Profiling

Edited with suggestions

The real gains

Thus far

Decreasing the number of function calls

Using a JIT compiler

Using local variables

physical constants

naming

cite a reference

profiling

speed

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How do I speed up this simulation program

5 Answers 5

Specifically indicating what are constants.

Mixing initialisation and execution

Standardising routines

Full Updated code

Profiling

Edited with suggestions

The real gains

Thus far

Decreasing the number of function calls

Using a JIT compiler

Using local variables

physical constants

naming

cite a reference

profiling

speed

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions