Optical dispersion calculation from spectrograms with Python

Question 1

First, I'd like to provide a little explanation on what my code is supposed to do. It is part of a middle-sized project. I restructured the code to work on its own, and also added little comments to help you understand what I'm doing. This is one of the 5 methods how spectrograms can be analyzed. Now let me give you a little physical background (I won't go in the details):

A spectrogram is a dataset, x values mean frequency, y values mean intensity. We either have it on it's own, or we can normalize it (if sam and ref args are given). _handle_input function below takes care of reading the given inputs right. In this example dataset, we have it normalized. The first step is to determine the minimum and maximum points in the given dataset: we can manually feed the minimum and maximum places through minx and maxx arguments. If it's not used, we will use scipy's argrelextrema as default. This example we will not over-complicate things, so let's use the defaults.

In the next step we need to specify a reference point (ref_point arg, preferably somewhere in the middle of x values), and get the relative positions of minimums and maximums to that point. Then we define an order: This figure helps

The closest extremal points are 1st order, the 2nd closest points are 2nd order, etc. Then, we multiply the order by pi, and that will be the y value of the spectral phase (which we want to calculate). See this picture

Now, our job to fit a polynomial to the spectral phase graph (the lower graph on the picture). The order varies from 1 to 5. From the fitted parameters then we can calculate the dispersion's coefficients, which is the purpose of the whole method. The calculation is described at the poly's docstrings. For example:

def poly_fit5(x, b0, b1, b2, b3, b4, b5):
 """
 Taylor polynomial for fit
 b1 = GD
 b2 = GDD / 2
 b3 = TOD / 6
 b4 = FOD / 24
 b5 = QOD / 120
 """
 return b0+b1*x+b2*x**2+b3*x**3+b4*x**4+b5*x**5

We need the GD, GDD, etc., so the method I wrote returns these params, not the fitted parameters. I also added a decorator called show_disp, which immediately prints the results in the right format.

Here is the full code: Please, first download the dataset above, scroll down and uncomment the given examples to see how it's working.

import operator
from functools import wraps
from math import factorial
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.signal import argrelextrema
try:
 from lmfit import Model
 _has_lmfit = True
except ImportError:
 _has_lmfit = False
__all__ = ['min_max_method']
def find_nearest(array, value):
 array = np.asarray(array)
 idx = (np.abs(value - array)).argmin()
 return array[idx], idx
def show_disp(f):
 #decorator to immediately print out the dispersion results
 @wraps(f)
 def wrapping(*args, **kwargs):
 disp, disp_std, stri = f(*args, **kwargs)
 labels = ['GD', 'GDD','TOD', 'FOD', 'QOD']
 for i in range(len(labels)):
 print(labels[i] + ' = ' + str(disp[i]) + ' +/- ' + str(disp_std[i]) + ' 1/fs^{}'.format(i+1))
 return disp, disp_std, stri
 return wrapping
def _handle_input(init_x, init_y, ref, sam):
 """
 Instead of handling the inputs in every function, there is this method.
 Parameters
 ----------
 init_x: array-like
 x-axis data
 init_y: array-like
 y-axis data
 ref, sam: array-like
 reference and sample arm spectrum evaluated at init_x
 Returns
 -------
 init_x: array-like
 unchanged x data
 y_data: array-like
 the proper y data
 """
 if (len(init_x) > 0) and (len(init_y) > 0) and (len(sam) > 0):
 y_data = (init_y-ref-sam)/(2*np.sqrt(ref*sam))
 elif (len(ref) == 0) or (len(sam) == 0):
 y_data = init_y
 elif len(init_y) == 0 or len(init_x) == 0:
 raise ValueError('Please load the spectrum!\n')
 else:
 raise ValueError('Input shapes are wrong.\n')
 return init_x, y_data
@show_disp
def min_max_method(init_x, init_y, ref, sam, ref_point, maxx=None, minx=None, fit_order=5, show_graph=False):
 """
 Calculates the dispersion with minimum-maximum method 
 Parameters
 ----------
 init_x: array-like
 x-axis data
 init_y: array-like
 y-axis data
 ref, sam: array-like
 reference and sample arm spectra evaluated at init_x
 ref_point: float
 the reference point to calculate order
 maxx and minx: array-like
 the accepted minimal and maximal places should be passed to these args
 fit_order: int
 degree of polynomial to fit data [1, 5]
 show_graph: bool
 if True returns a matplotlib plot and pauses execution until closing the window
 Returns
 -------
 dispersion: array-like
 [GD, GDD, TOD, FOD, QOD]
 dispersion_std: array-like
 [GD_std, GDD_std, TOD_std, FOD_std, QOD_std]
 fit_report: lmfit report (if available)
 """
 x_data, y_data = _handle_input(init_x, init_y, ref, sam)
 # handling default case when no minimum or maximum coordinates are given
 if maxx is None:
 maxInd = argrelextrema(y_data, np.greater)
 maxx = x_data[maxInd]
 if minx is None:
 minInd = argrelextrema(y_data, np.less)
 minx = x_data[minInd]
 _, ref_index = find_nearest(x_data, ref_point)
 # getting the relative distance of mins and maxs coordinates from the ref_point
 relNegMaxFreqs = np.array([a for a in (x_data[ref_index]-maxx) if a<0])
 relNegMinFreqs= np.array([b for b in (x_data[ref_index]-minx) if b<0])
 relNegFreqs = sorted(np.append(relNegMaxFreqs, relNegMinFreqs))
 # reversing order because of the next loop
 relNegFreqs = relNegFreqs[::-1] 
 relPosMaxFreqs = np.array([c for c in (x_data[ref_index]-maxx) if c>0])
 relPosMinFreqs= np.array([d for d in (x_data[ref_index]-minx) if d>0])
 relPosFreqs = sorted(np.append(relPosMinFreqs,relPosMaxFreqs))
 negValues = np.zeros_like(relNegFreqs)
 posValues = np.zeros_like(relPosFreqs)
 for freq in range(len(relPosFreqs)):
 posValues[freq] = np.pi*(freq+1)
 for freq in range(len(relNegFreqs)):
 negValues[freq] = np.pi*(freq+1)
 # making sure the data is ascending in x
 # FIXME: Do we even need this?
 x_s = np.append(relPosFreqs, relNegFreqs) 
 y_s = np.append(posValues, negValues)
 L = sorted(zip(x_s,y_s), key=operator.itemgetter(0))
 fullXValues, fullYValues = zip(*L)
 # now we have the data, let's fit a function
 if _has_lmfit:
 if fit_order == 5:
 fitModel = Model(poly_fit5)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') 
 elif fit_order == 4:
 fitModel = Model(poly_fit4)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') 
 elif fit_order == 3:
 fitModel = Model(poly_fit3)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') 
 elif fit_order == 2:
 fitModel = Model(poly_fit2)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') 
 elif fit_order == 1:
 fitModel = Model(poly_fit1)
 params = fitModel.make_params(b0 = 0, b1 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq') 
 else:
 raise ValueError('Order is out of range, please select from [1,5]')
 else:
 if fit_order == 5:
 popt, pcov = curve_fit(poly_fit5, fullXValues, fullYValues, maxfev = 8000)
 _function = poly_fit5
 elif fit_order == 4:
 popt, pcov = curve_fit(poly_fit4, fullXValues, fullYValues, maxfev = 8000)
 _function = poly_fit4
 elif fit_order == 3:
 popt, pcov = curve_fit(poly_fit3, fullXValues, fullYValues, maxfev = 8000)
 _function = poly_fit3
 elif fit_order == 2:
 popt, pcov = curve_fit(poly_fit2, fullXValues, fullYValues, maxfev = 8000)
 _function = poly_fit2
 elif fit_order == 1:
 popt, pcov = curve_fit(poly_fit1, fullXValues, fullYValues, maxfev = 8000)
 _function = poly_fit1
 else:
 raise ValueError('Order is out of range, please select from [1,5]')
 try:
 if _has_lmfit:
 dispersion, dispersion_std = [], []
 for name, par in result.params.items():
 dispersion.append(par.value)
 dispersion_std.append(par.stderr)
 # dropping b0 param, it's not needed
 dispersion = dispersion[1:]
 dispersion_std = dispersion_std[1:]
 for idx in range(len(dispersion)):
 dispersion[idx] = dispersion[idx] * factorial(idx+1) 
 dispersion_std[idx] = dispersion_std[idx] * factorial(idx+1)
 # fill up the rest with zeros
 while len(dispersion) < 5:
 dispersion.append(0)
 dispersion_std.append(0) 
 fit_report = result.fit_report()
 else:
 fullXValues = np.asarray(fullXValues) #for plotting
 dispersion, dispersion_std = [], []
 for idx in range(len(popt)-1):
 dispersion.append(popt[idx+1]*factorial(idx+1))
 while len(dispersion)<5:
 dispersion.append(0)
 while len(dispersion_std)<len(dispersion):
 dispersion_std.append(0)
 fit_report = '\nTo display detailed results, you must have lmfit installed.'
 if show_graph:
 fig = plt.figure(figsize=(7,7))
 fig.canvas.set_window_title('Min-max method fitted')
 plt.plot(fullXValues, fullYValues, 'o', label='dataset')
 try:
 plt.plot(fullXValues, result.best_fit, 'r--', label='fitted')
 except Exception:
 plt.plot(fullXValues, _function(fullXValues, *popt), 'r--', label='fitted')
 plt.legend()
 plt.grid()
 plt.show()
 return dispersion, dispersion_std, fit_report
 #This except block exists because otherwise errors would crash PyQt5 app, and we can display the error 
 #to the log dialog in the application.
 except Exception as e: 
 return [], [], e
def poly_fit5(x, b0, b1, b2, b3, b4, b5):
 """
 Taylor polynomial for fit
 b1 = GD
 b2 = GDD / 2
 b3 = TOD / 6
 b4 = FOD / 24
 b5 = QOD / 120
 """
 return b0+b1*x+b2*x**2+b3*x**3+b4*x**4+b5*x**5
def poly_fit4(x, b0, b1, b2, b3, b4):
 """
 Taylor polynomial for fit
 b1 = GD
 b2 = GDD / 2
 b3 = TOD / 6
 b4 = FOD / 24
 """
 return b0+b1*x+b2*x**2+b3*x**3+b4*x**4
def poly_fit3(x, b0, b1, b2, b3):
 """
 Taylor polynomial for fit
 b1 = GD
 b2 = GDD / 2
 b3 = TOD / 6
 """
 return b0+b1*x+b2*x**2+b3*x**3
def poly_fit2(x, b0, b1, b2):
 """
 Taylor polynomial for fit
 b1 = GD
 b2 = GDD / 2
 """
 return b0+b1*x+b2*x**2
def poly_fit1(x, b0, b1):
 """
 Taylor polynomial for fit
 b1 = GD
 """
 return b0+b1*x
## To test with the data I provided:
# a,b,c,d = np.loadtxt('test.txt', unpack=True, delimiter=',')
# min_max_method(a,b,c,d, 2.5, fit_order=2, show_graph=True)
## To see how a usual dataset looks like:
# x, y = _handle_input(a,b,c,d)
# plt.plot(x, y)
# plt.grid()
# plt.show()

Obviously there are many things to improve (ex. determining the fit order looks horrible, but I could not find a better way.) I hope this was not too hard to follow. If something's unclear, just ask. I appreciate every suggestion in the code.

EDIT:

fixed

dispersion[idx] = dispersion[idx] / factorial(idx+1) 
dispersion_std[idx] = dispersion_std[idx] / factorial(idx+1)
...
dispersion.append(popt[idx+1]/factorial(idx+1))

to

dispersion[idx] = dispersion[idx] * factorial(idx+1) 
dispersion_std[idx] = dispersion_std[idx] * factorial(idx+1)
...
dispersion.append(popt[idx+1]*factorial(idx+1))

Question 2

Loop like a native

 labels = ['GD', 'GDD','TOD', 'FOD', 'QOD']
 for i in range(len(labels)):
 print(labels[i] + ' = ' + str(disp[i]) + ' +/- ' + str(disp_std[i]) + ' 1/fs^{}'.format(i+1))

can be improved.

labels = ('GD', 'GDD','TOD', 'FOD', 'QOD') # Immutable tuple
for i, (label, disp_item, disp_std_item) in enumerate(zip(labels, disp, disp_std)):
 print(f'{label} = {disp_item} ±{disp_std_item} 1/fs^{i+1}')

Don't repeat yourself

 if fit_order == 5:
 fitModel = Model(poly_fit5)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq')

etc. should probably be

fit_model = Model(poly_fit)
params = fit_model.make_params(
 b0 = 0,
 **{f'b{i}': 1 for i in range(1, fit_order+1)}
)
result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq')

where poly_fit is itself generalized:

def poly_fit(x, *args):
 return sum(b*x**i for i, b in enumerate(args))

That said, numpy has better and faster ways to do this.

Question 3

Thank you for answering. I know that np.polyfit is a faster (and maybe easier) way to achieve the same result, but usually there is not too many points to fit, so speed is not an issue there. Also the fit method may be changed later, so to me it's nice to have lmfit.

score 2 · Answer 1 · 2019-10-24 02:36:59Z

Loop like a native

 labels = ['GD', 'GDD','TOD', 'FOD', 'QOD']
 for i in range(len(labels)):
 print(labels[i] + ' = ' + str(disp[i]) + ' +/- ' + str(disp_std[i]) + ' 1/fs^{}'.format(i+1))

can be improved.

labels = ('GD', 'GDD','TOD', 'FOD', 'QOD') # Immutable tuple
for i, (label, disp_item, disp_std_item) in enumerate(zip(labels, disp, disp_std)):
 print(f'{label} = {disp_item} ±{disp_std_item} 1/fs^{i+1}')

Don't repeat yourself

 if fit_order == 5:
 fitModel = Model(poly_fit5)
 params = fitModel.make_params(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
 result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq')

etc. should probably be

fit_model = Model(poly_fit)
params = fit_model.make_params(
 b0 = 0,
 **{f'b{i}': 1 for i in range(1, fit_order+1)}
)
result = fitModel.fit(fullYValues, x=fullXValues, params=params, method='leastsq')

where poly_fit is itself generalized:

def poly_fit(x, *args):
 return sum(b*x**i for i, b in enumerate(args))

That said, numpy has better and faster ways to do this.

Thank you for answering. I know that np.polyfit is a faster (and maybe easier) way to achieve the same result, but usually there is not too many points to fit, so speed is not an issue there. Also the fit method may be changed later, so to me it's nice to have lmfit.

Stack Exchange Network

Optical dispersion calculation from spectrograms with Python

1 Answer 1

Loop like a native

Don't repeat yourself

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optical dispersion calculation from spectrograms with Python

1 Answer 1

Loop like a native

Don't repeat yourself

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions