I have a data output file from a LabVIEW program (that I cannot modify), and I need to plot the data and fit to a theoretical curve. I accomplished this with Python (with which I am entirely self-taught).
My code runs and produces the desired output just fine, but I feel that it is much too messy, unreadable, and hacked together, not to mention that it was time-consuming to produce (especially for a task as common as reading and plotting data). What code smells are present, and is there a faster/preferred way to do this?
What needs to be plotted?
From the data file (below), there are 4 distinct values of `HWP B`: 0, 45, 90, and 135. With this angle held constant, `HWP A` (which varies from 0 to 180) must be plotted on the x-axis against a function of the columns `A`, `B`, and `AB` (namely, `AB - 2*A*B*dt`, where `dt` is a constant). The following plot is produced, as well as 4 lines of fit parameters which I use later.
(For those wondering, this data follows from an experiment outlined by Dehlinger and Mitchell in a paper on entanglement available on the arXiv.)
Output plot showing 4 data series and their fits.
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

''' Read, analyze and plot data from EPR state quantification data file '''

# time interval for data collection
dt = 5e-9

# read in data from file
fname_epr_quant = "qie-epr-quant.dat"
with open(fname_epr_quant, 'r') as f:
    eq_dat = np.array([line.split(', ')
                       for line in list(filter(None, f.read().split('\n')))])

# create dictionary for the data so that it can be referenced by name
varnames = eq_dat[0]
eq_n2i = {name: index for index, name in enumerate(varnames)}  # name-to-index
eq_dict = {}
for l_line in eq_dat[1:]:
    seriesname = l_line[eq_n2i['Comments:']]
    if seriesname not in eq_dict.keys():
        eq_dict[seriesname] = {varname: np.array([]) for varname in varnames}
    for l_val, l_varname in zip(l_line, varnames):
        try:
            value = float(l_val)
        except:
            value = l_val
        eq_dict[seriesname][l_varname] = np.append(eq_dict[seriesname][l_varname], value)

# compute 2 new quantities from the data ('acc' and 'AB-acc')
# given that accidentals = 2*A*B*dt
for l_seriesname, l_series in eq_dict.items():
    accidentals = 2*l_series['A']*l_series['B']*dt
    eq_dict[l_seriesname]['acc'] = accidentals
    eq_dict[l_seriesname]['AB-acc'] = l_series['AB'] - accidentals

# theoretical N for plotting and fitting data points
def N_theo(a, b, A, C, th, phi):
    '''Accepts a,b,th,phi in DEGREES'''
    a, b, th, phi = np.pi/180 * np.array([a, b, th, phi])
    return C + A*( (np.sin(a)*np.sin(b)*np.cos(th))**2 +
                   (np.cos(a)*np.cos(b)*np.sin(th))**2 +
                   1/4*np.sin(2*a)*np.sin(2*b)*np.sin(2*th)*np.cos(phi) )

# range of values for the x-axis
deg = np.linspace(0, 180)

# plotting
plt.figure(fname_epr_quant)
plt.cla()
for (l_seriesname, l_vars), color in zip(eq_dict.items(), 'rgbk'):
    x = l_vars['HWP A']
    y = l_vars['AB-acc']
    b_plt = l_vars['HWP B'][0]
    N_plt = lambda a, A, C, th, phi: N_theo(a, b_plt, A, C, th, phi)
    popt, pcov = curve_fit(N_plt, x, y, bounds=([100, 1, 30, 15], [1000, 100, 60, 35]))
    print(b_plt, popt)  # print fitting parameters for later analysis
    plt.plot(x, y, '.--'+color, label=l_seriesname+' data')
    plt.plot(deg, N_theo(deg, b_plt, *popt), '-'+color, label=l_seriesname+' fit')
plt.legend()
plt.show(block=False)
(Note: I use the `l_` prefix for loop variables due to my relatively recent discovery that Python doesn't keep a separate namespace for loops, so I may be overwriting important variables.)
Data file:
(Note: I have delimited the data with `,<space>`, since `\t` doesn't survive the copy/paste into a text editor. `:%s/, /\t/g` will trivially recover tabs, if you wish.)
A, B, A', B', A'B, AB, AB', A'B', HWP A, HWP B, E, Comments:, Duration (s), Sub. Acc.?
19346.20, 23218.60, 0.00, 0.00, 0.00, 442.70, 0.00, 0.00, 0.00, 0.00, 1.00, B=0, 10.0, N
19356.10, 23186.20, 0.00, 0.00, 0.00, 426.50, 0.00, 0.00, 8.00, 0.00, 1.00, B=0, 10.0, N
19348.30, 23126.20, 0.00, 0.00, 0.00, 401.90, 0.00, 0.00, 16.00, 0.00, 1.00, B=0, 10.0, N
19075.30, 23145.20, 0.00, 0.00, 0.00, 349.60, 0.00, 0.00, 24.00, 0.00, 1.00, B=0, 10.0, N
18851.40, 23115.40, 0.00, 0.00, 0.00, 299.80, 0.00, 0.00, 32.00, 0.00, 1.00, B=0, 10.0, N
18791.60, 23106.60, 0.00, 0.00, 0.00, 259.00, 0.00, 0.00, 40.00, 0.00, 1.00, B=0, 10.0, N
18561.20, 23129.80, 0.00, 0.00, 0.00, 193.90, 0.00, 0.00, 48.00, 0.00, 1.00, B=0, 10.0, N
18335.30, 23102.50, 0.00, 0.00, 0.00, 136.50, 0.00, 0.00, 56.00, 0.00, 1.00, B=0, 10.0, N
18107.90, 23129.20, 0.00, 0.00, 0.00, 92.70, 0.00, 0.00, 64.00, 0.00, 1.00, B=0, 10.0, N
18042.30, 23135.50, 0.00, 0.00, 0.00, 46.70, 0.00, 0.00, 72.00, 0.00, 1.00, B=0, 10.0, N
17909.20, 23042.20, 0.00, 0.00, 0.00, 22.60, 0.00, 0.00, 80.00, 0.00, 1.00, B=0, 10.0, N
17920.30, 23020.20, 0.00, 0.00, 0.00, 13.40, 0.00, 0.00, 88.00, 0.00, 1.00, B=0, 10.0, N
17945.50, 23048.60, 0.00, 0.00, 0.00, 18.30, 0.00, 0.00, 96.00, 0.00, 1.00, B=0, 10.0, N
17944.50, 23076.20, 0.00, 0.00, 0.00, 44.50, 0.00, 0.00, 104.00, 0.00, 1.00, B=0, 10.0, N
18032.40, 23111.60, 0.00, 0.00, 0.00, 80.50, 0.00, 0.00, 112.00, 0.00, 1.00, B=0, 10.0, N
18254.20, 23131.00, 0.00, 0.00, 0.00, 128.90, 0.00, 0.00, 120.00, 0.00, 1.00, B=0, 10.0, N
18370.90, 23088.10, 0.00, 0.00, 0.00, 174.80, 0.00, 0.00, 128.00, 0.00, 1.00, B=0, 10.0, N
18598.20, 23112.80, 0.00, 0.00, 0.00, 238.80, 0.00, 0.00, 136.00, 0.00, 1.00, B=0, 10.0, N
18780.40, 23076.10, 0.00, 0.00, 0.00, 293.10, 0.00, 0.00, 144.00, 0.00, 1.00, B=0, 10.0, N
18965.40, 23039.90, 0.00, 0.00, 0.00, 338.60, 0.00, 0.00, 152.00, 0.00, 1.00, B=0, 10.0, N
19112.00, 23097.40, 0.00, 0.00, 0.00, 387.60, 0.00, 0.00, 160.00, 0.00, 1.00, B=0, 10.0, N
19185.00, 22957.50, 0.00, 0.00, 0.00, 417.90, 0.00, 0.00, 168.00, 0.00, 1.00, B=0, 10.0, N
19230.40, 23019.70, 0.00, 0.00, 0.00, 423.50, 0.00, 0.00, 176.00, 0.00, 1.00, B=0, 10.0, N
19255.50, 23201.20, 0.00, 0.00, 0.00, 428.50, 0.00, 0.00, 180.00, 0.00, 1.00, B=0, 10.0, N
18910.00, 22827.90, 0.00, 0.00, 0.00, 248.50, 0.00, 0.00, 0.00, 45.00, 1.00, B=45, 10.0, N
18828.00, 22829.80, 0.00, 0.00, 0.00, 321.40, 0.00, 0.00, 12.00, 45.00, 1.00, B=45, 10.0, N
18554.50, 22812.60, 0.00, 0.00, 0.00, 357.30, 0.00, 0.00, 24.00, 45.00, 1.00, B=45, 10.0, N
19159.90, 24043.50, 0.00, 0.00, 0.00, 386.70, 0.00, 0.00, 36.00, 45.00, 1.00, B=45, 10.0, N
18439.10, 23410.90, 0.00, 0.00, 0.00, 382.70, 0.00, 0.00, 48.00, 45.00, 1.00, B=45, 10.0, N
17666.20, 22784.00, 0.00, 0.00, 0.00, 342.20, 0.00, 0.00, 60.00, 45.00, 1.00, B=45, 10.0, N
17479.40, 22887.70, 0.00, 0.00, 0.00, 276.30, 0.00, 0.00, 72.00, 45.00, 1.00, B=45, 10.0, N
17353.90, 22847.90, 0.00, 0.00, 0.00, 213.30, 0.00, 0.00, 84.00, 45.00, 1.00, B=45, 10.0, N
17328.40, 22803.70, 0.00, 0.00, 0.00, 132.30, 0.00, 0.00, 96.00, 45.00, 1.00, B=45, 10.0, N
17613.40, 22917.50, 0.00, 0.00, 0.00, 74.30, 0.00, 0.00, 108.00, 45.00, 1.00, B=45, 10.0, N
17809.60, 22779.80, 0.00, 0.00, 0.00, 41.10, 0.00, 0.00, 120.00, 45.00, 1.00, B=45, 10.0, N
18226.40, 22855.30, 0.00, 0.00, 0.00, 35.10, 0.00, 0.00, 132.00, 45.00, 1.00, B=45, 10.0, N
18498.50, 22838.80, 0.00, 0.00, 0.00, 59.00, 0.00, 0.00, 144.00, 45.00, 1.00, B=45, 10.0, N
18939.00, 22883.70, 0.00, 0.00, 0.00, 107.50, 0.00, 0.00, 156.00, 45.00, 1.00, B=45, 10.0, N
19016.30, 22821.90, 0.00, 0.00, 0.00, 175.40, 0.00, 0.00, 168.00, 45.00, 1.00, B=45, 10.0, N
19077.90, 22873.60, 0.00, 0.00, 0.00, 255.10, 0.00, 0.00, 180.00, 45.00, 1.00, B=45, 10.0, N
19041.60, 21765.30, 0.00, 0.00, 0.00, 20.90, 0.00, 0.00, 0.00, 90.00, 1.00, B=90, 10.0, N
19085.70, 21803.40, 0.00, 0.00, 0.00, 43.20, 0.00, 0.00, 12.00, 90.00, 1.00, B=90, 10.0, N
18822.70, 21839.40, 0.00, 0.00, 0.00, 97.10, 0.00, 0.00, 24.00, 90.00, 1.00, B=90, 10.0, N
18511.70, 21795.40, 0.00, 0.00, 0.00, 163.40, 0.00, 0.00, 36.00, 90.00, 1.00, B=90, 10.0, N
18051.10, 21839.20, 0.00, 0.00, 0.00, 250.30, 0.00, 0.00, 48.00, 90.00, 1.00, B=90, 10.0, N
17727.90, 21838.70, 0.00, 0.00, 0.00, 312.30, 0.00, 0.00, 60.00, 90.00, 1.00, B=90, 10.0, N
17489.30, 21813.20, 0.00, 0.00, 0.00, 360.50, 0.00, 0.00, 72.00, 90.00, 1.00, B=90, 10.0, N
17371.70, 21847.80, 0.00, 0.00, 0.00, 383.50, 0.00, 0.00, 84.00, 90.00, 1.00, B=90, 10.0, N
17249.30, 21918.90, 0.00, 0.00, 0.00, 361.90, 0.00, 0.00, 96.00, 90.00, 1.00, B=90, 10.0, N
17475.90, 21818.20, 0.00, 0.00, 0.00, 314.60, 0.00, 0.00, 108.00, 90.00, 1.00, B=90, 10.0, N
17643.20, 21747.90, 0.00, 0.00, 0.00, 249.60, 0.00, 0.00, 120.00, 90.00, 1.00, B=90, 10.0, N
18132.30, 21869.30, 0.00, 0.00, 0.00, 181.90, 0.00, 0.00, 132.00, 90.00, 1.00, B=90, 10.0, N
18301.00, 21805.40, 0.00, 0.00, 0.00, 111.80, 0.00, 0.00, 144.00, 90.00, 1.00, B=90, 10.0, N
18835.90, 22196.70, 0.00, 0.00, 0.00, 46.00, 0.00, 0.00, 156.00, 90.00, 1.00, B=90, 10.0, N
18534.50, 21750.50, 0.00, 0.00, 0.00, 14.30, 0.00, 0.00, 168.00, 90.00, 1.00, B=90, 10.0, N
18568.90, 21831.90, 0.00, 0.00, 0.00, 14.80, 0.00, 0.00, 180.00, 90.00, 1.00, B=90, 10.0, N
18544.40, 22340.60, 0.00, 0.00, 0.00, 173.10, 0.00, 0.00, 0.00, 135.00, 1.00, B=135, 10.0, N
18485.30, 22349.00, 0.00, 0.00, 0.00, 105.70, 0.00, 0.00, 12.00, 135.00, 1.00, B=135, 10.0, N
18310.80, 22352.90, 0.00, 0.00, 0.00, 62.10, 0.00, 0.00, 24.00, 135.00, 1.00, B=135, 10.0, N
17917.60, 22278.50, 0.00, 0.00, 0.00, 35.50, 0.00, 0.00, 36.00, 135.00, 1.00, B=135, 10.0, N
17576.00, 22315.20, 0.00, 0.00, 0.00, 38.70, 0.00, 0.00, 48.00, 135.00, 1.00, B=135, 10.0, N
17259.60, 22274.40, 0.00, 0.00, 0.00, 62.10, 0.00, 0.00, 60.00, 135.00, 1.00, B=135, 10.0, N
16985.30, 22214.70, 0.00, 0.00, 0.00, 112.50, 0.00, 0.00, 72.00, 135.00, 1.00, B=135, 10.0, N
17070.90, 22526.70, 0.00, 0.00, 0.00, 179.80, 0.00, 0.00, 84.00, 135.00, 1.00, B=135, 10.0, N
17139.60, 22546.70, 0.00, 0.00, 0.00, 238.10, 0.00, 0.00, 96.00, 135.00, 1.00, B=135, 10.0, N
17320.80, 22484.00, 0.00, 0.00, 0.00, 299.10, 0.00, 0.00, 108.00, 135.00, 1.00, B=135, 10.0, N
17398.80, 22275.50, 0.00, 0.00, 0.00, 330.90, 0.00, 0.00, 120.00, 135.00, 1.00, B=135, 10.0, N
17798.80, 22231.60, 0.00, 0.00, 0.00, 342.80, 0.00, 0.00, 132.00, 135.00, 1.00, B=135, 10.0, N
18224.00, 22234.00, 0.00, 0.00, 0.00, 339.50, 0.00, 0.00, 144.00, 135.00, 1.00, B=135, 10.0, N
18391.90, 22246.30, 0.00, 0.00, 0.00, 292.10, 0.00, 0.00, 156.00, 135.00, 1.00, B=135, 10.0, N
18690.00, 22221.70, 0.00, 0.00, 0.00, 233.00, 0.00, 0.00, 168.00, 135.00, 1.00, B=135, 10.0, N
18667.20, 22349.00, 0.00, 0.00, 0.00, 169.00, 0.00, 0.00, 180.00, 135.00, 1.00, B=135, 10.0, N
Overall, you've coded this quite nicely. Some comments and suggestions follow.
Use more functions
You have some clever bits of code to load the data into a dictionary of dictionaries, grouped by the `Comments:` column. This seems like something you might be able to reuse, and could be made into a function `def dicts_by_group(data_lines, group):` with the `'Comments:'` value passed to the `group` parameter, perhaps.
This would also help alleviate some of your namespace/scoping concerns.
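A minimal sketch of such a function, assuming the row-of-strings input the question's code produces (the name `dicts_by_group` and its exact signature are just one possible interface):

```python
import numpy as np

def dicts_by_group(data_lines, group):
    """Group rows of a header-led table into {group_value: {column: array}}.

    data_lines: rows of strings; the first row holds the column names.
    group: name of the column whose value identifies each series.
    """
    varnames = list(data_lines[0])
    group_idx = varnames.index(group)
    grouped = {}
    for row in data_lines[1:]:
        key = row[group_idx]
        series = grouped.setdefault(key, {name: [] for name in varnames})
        for name, raw in zip(varnames, row):
            try:
                series[name].append(float(raw))  # numeric columns
            except ValueError:
                series[name].append(raw)         # string columns (e.g. Comments:)
    # final pass: convert the grown lists to arrays once
    return {key: {name: np.array(vals) for name, vals in series.items()}
            for key, series in grouped.items()}
```

Called as `dicts_by_group(eq_dat, 'Comments:')`, this reproduces the `eq_dict` structure in one reusable step.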
Separate out a main function
This is a file you intend to run. In Python, the typical way to do that is to have:

if __name__ == '__main__':
    # do something

Here `__name__` is the name of the current module, which equals `'__main__'` when this file is run as a script. Often the `# do something` part is kept terse, with a separate call to a main function such as `main()` or `main(args)` (for command-line arguments) or `plot_EPR_state_quantification_data()`, etc.
Another advantage is that the guarded part isn't run when the module is imported (in that case `__name__` isn't `'__main__'`), so you could reuse `N_theo` or any other function you define from other code.
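A minimal sketch of that structure (the helper names `load_data` and `plot_series` are illustrative placeholders, not from the original code):

```python
def load_data(fname):
    """Read the data file into a dict of series (body elided; see the question's code)."""
    return {}

def plot_series(eq_dict):
    """Fit and plot each series (body elided); returns how many series were handled."""
    return len(eq_dict)

def main():
    eq_dict = load_data("qie-epr-quant.dat")
    return plot_series(eq_dict)

# Runs only when executed as a script, not when this module is imported:
if __name__ == '__main__':
    main()
```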
Beware of library operations which are not performed in place
You have `np.append(eq_dict[seriesname][l_varname], value)` in a loop. This isn't a problem with such a small dataset, but be aware that `np.append` copies the whole array on every call.
In contrast, the built-in `append` method for lists is fairly efficient: lists over-allocate as they grow, so appends are amortized constant time.
You could grow lists and then have a final pass to call `np.array` on each. I wouldn't necessarily insist on this, as it would add clutter for a very mild efficiency gain that you don't need with such a tiny dataset. But it's something to be aware of.
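A sketch of that pattern (the column names and rows here are toy stand-ins): grow plain lists during the loop, then convert once at the end.

```python
import numpy as np

columns = {'A': [], 'B': []}
rows = [('1.0', '2.0'), ('3.0', '4.0')]

for a, b in rows:
    columns['A'].append(float(a))  # list.append: amortized O(1), no copying
    columns['B'].append(float(b))

# single conversion pass instead of a full-array copy per np.append call
columns = {name: np.array(vals) for name, vals in columns.items()}
```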
Consider grouping by HWP B rather than Comments:
You have the value of `HWP B` in the table in numeric form. It feels wrong to rely on a free-text "Comments" field for this sort of grouping.
Consider using pandas
The library `pandas` has a class `DataFrame` for tabular data, with a function `pandas.read_csv` for reading from a comma-separated-values file (it can also read tab-separated data with a different parameter).
The first half of your code can be reduced quite a bit with pandas:
import pandas as pd

# skipinitialspace=True handles the ", " delimiter of the pasted file;
# without it the column names come back with leading spaces
with open(fname_epr_quant, 'r') as f:
    eq_dat = pd.read_csv(f, skipinitialspace=True)

eq_dat['acc'] = 2 * eq_dat['A'] * eq_dat['B'] * dt
eq_dat['AB-acc'] = eq_dat['AB'] - eq_dat['acc']
eq_dict = {name: group for name, group in eq_dat.groupby('Comments:')}
(or, as mentioned above, `groupby('HWP B')`.)
For tab-separated data, use `pd.read_table(f)` or `pd.read_csv(f, sep='\t')`.
You can also leave out `eq_dict` and have the grouping done in the loop, of course:

for (l_seriesname, l_vars), color in zip(eq_dat.groupby('Comments:'), 'rgbk'):
    # etcetera
Note that in the above, `eq_dat['A']` is a `Series`, a pandas class for series data with an index (e.g. the time for a time series; here the index would just be the row number), not a numpy array. But `eq_dat['A'].values` is a numpy array: it's the Series without any index information.
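A small illustration of that distinction (assuming pandas is installed; the column values are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})

s = df['A']           # pandas Series: the values plus an index (row numbers here)
arr = df['A'].values  # plain numpy ndarray: same values, index dropped
```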
P.S.
One advantage of loop control variables not being locally scoped is that control flow like the following is often convenient and less verbose than alternatives:
for element in some_sequence:
    if some_condition(element):
        break
else:
    # didn't find one
    raise SomeError
# do something with element
Comments:

nivk: Thank you, this is exactly the kind of feedback I was looking for. And you are right that I have neglected pandas until now; I will look into it! As for the loop scopes, I can see how it makes some neat code. I suppose this will also be helped by confining scopes to functions as well? For now it's still an annoyance, but I suppose it will have long-run benefits.

aes: Right, the smaller the scope, the less you have to worry about name collisions. Glad you found the answer helpful!
If I load your file with `np.genfromtxt`, the main `numpy` csv file reader:
In [642]: data=np.genfromtxt('cr165298.txt',delimiter=',',dtype=None,names=True)
In [643]: data.shape
Out[643]: (72,)
In [644]: data.dtype
Out[644]: dtype([('A', '<f8'), ('B', '<f8'), ('A_1', '<f8'), ('B_1', '<f8'),
('AB', '<f8'), ('AB_1', '<f8'), ('AB_2', '<f8'), ('AB_3', '<f8'), ('HWP_A', '<f8'),
('HWP_B', '<f8'), ('E', '<f8'), ('Comments', 'S6'), ('Duration_s', '<f8'), ('Sub_Acc', 'S2')])
The result is a structured array: 1d, with one field per column. The columns are a mix of floats and strings.
Your load, by contrast, creates a 2d array of strings:
print(eq_dat.shape, eq_dat.dtype)
# (73, 14) <U12
Your `eq_dict` has 4 keys, `['B=135', 'B=45', 'B=0', 'B=90']`, and the subkeys are the same field names as above.
These main keys are my `data['Comments']` values:
In [649]: data['Comments']
Out[649]:
array([b' B=0', b' B=0', b' B=0', .... b' B=0', b' B=45', b' B=45', b' B=45', b' B=45',
b' B=45',.... b' B=90', b' B=90', b' B=90',
b' B=90', .... b' B=135', b' B=135', b' B=135',
b' B=135', b' B=135', b' B=135', b' B=135', b' B=135', b' B=135',
b' B=135', b' B=135'],
dtype='|S6')
If I add `autostrip=True` to the `genfromtxt` call, I get values like `b'B=135'`. The `b` denotes a bytestring in Py3; you won't see it in Py2.
So one array in your dictionary is:
print(eq_dict['B=135']['A'])
# [ 18544.4 18485.3 18310.8 17917.6 17576. 17259.6 16985.3 17070.9
17139.6 17320.8 17398.8 17798.8 18224. 18391.9 18690. 18667.2]
I can pull the same values from `data`:
# Find the rows where `Comments` equal one key:
In [657]: idx = data['Comments']==b' B=135'
In [658]: idx = np.where(idx)
In [659]: idx
Out[659]: (array([56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], dtype=int32),)
# Select that field and that set of rows:
In [660]: data['A'][idx]
Out[660]:
array([ 18544.4, 18485.3, 18310.8, 17917.6, 17576. , 17259.6,
16985.3, 17070.9, 17139.6, 17320.8, 17398.8, 17798.8,
18224. , 18391.9, 18690. , 18667.2])
It might be easier to work with `data` if I loaded only a subset of the columns; for example, you appear to use 'A', 'B' and a few others. `genfromtxt` has a `usecols` parameter to do that. In fact, if I load the floats separately from the string (comments) column, I can get a 2d array of floats.
So loading this data as structured array is easy. Grouping records by 'Comments' values takes a bit more work. I suspect that loading this with Pandas would make this grouping even easier, but I'm not a regular Pandas user.
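One way to sketch that grouping with plain numpy, using a small stand-in for the structured array `genfromtxt` would produce: boolean-mask the records for each unique 'Comments' value.

```python
import numpy as np

# toy stand-in for the structured array genfromtxt returns
data = np.array([(1.0, b'B=0'), (2.0, b'B=0'), (3.0, b'B=45')],
                dtype=[('A', '<f8'), ('Comments', 'S6')])

# one sub-array of records per unique Comments value
groups = {key: data[data['Comments'] == key]
          for key in np.unique(data['Comments'])}
```

Each value in `groups` is itself a structured array, so fields can still be pulled out by name, e.g. `groups[b'B=0']['A']`.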
On SO I usually discourage the iterative use of `np.append` (or, for that matter, any use of `np.append`). So I'd prefer to see:
if seriesname not in eq_dict.keys():
    eq_dict[seriesname] = {varname: [] for varname in varnames}
for l_val, l_varname in zip(l_line, varnames):
    try:
        value = float(l_val)
    except ValueError:  # better than the open-ended except:
        value = l_val
    eq_dict[seriesname][l_varname].append(value)
A disadvantage of this is that it requires converting the lists (or a subset of them) to arrays later, or doing all calculations with list comprehensions.
for seriesname in eq_dict:
    series = eq_dict[seriesname]
    for l_varname in series:
        series[l_varname] = np.array(series[l_varname])
Comments:

Comment: I get `TypeError: leastsq() got an unexpected keyword argument 'bounds'` from `curve_fit()`.

Reply: `curve_fit` takes a `bounds` parameter since scipy 0.17, according to the docs.