I have a data output file from a LabVIEW program (that I cannot modify), and I need to plot the data and fit to a theoretical curve. I accomplished this with Python (with which I am entirely self-taught).
My code runs and produces the desired output just fine, but I feel that it is much too messy, unreadable, and hacked together, not to mention that it was time-consuming to produce (especially for a task as common as reading and plotting data). What code smells are present, and is there a faster/preferred way to do this?
What needs to be plotted?
From the data file (below), there are 4 distinct values of `HWP B`: 0, 45, 90, and 135. With this angle held constant, `HWP A` (which varies from 0 to 180) must be plotted on the x-axis against a function of the columns `A`, `B`, and `AB` (namely, `AB - 2*A*B*dt`, where `dt` is a constant). The following plot is produced, as well as 4 lines of fit parameters which I use later.
(For those wondering, this data follows from an experiment outlined by Dehlinger and Mitchell in a paper on entanglement available on the arXiv.)
Output plot showing 4 data series and their fits.
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

''' Read, analyze and plot data from EPR state quantification data file '''

# time interval for data collection
dt = 5e-9

# read in data from file
fname_epr_quant = "qie-epr-quant.dat"
with open(fname_epr_quant, 'r') as f:
    eq_dat = np.array([line.split(', ')
                       for line in list(filter(None, f.read().split('\n')))])

# create dictionary for the data so that it can be referenced by name
varnames = eq_dat[0]
eq_n2i = {name: index for index, name in enumerate(varnames)}  # name-to-index
eq_dict = {}
for l_line in eq_dat[1:]:
    seriesname = l_line[eq_n2i['Comments:']]
    if seriesname not in eq_dict.keys():
        eq_dict[seriesname] = {varname: np.array([]) for varname in varnames}
    for l_val, l_varname in zip(l_line, varnames):
        try:
            value = float(l_val)
        except:
            value = l_val
        eq_dict[seriesname][l_varname] = np.append(eq_dict[seriesname][l_varname], value)

# compute 2 new quantities from the data ('acc' and 'AB-acc')
# given that accidentals = 2*A*B*dt
for l_seriesname, l_series in eq_dict.items():
    accidentals = 2*l_series['A']*l_series['B']*dt
    eq_dict[l_seriesname]['acc'] = accidentals
    eq_dict[l_seriesname]['AB-acc'] = l_series['AB'] - accidentals

# theoretical N for plotting and fitting data points
def N_theo(a, b, A, C, th, phi):
    '''Accepts a,b,th,phi in DEGREES'''
    a, b, th, phi = np.pi/180 * np.array([a, b, th, phi])
    return C + A*( (np.sin(a)*np.sin(b)*np.cos(th))**2 +
                   (np.cos(a)*np.cos(b)*np.sin(th))**2 +
                   1/4*np.sin(2*a)*np.sin(2*b)*np.sin(2*th)*np.cos(phi) )

# range of values for the x-axis
deg = np.linspace(0, 180)

# plotting
plt.figure(fname_epr_quant)
plt.cla()
for (l_seriesname, l_vars), color in zip(eq_dict.items(), 'rgbk'):
    x = l_vars['HWP A']
    y = l_vars['AB-acc']
    b_plt = l_vars['HWP B'][0]
    N_plt = lambda a, A, C, th, phi: N_theo(a, b_plt, A, C, th, phi)
    popt, pcov = curve_fit(N_plt, x, y, bounds=([100, 1, 30, 15], [1000, 100, 60, 35]))
    print(b_plt, popt)  # print fitting parameters for later analysis
    plt.plot(x, y, '.--'+color, label=l_seriesname+' data')
    plt.plot(deg, N_theo(deg, b_plt, *popt), '-'+color, label=l_seriesname+' fit')
plt.legend()
plt.show(block=False)
(Note: I use the `l_` prefix for loop variables due to my relatively recent discovery that Python doesn't keep a separate namespace for loops, so I may be overwriting important variables.)
Data file:
(Note: I have delimited the data with `,<space>`, since `\t` doesn't survive the copy/paste into a text editor. `:%s/, /\t/g` will trivially recover tabs, if you wish.)
A, B, A', B', A'B, AB, AB', A'B', HWP A, HWP B, E, Comments:, Duration (s), Sub. Acc.?
19346.20, 23218.60, 0.00, 0.00, 0.00, 442.70, 0.00, 0.00, 0.00, 0.00, 1.00, B=0, 10.0, N
19356.10, 23186.20, 0.00, 0.00, 0.00, 426.50, 0.00, 0.00, 8.00, 0.00, 1.00, B=0, 10.0, N
19348.30, 23126.20, 0.00, 0.00, 0.00, 401.90, 0.00, 0.00, 16.00, 0.00, 1.00, B=0, 10.0, N
19075.30, 23145.20, 0.00, 0.00, 0.00, 349.60, 0.00, 0.00, 24.00, 0.00, 1.00, B=0, 10.0, N
18851.40, 23115.40, 0.00, 0.00, 0.00, 299.80, 0.00, 0.00, 32.00, 0.00, 1.00, B=0, 10.0, N
18791.60, 23106.60, 0.00, 0.00, 0.00, 259.00, 0.00, 0.00, 40.00, 0.00, 1.00, B=0, 10.0, N
18561.20, 23129.80, 0.00, 0.00, 0.00, 193.90, 0.00, 0.00, 48.00, 0.00, 1.00, B=0, 10.0, N
18335.30, 23102.50, 0.00, 0.00, 0.00, 136.50, 0.00, 0.00, 56.00, 0.00, 1.00, B=0, 10.0, N
18107.90, 23129.20, 0.00, 0.00, 0.00, 92.70, 0.00, 0.00, 64.00, 0.00, 1.00, B=0, 10.0, N
18042.30, 23135.50, 0.00, 0.00, 0.00, 46.70, 0.00, 0.00, 72.00, 0.00, 1.00, B=0, 10.0, N
17909.20, 23042.20, 0.00, 0.00, 0.00, 22.60, 0.00, 0.00, 80.00, 0.00, 1.00, B=0, 10.0, N
17920.30, 23020.20, 0.00, 0.00, 0.00, 13.40, 0.00, 0.00, 88.00, 0.00, 1.00, B=0, 10.0, N
17945.50, 23048.60, 0.00, 0.00, 0.00, 18.30, 0.00, 0.00, 96.00, 0.00, 1.00, B=0, 10.0, N
17944.50, 23076.20, 0.00, 0.00, 0.00, 44.50, 0.00, 0.00, 104.00, 0.00, 1.00, B=0, 10.0, N
18032.40, 23111.60, 0.00, 0.00, 0.00, 80.50, 0.00, 0.00, 112.00, 0.00, 1.00, B=0, 10.0, N
18254.20, 23131.00, 0.00, 0.00, 0.00, 128.90, 0.00, 0.00, 120.00, 0.00, 1.00, B=0, 10.0, N
18370.90, 23088.10, 0.00, 0.00, 0.00, 174.80, 0.00, 0.00, 128.00, 0.00, 1.00, B=0, 10.0, N
18598.20, 23112.80, 0.00, 0.00, 0.00, 238.80, 0.00, 0.00, 136.00, 0.00, 1.00, B=0, 10.0, N
18780.40, 23076.10, 0.00, 0.00, 0.00, 293.10, 0.00, 0.00, 144.00, 0.00, 1.00, B=0, 10.0, N
18965.40, 23039.90, 0.00, 0.00, 0.00, 338.60, 0.00, 0.00, 152.00, 0.00, 1.00, B=0, 10.0, N
19112.00, 23097.40, 0.00, 0.00, 0.00, 387.60, 0.00, 0.00, 160.00, 0.00, 1.00, B=0, 10.0, N
19185.00, 22957.50, 0.00, 0.00, 0.00, 417.90, 0.00, 0.00, 168.00, 0.00, 1.00, B=0, 10.0, N
19230.40, 23019.70, 0.00, 0.00, 0.00, 423.50, 0.00, 0.00, 176.00, 0.00, 1.00, B=0, 10.0, N
19255.50, 23201.20, 0.00, 0.00, 0.00, 428.50, 0.00, 0.00, 180.00, 0.00, 1.00, B=0, 10.0, N
18910.00, 22827.90, 0.00, 0.00, 0.00, 248.50, 0.00, 0.00, 0.00, 45.00, 1.00, B=45, 10.0, N
18828.00, 22829.80, 0.00, 0.00, 0.00, 321.40, 0.00, 0.00, 12.00, 45.00, 1.00, B=45, 10.0, N
18554.50, 22812.60, 0.00, 0.00, 0.00, 357.30, 0.00, 0.00, 24.00, 45.00, 1.00, B=45, 10.0, N
19159.90, 24043.50, 0.00, 0.00, 0.00, 386.70, 0.00, 0.00, 36.00, 45.00, 1.00, B=45, 10.0, N
18439.10, 23410.90, 0.00, 0.00, 0.00, 382.70, 0.00, 0.00, 48.00, 45.00, 1.00, B=45, 10.0, N
17666.20, 22784.00, 0.00, 0.00, 0.00, 342.20, 0.00, 0.00, 60.00, 45.00, 1.00, B=45, 10.0, N
17479.40, 22887.70, 0.00, 0.00, 0.00, 276.30, 0.00, 0.00, 72.00, 45.00, 1.00, B=45, 10.0, N
17353.90, 22847.90, 0.00, 0.00, 0.00, 213.30, 0.00, 0.00, 84.00, 45.00, 1.00, B=45, 10.0, N
17328.40, 22803.70, 0.00, 0.00, 0.00, 132.30, 0.00, 0.00, 96.00, 45.00, 1.00, B=45, 10.0, N
17613.40, 22917.50, 0.00, 0.00, 0.00, 74.30, 0.00, 0.00, 108.00, 45.00, 1.00, B=45, 10.0, N
17809.60, 22779.80, 0.00, 0.00, 0.00, 41.10, 0.00, 0.00, 120.00, 45.00, 1.00, B=45, 10.0, N
18226.40, 22855.30, 0.00, 0.00, 0.00, 35.10, 0.00, 0.00, 132.00, 45.00, 1.00, B=45, 10.0, N
18498.50, 22838.80, 0.00, 0.00, 0.00, 59.00, 0.00, 0.00, 144.00, 45.00, 1.00, B=45, 10.0, N
18939.00, 22883.70, 0.00, 0.00, 0.00, 107.50, 0.00, 0.00, 156.00, 45.00, 1.00, B=45, 10.0, N
19016.30, 22821.90, 0.00, 0.00, 0.00, 175.40, 0.00, 0.00, 168.00, 45.00, 1.00, B=45, 10.0, N
19077.90, 22873.60, 0.00, 0.00, 0.00, 255.10, 0.00, 0.00, 180.00, 45.00, 1.00, B=45, 10.0, N
19041.60, 21765.30, 0.00, 0.00, 0.00, 20.90, 0.00, 0.00, 0.00, 90.00, 1.00, B=90, 10.0, N
19085.70, 21803.40, 0.00, 0.00, 0.00, 43.20, 0.00, 0.00, 12.00, 90.00, 1.00, B=90, 10.0, N
18822.70, 21839.40, 0.00, 0.00, 0.00, 97.10, 0.00, 0.00, 24.00, 90.00, 1.00, B=90, 10.0, N
18511.70, 21795.40, 0.00, 0.00, 0.00, 163.40, 0.00, 0.00, 36.00, 90.00, 1.00, B=90, 10.0, N
18051.10, 21839.20, 0.00, 0.00, 0.00, 250.30, 0.00, 0.00, 48.00, 90.00, 1.00, B=90, 10.0, N
17727.90, 21838.70, 0.00, 0.00, 0.00, 312.30, 0.00, 0.00, 60.00, 90.00, 1.00, B=90, 10.0, N
17489.30, 21813.20, 0.00, 0.00, 0.00, 360.50, 0.00, 0.00, 72.00, 90.00, 1.00, B=90, 10.0, N
17371.70, 21847.80, 0.00, 0.00, 0.00, 383.50, 0.00, 0.00, 84.00, 90.00, 1.00, B=90, 10.0, N
17249.30, 21918.90, 0.00, 0.00, 0.00, 361.90, 0.00, 0.00, 96.00, 90.00, 1.00, B=90, 10.0, N
17475.90, 21818.20, 0.00, 0.00, 0.00, 314.60, 0.00, 0.00, 108.00, 90.00, 1.00, B=90, 10.0, N
17643.20, 21747.90, 0.00, 0.00, 0.00, 249.60, 0.00, 0.00, 120.00, 90.00, 1.00, B=90, 10.0, N
18132.30, 21869.30, 0.00, 0.00, 0.00, 181.90, 0.00, 0.00, 132.00, 90.00, 1.00, B=90, 10.0, N
18301.00, 21805.40, 0.00, 0.00, 0.00, 111.80, 0.00, 0.00, 144.00, 90.00, 1.00, B=90, 10.0, N
18835.90, 22196.70, 0.00, 0.00, 0.00, 46.00, 0.00, 0.00, 156.00, 90.00, 1.00, B=90, 10.0, N
18534.50, 21750.50, 0.00, 0.00, 0.00, 14.30, 0.00, 0.00, 168.00, 90.00, 1.00, B=90, 10.0, N
18568.90, 21831.90, 0.00, 0.00, 0.00, 14.80, 0.00, 0.00, 180.00, 90.00, 1.00, B=90, 10.0, N
18544.40, 22340.60, 0.00, 0.00, 0.00, 173.10, 0.00, 0.00, 0.00, 135.00, 1.00, B=135, 10.0, N
18485.30, 22349.00, 0.00, 0.00, 0.00, 105.70, 0.00, 0.00, 12.00, 135.00, 1.00, B=135, 10.0, N
18310.80, 22352.90, 0.00, 0.00, 0.00, 62.10, 0.00, 0.00, 24.00, 135.00, 1.00, B=135, 10.0, N
17917.60, 22278.50, 0.00, 0.00, 0.00, 35.50, 0.00, 0.00, 36.00, 135.00, 1.00, B=135, 10.0, N
17576.00, 22315.20, 0.00, 0.00, 0.00, 38.70, 0.00, 0.00, 48.00, 135.00, 1.00, B=135, 10.0, N
17259.60, 22274.40, 0.00, 0.00, 0.00, 62.10, 0.00, 0.00, 60.00, 135.00, 1.00, B=135, 10.0, N
16985.30, 22214.70, 0.00, 0.00, 0.00, 112.50, 0.00, 0.00, 72.00, 135.00, 1.00, B=135, 10.0, N
17070.90, 22526.70, 0.00, 0.00, 0.00, 179.80, 0.00, 0.00, 84.00, 135.00, 1.00, B=135, 10.0, N
17139.60, 22546.70, 0.00, 0.00, 0.00, 238.10, 0.00, 0.00, 96.00, 135.00, 1.00, B=135, 10.0, N
17320.80, 22484.00, 0.00, 0.00, 0.00, 299.10, 0.00, 0.00, 108.00, 135.00, 1.00, B=135, 10.0, N
17398.80, 22275.50, 0.00, 0.00, 0.00, 330.90, 0.00, 0.00, 120.00, 135.00, 1.00, B=135, 10.0, N
17798.80, 22231.60, 0.00, 0.00, 0.00, 342.80, 0.00, 0.00, 132.00, 135.00, 1.00, B=135, 10.0, N
18224.00, 22234.00, 0.00, 0.00, 0.00, 339.50, 0.00, 0.00, 144.00, 135.00, 1.00, B=135, 10.0, N
18391.90, 22246.30, 0.00, 0.00, 0.00, 292.10, 0.00, 0.00, 156.00, 135.00, 1.00, B=135, 10.0, N
18690.00, 22221.70, 0.00, 0.00, 0.00, 233.00, 0.00, 0.00, 168.00, 135.00, 1.00, B=135, 10.0, N
18667.20, 22349.00, 0.00, 0.00, 0.00, 169.00, 0.00, 0.00, 180.00, 135.00, 1.00, B=135, 10.0, N
Overall, you've coded this quite nicely. Some comments and suggestions follow.
Use more functions
You have some clever bits of code to load the data into a dictionary of dictionaries, grouped by the `Comments:` column. This seems like something you might be able to reuse, and could be made into a function `def dicts_by_group(data_lines, group):` with the `'Comments:'` value passed to the `group` parameter, perhaps.
This would also help alleviate some of your namespace/scoping concerns.
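A minimal sketch of such a function, assuming the row-of-strings input the question's code produces (the name `dicts_by_group` and its exact signature are just one possible interface):

```python
import numpy as np

def dicts_by_group(data_lines, group):
    """Group rows of a header-led table into {group_value: {column: array}}.

    data_lines: rows of strings; the first row holds the column names.
    group: name of the column whose value identifies each series.
    """
    varnames = list(data_lines[0])
    group_idx = varnames.index(group)
    grouped = {}
    for row in data_lines[1:]:
        key = row[group_idx]
        series = grouped.setdefault(key, {name: [] for name in varnames})
        for name, raw in zip(varnames, row):
            try:
                series[name].append(float(raw))  # numeric columns
            except ValueError:
                series[name].append(raw)         # string columns (e.g. Comments:)
    # final pass: convert the grown lists to arrays once
    return {key: {name: np.array(vals) for name, vals in series.items()}
            for key, series in grouped.items()}
```

Called as `dicts_by_group(eq_dat, 'Comments:')`, this reproduces the `eq_dict` structure in one reusable step.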
Separate out a main function
This is a file you intend to run. In Python, the typical way to do that is to have:

if __name__ == '__main__':
    # do something

Here `__name__` is the name of the current module, which equals `'__main__'` when this file is run as a script. Often the `# do something` part is kept terse, with a separate call to a main function such as `main()` or `main(args)` (for command-line arguments) or `plot_EPR_state_quantification_data()`, etc.
Another advantage is that the guarded part isn't run when the module is imported (in that case `__name__` isn't `'__main__'`), so you could reuse `N_theo` or any other function you define from other code.
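A minimal sketch of that structure (the helper names `load_data` and `plot_series` are illustrative placeholders, not from the original code):

```python
def load_data(fname):
    """Read the data file into a dict of series (body elided; see the question's code)."""
    return {}

def plot_series(eq_dict):
    """Fit and plot each series (body elided); returns how many series were handled."""
    return len(eq_dict)

def main():
    eq_dict = load_data("qie-epr-quant.dat")
    return plot_series(eq_dict)

# Runs only when executed as a script, not when this module is imported:
if __name__ == '__main__':
    main()
```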
Beware of library operations which are not performed in place
You have `np.append(eq_dict[seriesname][l_varname], value)` in a loop. This isn't a problem with such a small dataset, but be aware that `np.append` copies the whole array on every call.
In contrast, the built-in `append` method for lists is fairly efficient: lists over-allocate as they grow, so appends are amortized constant time.
You could grow lists and then have a final pass to call `np.array` on each. I wouldn't necessarily insist on this, as it would add clutter for a very mild efficiency gain that you don't need with such a tiny dataset. But it's something to be aware of.
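A sketch of that pattern (the column names and rows here are toy stand-ins): grow plain lists during the loop, then convert once at the end.

```python
import numpy as np

columns = {'A': [], 'B': []}
rows = [('1.0', '2.0'), ('3.0', '4.0')]

for a, b in rows:
    columns['A'].append(float(a))  # list.append: amortized O(1), no copying
    columns['B'].append(float(b))

# single conversion pass instead of a full-array copy per np.append call
columns = {name: np.array(vals) for name, vals in columns.items()}
```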
Consider grouping by HWP B rather than Comments:
You have the value of `HWP B` in the table in numeric form. It feels wrong to rely on a free-text "Comments" field for this sort of grouping.
Consider using pandas
The library `pandas` has a class `DataFrame` for tabular data, with a function `pandas.read_csv` for reading from a comma-separated-values file (it can also read tab-separated data with a different parameter).
The first half of your code can be reduced quite a bit with pandas:
import pandas as pd

# skipinitialspace=True handles the ", " delimiter of the pasted file;
# without it the column names come back with leading spaces
with open(fname_epr_quant, 'r') as f:
    eq_dat = pd.read_csv(f, skipinitialspace=True)

eq_dat['acc'] = 2 * eq_dat['A'] * eq_dat['B'] * dt
eq_dat['AB-acc'] = eq_dat['AB'] - eq_dat['acc']
eq_dict = {name: group for name, group in eq_dat.groupby('Comments:')}
(or, as mentioned above, `groupby('HWP B')`.)
For tab-separated data, use `pd.read_table(f)` or `pd.read_csv(f, sep='\t')`.
You can also leave out `eq_dict` and have the grouping done in the loop, of course:

for (l_seriesname, l_vars), color in zip(eq_dat.groupby('Comments:'), 'rgbk'):
    # etcetera
Note that in the above, `eq_dat['A']` is a `Series`, a pandas class for series data with an index (e.g. the time for a time series; here the index would just be the row number), not a numpy array. But `eq_dat['A'].values` is a numpy array: it's the Series without any index information.
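A small illustration of that distinction (assuming pandas is installed; the column values are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})

s = df['A']           # pandas Series: the values plus an index (row numbers here)
arr = df['A'].values  # plain numpy ndarray: same values, index dropped
```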
P.S.
One advantage of loop control variables not being locally scoped is that control flow like the following is often convenient and less verbose than alternatives:
for element in some_sequence:
    if some_condition(element):
        break
else:
    # didn't find one
    raise SomeError
# do something with element
Comments:

nivk: Thank you, this is exactly the kind of feedback I was looking for. And you are right that I have neglected pandas until now; I will look into it! As for the loop scopes, I can see how it makes some neat code. I suppose this will also be helped by confining scopes to functions as well? For now it's still an annoyance, but I suppose it will have long-run benefits.

aes: Right, the smaller the scope, the less you have to worry about name collisions. Glad you found the answer helpful!
If I load your file with `np.genfromtxt`, the main `numpy` csv file reader:
In [642]: data=np.genfromtxt('cr165298.txt',delimiter=',',dtype=None,names=True)
In [643]: data.shape
Out[643]: (72,)
In [644]: data.dtype
Out[644]: dtype([('A', '<f8'), ('B', '<f8'), ('A_1', '<f8'), ('B_1', '<f8'),
('AB', '<f8'), ('AB_1', '<f8'), ('AB_2', '<f8'), ('AB_3', '<f8'), ('HWP_A', '<f8'),
('HWP_B', '<f8'), ('E', '<f8'), ('Comments', 'S6'), ('Duration_s', '<f8'), ('Sub_Acc', 'S2')])
The result is a structured array: 1d, with one field per column. The columns are a mix of floats and strings.
Your load, by contrast, creates a 2d array of strings:
print(eq_dat.shape, eq_dat.dtype)
# (73, 14) <U12
Your `eq_dict` has 4 keys, `['B=135', 'B=45', 'B=0', 'B=90']`, and the subkeys are the same field names as above.
These main keys are my `data['Comments']` values:
In [649]: data['Comments']
Out[649]:
array([b' B=0', b' B=0', b' B=0', .... b' B=0', b' B=45', b' B=45', b' B=45', b' B=45',
b' B=45',.... b' B=90', b' B=90', b' B=90',
b' B=90', .... b' B=135', b' B=135', b' B=135',
b' B=135', b' B=135', b' B=135', b' B=135', b' B=135', b' B=135',
b' B=135', b' B=135'],
dtype='|S6')
If I add `autostrip=True` to the `genfromtxt` call, I get values like `b'B=135'`. The `b` denotes a bytestring in Py3; you won't see it in Py2.
So one array in your dictionary is:
print(eq_dict['B=135']['A'])
# [ 18544.4 18485.3 18310.8 17917.6 17576. 17259.6 16985.3 17070.9
17139.6 17320.8 17398.8 17798.8 18224. 18391.9 18690. 18667.2]
I can pull the same values from `data`:
# Find the rows where `Comments` equal one key:
In [657]: idx = data['Comments']==b' B=135'
In [658]: idx = np.where(idx)
In [659]: idx
Out[659]: (array([56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], dtype=int32),)
# Select that field and that set of rows:
In [660]: data['A'][idx]
Out[660]:
array([ 18544.4, 18485.3, 18310.8, 17917.6, 17576. , 17259.6,
16985.3, 17070.9, 17139.6, 17320.8, 17398.8, 17798.8,
18224. , 18391.9, 18690. , 18667.2])
It might be easier to work with `data` if I loaded only a subset of the columns; for example, you appear to use 'A', 'B' and a few others. `genfromtxt` has a `usecols` parameter to do that. In fact, if I load the floats separately from the string (comments) column, I can get a 2d array of floats.
So loading this data as structured array is easy. Grouping records by 'Comments' values takes a bit more work. I suspect that loading this with Pandas would make this grouping even easier, but I'm not a regular Pandas user.
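One way to sketch that grouping with plain numpy, using a small stand-in for the structured array `genfromtxt` would produce: boolean-mask the records for each unique 'Comments' value.

```python
import numpy as np

# toy stand-in for the structured array genfromtxt returns
data = np.array([(1.0, b'B=0'), (2.0, b'B=0'), (3.0, b'B=45')],
                dtype=[('A', '<f8'), ('Comments', 'S6')])

# one sub-array of records per unique Comments value
groups = {key: data[data['Comments'] == key]
          for key in np.unique(data['Comments'])}
```

Each value in `groups` is itself a structured array, so fields can still be pulled out by name, e.g. `groups[b'B=0']['A']`.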
On SO I usually discourage the iterative use of `np.append` (or, for that matter, any use of `np.append`). So I'd prefer to see:
if seriesname not in eq_dict.keys():
    eq_dict[seriesname] = {varname: [] for varname in varnames}
for l_val, l_varname in zip(l_line, varnames):
    try:
        value = float(l_val)
    except ValueError:  # better than the open-ended except:
        value = l_val
    eq_dict[seriesname][l_varname].append(value)
A disadvantage of this is that it requires converting the lists (or a subset of them) to arrays later, or doing all calculations with list comprehensions.
for seriesname in eq_dict:
    series = eq_dict[seriesname]
    for l_varname in series:
        series[l_varname] = np.array(series[l_varname])
Comments:

Comment: I get `TypeError: leastsq() got an unexpected keyword argument 'bounds'` from `curve_fit()`.

Reply: `curve_fit` takes a `bounds` parameter since scipy 0.17, according to the docs.