I have collected large dataset of trajectories in 3D space. Below is the information about dataset-
- Each trajectory (shape: n x 4) is saved into CSV file with the following header:
time, p_x, p_y, p_z
- The filenames are defined in following way: subject_
i
_trail_i
.csv, wherei
starts from 1 to 10.
Below are first 10 lines from subject_1_trail_1.csv
file-
time p_x p_y p_z
0 0.4333 0.1107 0.1259
0.0103 0.4336 0.1106 0.126
0.02 0.4334 0.1108 0.1259
0.03 0.4333 0.1106 0.1259
0.04 0.4334 0.1107 0.1259
0.0501 0.4328 0.1103 0.126
0.06 0.4331 0.1107 0.1255
0.0703 0.4331 0.1103 0.126
0.08 0.4324 0.1102 0.126
For each subject, I want to plot the trajectory showing median and variance as shown below-
fill_between(x, perc_25, perc_75)
I am using NumPy but the code is relatively long. Below is the code snippet-
from pylab import *
import sys
# make a 3d numpy array (chop off the extra rows)
def load(location, subject, suffix):
all_data = []
min_rows = sys.maxint
for i in range(1, 11):
file_name = location + subject + '_trail_' + str(i) + '.csv'
data = np.loadtxt(file_name, delimiter=',', skiprows=1)
if data.shape[0] < min_rows:
min_rows = data.shape[0]
all_data.append(data)
all_crop_data = []
for data in all_data:
all_crop_data.append(data[:min_rows,:])
return np.array(all_crop_data)
def proc(data):
t = data[:, :, 0] # first column is time
x = data[:, :, 1] # second column is x
y = data[:, :, 2] # third column is y
z = data[:, :, 3] # fourth column is z
t_median = np.zeros(data.shape[1])
t_perc_25 = np.zeros(data.shape[1])
t_perc_75 = np.zeros(data.shape[1])
x_median = np.zeros(data.shape[1])
x_perc_25 = np.zeros(data.shape[1])
x_perc_75 = np.zeros(data.shape[1])
y_median = np.zeros(data.shape[1])
y_perc_25 = np.zeros(data.shape[1])
y_perc_75 = np.zeros(data.shape[1])
z_median = np.zeros(data.shape[1])
z_perc_25 = np.zeros(data.shape[1])
z_perc_75 = np.zeros(data.shape[1])
for i in range(data.shape[1]): # for each timestamp
t_median[i] = np.median(t[:, i])
t_perc_25[i] = np.percentile(t[:, i], 25)
t_perc_75[i] = np.percentile(t[:, i], 75)
x_median[i] = np.median(x[:, i])
x_perc_25[i] = np.percentile(x[:, i], 25)
x_perc_75[i] = np.percentile(x[:, i], 75)
y_median[i] = np.median(y[:, i])
y_perc_25[i] = np.percentile(y[:, i], 25)
y_perc_75[i] = np.percentile(y[:, i], 75)
z_median[i] = np.median(z[:, i])
z_perc_25[i] = np.percentile(z[:, i], 25)
z_perc_75[i] = np.percentile(z[:, i], 75)
all_median = np.vstack((t_median, x_median, y_median, z_median)).T
all_perc_25 = np.vstack((t_perc_25, x_perc_25, y_perc_25, z_perc_25)).T
all_perc_75 = np.vstack((t_perc_75, x_perc_75, y_perc_75, z_perc_75)).T
return all_median, all_perc_25, all_perc_75
s1 = load('/Desktop/', 'subject_1') # subject 1
s2 = load('/Desktop/', 'subject_2') # subject 2
x = np.arange(0, s1.shape[1])
s1_med, s1_perc_25, s1_perc_75 = proc(s1)
s2_med, s2_perc_25, s2_perc_75 = proc(s2)
# lets plot only x (second column)
index = 1
s1_med = s1_med[:, index]
s1_perc_25 = s1_perc_25[:, index]
s1_perc_75 = s1_perc_75[:, index]
s2_med = s2_med[:, index]
s2_perc_25 = s2_perc_25[:, index]
s2_perc_75 = s2_perc_75[:, index]
fill_between(x, s1_perc_25, s1_perc_75, alpha=0.25, linewidth=0, color='#B22400')
fill_between(x, s2_perc_25, s2_perc_75, alpha=0.25, linewidth=0, color='#006BB2')
plot(x, s1_med, linewidth=2, color='#B22400')
plot(x, s2_med, linewidth=2, color='#006BB2')
I am looking for a better, i.e., pythonic way to achieve the same. I am not sure if Pandas can be useful here. Looking for your suggestions, please.
-
\$\begingroup\$ You should add your imports to make this self-contained code. I would also help if you could post how to generate some example data. \$\endgroup\$Graipher– Graipher2018年04月23日 16:19:56 +00:00Commented Apr 23, 2018 at 16:19
1 Answer 1
Your function proc
can be greatly reduced by using the fact that numpy
functions are vectorized and most of them take the axis to act upon as an argument. This is certainly true for numpy.median
and numpy.percentile
.
import numpy as np
def proc(data):
return (np.median(data, axis=0),
np.percentile(data, 25, axis=0),
np.percentile(data, 75, axis=0))
You can also use a for
loop for plotting, to avoid duplicating everything:
if __name__ == "__main__":
colors = '#B22400', '#006BB2'
file_names = 'subject_1', 'subject_2'
for file_name, color in zip(file_names, colors):
subject = load('/Desktop/', file_name)
med, perc_25, perc_75 = proc(subject)
# lets plot only x (second column)
med = med[:, 1]
perc_25 = perc_25[:, 1]
perc_75 = perc_75[:, 1]
fill_between(x, perc_25, perc_75, alpha=0.25, linewidth=0, color=color)
plot(x, med, linewidth=2, color=color)
Note that x
is currently undefined (as in your code).
I also added a if __name__ == "__main__":
guard to allow importing from this script without executing the plotting.
In addition I changed your indentation to conform to Python's official style-guide, PEP8:
This
med = med[:, 1]
perc_25 = perc_25[:, 1]
perc_75 = perc_75[:, 1]
should just be
med = med[:, 1]
perc_25 = perc_25[:, 1]
perc_75 = perc_75[:, 1]
as noted under Pet Peeves (of Guido van Rossum, the creator and BDFL of Python).
-
\$\begingroup\$ I like PEP8, but sometimes I align a series of assignments like Ravi did. I feel excused by the seventh line of The Zen (PEP20): Readability counts. \$\endgroup\$Jan Kuiken– Jan Kuiken2018年04月23日 17:16:58 +00:00Commented Apr 23, 2018 at 17:16
-
2\$\begingroup\$ @JanKuiken I agree that sometimes this makes it more readable. I mainly commented on it 1. Out of habit but 2. Because when collaborating on code, one should have agreed upon code styles (ideally enforced/applied by automatic tools). And for Python the agreed upon style is per default PEP8. If you wish reviewers not to comment on one particular violation of some common style guide, you can also mention it as a note in your question. Reviewers are still free to comment on it (if they think it is a particularly bad idea), but they usually won't in that case. \$\endgroup\$Graipher– Graipher2018年04月23日 17:24:19 +00:00Commented Apr 23, 2018 at 17:24
-
\$\begingroup\$ Thanks, Graipher. The
axis
argument is really helpful! \$\endgroup\$ravi– ravi2018年04月24日 02:25:41 +00:00Commented Apr 24, 2018 at 2:25