I have a list of 2D arrays which comes from a time evolution PDE (in \$(x, y, t)\$) that I solved numerically. There are \$k\$ arrays, which all have the same dimensions, and the arrays correspond to a solution field of the PDE at times \$t = 0, 0.1, \dots, 0.1k\$. I wrote a program that takes in this list and computes the autocorrelation between the arrays for varying lag times.
From the definition of the autocorrelation function for wide-sense stationary processes, we have
$$R(\tau) = \frac{\mathbb{E}[(X_{t} - \mu)(X_{t + \tau} - \mu)]}{\sigma^{2}} = \frac{1}{\sigma^{2}} \sum_{t = 0}^{k-\tau} (X_{t} - \mu)(X_{t + \tau} - \mu)$$
Here, \$X_{t}\$ represents our 2D array at time \$t\$ and \$\mu\,ドル \$\sigma\$ the average and standard deviation, respectively, of the entire list of 2D arrays. My code for this is as follows
import numpy as np
import matplotlib.pyplot as plt
### Data
# List of data arrays called 'Data'.
### Expectation
New_Data = []
O = np.ones((len(Data[0][0]),len(Data[0][1]))) # Resolution of data arrays.
A = sum(Data)/len(Data) # Average 2D array
M = np.mean(A) # Average of average 2D array
S = np.std(A) # Standard deviation of average 2D array
for i in range(len(Data)):
New_Data.append((Data[i]-M*O)/S) # (X_t-mu)/sigma
### Autocorrelation for varying lags.
Count = 1
R = []
while Count < len(New_Data)//2: # Arbitrary choice for max lag time.
Matrix_Multiply = []
for j in range(len(New_Data)-Count):
Matrix_Multiply.append(np.multiply(New_Data[j],New_Data[j+Count]))
R.append(sum(Matrix_Multiply))
Count = Count+1
Solution = []
for k in range(len(R)):
Solution.append(np.mean(R[k]))
t = [0.1*k for k in range(1,len(Solution)+1)]
### Plotting
plt.xlabel('Lag time')
plt.ylabel('Matrix sum')
plt.title('Field autocorrelation over time')
plt.semilogy(t, Solution)
plt.savefig('I_hope_this_works.png')
I'm sure there are some redundant steps in there, so I was wondering if anyone can see how I could make the code cleaner? Also, if anyone knows how to apply pythons numba package to make the code faster, it would be greatly appreciated, as I haven't been able to figure it out from the documentation in the hyperlink.
EDIT
In response to the comment below, here is some arbitrary data to use
Data = []
n = 20
for k in range(n):
Data.append(np.random.randint(n, size=(n,n)))
For my data, I have ~800 arrays of size 256x256. I don't think I'm able to upload that for anyone to use.
EDIT 2
I just realised that we don't need to keep the data in the R list, so we can remove it as well as the lines
Solution = []
for k in range(len(R)):
Solution.append(np.mean(R[k]))
and edit the while loop to
### Autocorrelation for varying lags.
Count = 1
Solution = []
while Count < len(New_Data)//2: # Arbitrary choice for max lag time.
Matrix_Multiply = []
for j in range(len(New_Data)-Count):
Matrix_Multiply.append(np.multiply(New_Data[j],New_Data[j+Count]))
R = sum(Matrix_Multiply)
Solution.append(np.mean(R))
Count = Count+1
though it doesn't provide much of a markup in terms of speed.
-
\$\begingroup\$ It would help if you could provide some example data. Right now it is very hard to test any alternative approach. \$\endgroup\$Graipher– Graipher2018年06月29日 12:44:30 +00:00Commented Jun 29, 2018 at 12:44
-
1\$\begingroup\$ @Graipher I'm not sure how I could upload any data for people to use? I've added in a list of arrays that people can use to experiment with. \$\endgroup\$Matthew Cassell– Matthew Cassell2018年06月29日 14:52:50 +00:00Commented Jun 29, 2018 at 14:52
-
1\$\begingroup\$ some arbitrary data should suffice. It is just good to be able to verify if the alternative approach gives the same result \$\endgroup\$Graipher– Graipher2018年06月29日 15:10:32 +00:00Commented Jun 29, 2018 at 15:10
-
\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . Feel free to post a follow-up question instead once the code has changed significantly. \$\endgroup\$Mast– Mast ♦2018年07月10日 04:26:21 +00:00Commented Jul 10, 2018 at 4:26
1 Answer 1
So here is a slightly simplified version that uses more numpy
functionalities, where your solution manually iterates over the outer lists:
def autocorrelate_graipher(Data):
Data = np.array(Data)
A = Data.mean(axis=0) # Average 2D array
New_Data = (Data - np.mean(A)) / np.std(A)
for count in range(1, len(New_Data) // 2):
i = np.arange(len(New_Data) - count)
yield np.multiply(New_Data[i], New_Data[i+count]).sum(axis=0).mean()
It returns the same result as your function, albeit as a generator, so you need to call list
on the output.
Unfortunately it still has the manual loop over the lags, which I at least made a for
loop instead of a manual while
loop.
Note that Python has an official style-guide, PEP8, which people are encouraged to follow. It recommends using lower_case
for variables and functions, so Data
should be data
, New_Data
should be new_data
, A
should be a
, or even better average
.
-
\$\begingroup\$ Thanks heaps for the response. However, I'm a bit confused by your code. I'm not sure how to return what is being 'yielded' as a list as I haven't used/don't really understand generators, even after doing a fair amount of reading on them. I'm not sure what I am supposed to be calling list on, as there is no output, only something being 'yielded'. How could I return your code as a list of values that I can then plot, without stopping it from being a generator? \$\endgroup\$Matthew Cassell– Matthew Cassell2018年06月30日 04:57:14 +00:00Commented Jun 30, 2018 at 4:57
-
\$\begingroup\$ @Mattos Well,you need to consume the generator to get all values out. The easiest way is to simply do
solution = list(autocorrelate_graipher(Data))
. \$\endgroup\$Graipher– Graipher2018年06月30日 09:02:29 +00:00Commented Jun 30, 2018 at 9:02 -
\$\begingroup\$ @Mattos This is then basically the same as defining an empty list before my for loop, appending each element that is now yielded and returning the list at the end of the function. \$\endgroup\$Graipher– Graipher2018年06月30日 09:04:05 +00:00Commented Jun 30, 2018 at 9:04