Computing the autocorrelation between 2D time series arrays in python

Question 1

I have a list of 2D arrays which comes from a time evolution PDE (in \$(x, y, t)\$) that I solved numerically. There are \$k\$ arrays, which all have the same dimensions, and the arrays correspond to a solution field of the PDE at times \$t = 0, 0.1, \dots, 0.1k\$. I wrote a program that takes in this list and computes the autocorrelation between the arrays for varying lag times.

From the definition of the autocorrelation function for wide-sense stationary processes, we have

$$R(\tau) = \frac{\mathbb{E}[(X_{t} - \mu)(X_{t + \tau} - \mu)]}{\sigma^{2}} = \frac{1}{\sigma^{2}} \sum_{t = 0}^{k-\tau} (X_{t} - \mu)(X_{t + \tau} - \mu)$$

Here, \$X_{t}\$ represents our 2D array at time \$t\$ and \$\mu\,ドル \$\sigma\$ the average and standard deviation, respectively, of the entire list of 2D arrays. My code for this is as follows

import numpy as np
import matplotlib.pyplot as plt
### Data
# List of data arrays called 'Data'.
### Expectation
New_Data = []
O = np.ones((len(Data[0][0]),len(Data[0][1]))) # Resolution of data arrays.
A = sum(Data)/len(Data) # Average 2D array
M = np.mean(A) # Average of average 2D array
S = np.std(A) # Standard deviation of average 2D array
for i in range(len(Data)):
 New_Data.append((Data[i]-M*O)/S) # (X_t-mu)/sigma
### Autocorrelation for varying lags.
Count = 1
R = []
while Count < len(New_Data)//2: # Arbitrary choice for max lag time.
 Matrix_Multiply = []
 for j in range(len(New_Data)-Count): 
 Matrix_Multiply.append(np.multiply(New_Data[j],New_Data[j+Count]))
 R.append(sum(Matrix_Multiply))
 Count = Count+1
Solution = []
for k in range(len(R)):
 Solution.append(np.mean(R[k]))
t = [0.1*k for k in range(1,len(Solution)+1)]
### Plotting 
plt.xlabel('Lag time')
plt.ylabel('Matrix sum')
plt.title('Field autocorrelation over time')
plt.semilogy(t, Solution)
plt.savefig('I_hope_this_works.png')

I'm sure there are some redundant steps in there, so I was wondering if anyone can see how I could make the code cleaner? Also, if anyone knows how to apply pythons numba package to make the code faster, it would be greatly appreciated, as I haven't been able to figure it out from the documentation in the hyperlink.

EDIT

In response to the comment below, here is some arbitrary data to use

Data = []
n = 20
for k in range(n):
 Data.append(np.random.randint(n, size=(n,n)))

For my data, I have ~800 arrays of size 256x256. I don't think I'm able to upload that for anyone to use.

EDIT 2

I just realised that we don't need to keep the data in the R list, so we can remove it as well as the lines

Solution = []
 for k in range(len(R)):
 Solution.append(np.mean(R[k]))

and edit the while loop to

### Autocorrelation for varying lags. 
Count = 1
Solution = []
while Count < len(New_Data)//2: # Arbitrary choice for max lag time. 
 Matrix_Multiply = []
 for j in range(len(New_Data)-Count):
 Matrix_Multiply.append(np.multiply(New_Data[j],New_Data[j+Count]))
 R = sum(Matrix_Multiply)
 Solution.append(np.mean(R))
 Count = Count+1

though it doesn't provide much of a markup in terms of speed.

Question 2

It would help if you could provide some example data. Right now it is very hard to test any alternative approach.

Question 3

@Graipher I'm not sure how I could upload any data for people to use? I've added in a list of arrays that people can use to experiment with.

Question 4

some arbitrary data should suffice. It is just good to be able to verify if the alternative approach gives the same result

Question 5

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . Feel free to post a follow-up question instead once the code has changed significantly.

Question 6

So here is a slightly simplified version that uses more numpy functionalities, where your solution manually iterates over the outer lists:

def autocorrelate_graipher(Data):
 Data = np.array(Data)
 A = Data.mean(axis=0) # Average 2D array
 New_Data = (Data - np.mean(A)) / np.std(A)
 for count in range(1, len(New_Data) // 2):
 i = np.arange(len(New_Data) - count)
 yield np.multiply(New_Data[i], New_Data[i+count]).sum(axis=0).mean()

It returns the same result as your function, albeit as a generator, so you need to call list on the output.

Unfortunately it still has the manual loop over the lags, which I at least made a for loop instead of a manual while loop.

Note that Python has an official style-guide, PEP8, which people are encouraged to follow. It recommends using lower_case for variables and functions, so Data should be data, New_Data should be new_data, A should be a, or even better average.

Question 7

Thanks heaps for the response. However, I'm a bit confused by your code. I'm not sure how to return what is being 'yielded' as a list as I haven't used/don't really understand generators, even after doing a fair amount of reading on them. I'm not sure what I am supposed to be calling list on, as there is no output, only something being 'yielded'. How could I return your code as a list of values that I can then plot, without stopping it from being a generator?

Question 8

@Mattos Well,you need to consume the generator to get all values out. The easiest way is to simply do solution = list(autocorrelate_graipher(Data)).

Question 9

@Mattos This is then basically the same as defining an empty list before my for loop, appending each element that is now yielded and returning the list at the end of the function.

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2018-06-29 16:14:04Z

So here is a slightly simplified version that uses more numpy functionalities, where your solution manually iterates over the outer lists:

def autocorrelate_graipher(Data):
 Data = np.array(Data)
 A = Data.mean(axis=0) # Average 2D array
 New_Data = (Data - np.mean(A)) / np.std(A)
 for count in range(1, len(New_Data) // 2):
 i = np.arange(len(New_Data) - count)
 yield np.multiply(New_Data[i], New_Data[i+count]).sum(axis=0).mean()

It returns the same result as your function, albeit as a generator, so you need to call list on the output.

Unfortunately it still has the manual loop over the lags, which I at least made a for loop instead of a manual while loop.

Note that Python has an official style-guide, PEP8, which people are encouraged to follow. It recommends using lower_case for variables and functions, so Data should be data, New_Data should be new_data, A should be a, or even better average.

Thanks heaps for the response. However, I'm a bit confused by your code. I'm not sure how to return what is being 'yielded' as a list as I haven't used/don't really understand generators, even after doing a fair amount of reading on them. I'm not sure what I am supposed to be calling list on, as there is no output, only something being 'yielded'. How could I return your code as a list of values that I can then plot, without stopping it from being a generator?
@Mattos Well,you need to consume the generator to get all values out. The easiest way is to simply do solution = list(autocorrelate_graipher(Data)).
@Mattos This is then basically the same as defining an empty list before my for loop, appending each element that is now yielded and returning the list at the end of the function.

Stack Exchange Network

Computing the autocorrelation between 2D time series arrays in python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Computing the autocorrelation between 2D time series arrays in python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions