You can find my original post here on SO. I was very surprised that I couldn't find a ready-made function that computes the mean squared error between a signal (array A, size a) and a pattern (array B, size b < a) in a sliding-window fashion. So I made one... and it obviously needs a lot of improvement; in particular, I don't really know how to vectorize the implementation.
The goal is to take the pattern, slide it across the signal one step at a time, and compute at every step the mean squared error between the pattern and the overlapping portion of the signal.
from sklearn.metrics import mean_squared_error
import numpy as np

def correlate_edge_pattern(signal, pattern):
    ms_errors = list()
    k = 0
    while True:
        data = signal[k:k+pattern.size]
        if data.size != pattern.size:
            break
        err = mean_squared_error(pattern, data)
        ms_errors.append(err)
        k += 1
    m = np.argmin(ms_errors)
    return m
It works as expected, as you can observe below. Here is a small code sample to test it:
signal = -np.ones(1000)*20
signal[:100] = 0
signal[900:] = 0
noise = np.random.normal(0,1,size=1000)
noisy_signal = signal + noise
pattern = np.zeros(50)
pattern[:25] = -20
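For reference, calling the function on this test data should locate the falling edge of the signal: the pattern's step from -20 to 0 sits 25 samples in, and the signal drops back to 0 at index 900, so the best match is expected around index 875. A minimal check (not part of the original post):

m = correlate_edge_pattern(noisy_signal, pattern)
print(m)  # expected to be approximately 875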
1 Answer
Some Python basics: loop like a native on k. Iterate over a range instead of manually incrementing k, and then you don't need the break. Replace list() with the literal []. Omitting the argmin, this looks like:
def edge_pattern_op(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    ms_errors = []
    for k in range(signal.size - pattern.size + 1):
        data = signal[k: k + pattern.size]
        err = mean_squared_error(pattern, data)
        ms_errors.append(err)
    return np.array(ms_errors)
sklearn is not needed for this and probably should not be used. A naive replacement can look like:
def edge_pattern_mean(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    # View of every pattern-sized window of the signal (no copies are made).
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    return (errors**2).mean(axis=1)
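As a quick sanity check (assuming the noisy_signal and pattern from the question are in scope), the vectorized version should agree with the loop up to floating-point noise:

np.testing.assert_allclose(
    edge_pattern_op(noisy_signal, pattern),
    edge_pattern_mean(noisy_signal, pattern),
)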
But notice that you can fold the sum-of-squares operation into an Einstein summation:
def edge_pattern_einstein(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    # Row-wise dot product of errors with itself, i.e. the sum of squares per window.
    sqerr = np.einsum('ij,ij->i', errors, errors)
    return sqerr/errors.shape[1]
Since you only care about argmin, the division can be omitted for more speed. This converts the problem from minimum MSE to an equivalent minimum quadratic loss.
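For illustration, an argmin-only variant might look like the following sketch (edge_pattern_argmin is a made-up name, not from the answer):

def edge_pattern_argmin(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> int:
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    # Sum of squared errors per window; the division by the window length is
    # dropped because a constant positive factor does not change the argmin.
    sq_loss = np.einsum('ij,ij->i', errors, errors)
    return int(np.argmin(sq_loss))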
But really, all of this should go away and be replaced with scipy.signal: scipy.signal.correlate for the sliding computation, together with scipy.signal.correlation_lags to map the result back to signal indices.
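One hedged sketch of how that might look (this is an interpretation, not the answer author's code; edge_pattern_scipy and the sps alias are made up here). It uses the expansion of the squared error, the sliding sum of window squares minus twice the cross-correlation plus a constant, so that the result agrees with the MSE argmin rather than with a raw correlation peak:

from scipy import signal as sps  # aliased so it does not clash with the 'signal' array

def edge_pattern_scipy(sig: np.ndarray, pattern: np.ndarray) -> int:
    # Sliding sum of squares of the signal over a pattern-sized window.
    window_energy = sps.correlate(sig ** 2, np.ones(pattern.size), mode='valid')
    # Sliding cross-correlation between the signal and the pattern.
    cross = sps.correlate(sig, pattern, mode='valid')
    # Quadratic loss up to the constant sum(pattern ** 2); same argmin as the MSE.
    loss = window_energy - 2.0 * cross
    # correlation_lags maps positions in the 'valid' output back to signal indices.
    lags = sps.correlation_lags(sig.size, pattern.size, mode='valid')
    return int(lags[np.argmin(loss)])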