You can find my original post here on SO. I was very surprised that I couldn't find a ready-made function that computes the mean squared error between a signal (array A, size a) and a pattern (array B, size b < a) in a sliding-window fashion. So I made one... and it obviously needs a lot of improvement; in particular, I don't really know how to vectorize the implementation.
The goal is to take the pattern, slide it across the signal one step at a time, and compute at every step the mean squared error between the pattern and the overlapping portion of the signal.
from sklearn.metrics import mean_squared_error
import numpy as np

def correlate_edge_pattern(signal, pattern):
    ms_errors = list()
    k = 0
    while True:
        data = signal[k:k+pattern.size]
        if data.size != pattern.size:
            break
        err = mean_squared_error(pattern, data)
        ms_errors.append(err)
        k += 1
    m = np.argmin(ms_errors)
    return m
It works as expected, as you can observe below. Here is a small code sample to test it:
signal = -np.ones(1000)*20
signal[:100] = 0
signal[900:] = 0
noise = np.random.normal(0,1,size=1000)
noisy_signal = signal + noise
pattern = np.zeros(50)
pattern[:25] = -20
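For reference, calling the function on this test data should locate the falling edge of the signal: the pattern's step from -20 to 0 sits 25 samples in, and the signal drops back to 0 at index 900, so the best match is expected around index 875. A minimal check (not part of the original post):

m = correlate_edge_pattern(noisy_signal, pattern)
print(m)  # expected to be approximately 875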
1 Answer
Some Python basics: loop like a native on k. Iterate over a range instead of manually incrementing k, and then you don't need the break. Replace list() with the literal []. Omitting the argmin, this looks like:
def edge_pattern_op(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    ms_errors = []
    for k in range(signal.size - pattern.size + 1):
        data = signal[k: k + pattern.size]
        err = mean_squared_error(pattern, data)
        ms_errors.append(err)
    return np.array(ms_errors)
sklearn is not needed for this and probably should not be used. A naive replacement can look like:
def edge_pattern_mean(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    # View of every pattern-sized window of the signal (no copies are made).
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    return (errors**2).mean(axis=1)
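As a quick sanity check (assuming the noisy_signal and pattern from the question are in scope), the vectorized version should agree with the loop up to floating-point noise:

np.testing.assert_allclose(
    edge_pattern_op(noisy_signal, pattern),
    edge_pattern_mean(noisy_signal, pattern),
)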
But notice that you can fold the sum-of-squares operation into an Einstein summation:
def edge_pattern_einstein(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> np.ndarray:
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    # Row-wise dot product of errors with itself, i.e. the sum of squares per window.
    sqerr = np.einsum('ij,ij->i', errors, errors)
    return sqerr/errors.shape[1]
Since you only care about argmin, the division can be omitted for more speed. This converts the problem from minimum MSE to an equivalent minimum quadratic loss.
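For illustration, an argmin-only variant might look like the following sketch (edge_pattern_argmin is a made-up name, not from the answer):

def edge_pattern_argmin(
    signal: np.ndarray,
    pattern: np.ndarray,
) -> int:
    rolled_signal = np.lib.stride_tricks.sliding_window_view(
        signal, window_shape=pattern.shape,
    )
    errors = rolled_signal - pattern
    # Sum of squared errors per window; the division by the window length is
    # dropped because a constant positive factor does not change the argmin.
    sq_loss = np.einsum('ij,ij->i', errors, errors)
    return int(np.argmin(sq_loss))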
But really, all of this should go away and be replaced with scipy.signal: scipy.signal.correlate for the sliding computation, together with scipy.signal.correlation_lags to map the result back to signal indices.
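One hedged sketch of how that might look (this is an interpretation, not the answer author's code; edge_pattern_scipy and the sps alias are made up here). It uses the expansion of the squared error, the sliding sum of window squares minus twice the cross-correlation plus a constant, so that the result agrees with the MSE argmin rather than with a raw correlation peak:

from scipy import signal as sps  # aliased so it does not clash with the 'signal' array

def edge_pattern_scipy(sig: np.ndarray, pattern: np.ndarray) -> int:
    # Sliding sum of squares of the signal over a pattern-sized window.
    window_energy = sps.correlate(sig ** 2, np.ones(pattern.size), mode='valid')
    # Sliding cross-correlation between the signal and the pattern.
    cross = sps.correlate(sig, pattern, mode='valid')
    # Quadratic loss up to the constant sum(pattern ** 2); same argmin as the MSE.
    loss = window_energy - 2.0 * cross
    # correlation_lags maps positions in the 'valid' output back to signal indices.
    lags = sps.correlation_lags(sig.size, pattern.size, mode='valid')
    return int(lags[np.argmin(loss)])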