Given 2 numpy arrays a1 and a2 (composed only of 0's and 1's):

1. Find the index locations of all 0's in a1 and a2.
2. Find the matching index positions, if any, between a1 and a2.
3. Calculate a metric.
Both a1 and a2 have shape (2161,).
e.g.

a1 = [0,1,1,0,1,0,1]
a2 = [1,1,0,0,1,1,0]

The indices of all 0's in a1 are 0, 3, and 5. The indices of all 0's in a2 are 2, 3, and 6. The only common index between a1 and a2 is thus 3.
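The worked example above can be reproduced directly with NumPy (a minimal sketch of steps 1 and 2):

```python
import numpy as np

a1 = np.array([0, 1, 1, 0, 1, 0, 1])
a2 = np.array([1, 1, 0, 0, 1, 1, 0])

idx1 = np.flatnonzero(a1 == 0)       # indices of all 0's in a1 -> [0 3 5]
idx2 = np.flatnonzero(a2 == 0)       # indices of all 0's in a2 -> [2 3 6]
common = np.intersect1d(idx1, idx2)  # matching index positions -> [3]
```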
function_1 performs steps 1, 2, and 3. function1_iterations repeats function_1, randomly shuffling a1 before each metric calculation, 1000 times. This is for the purpose of finding whether the metric is statistically significant.
I run the code below on 100 million array pairs, multiprocessed across 256 cores. The best runtime for 100 million array pairs is about 40 minutes. Is there any way I can make it significantly more efficient? I need to run this on billions of array pairs.

My code below is the fastest I could come up with, with some earlier help from people on Code Review:
def function_1(self, a1, a2):
    event_index1, = np.where(a1 == 0)
    event_index2, = np.where(a2 == 0)
    n1, = event_index1.shape
    n2, = event_index2.shape
    if n1 == 0 or n2 == 0:
        return 0
    n_matches, = np.intersect1d(event_index1, event_index2, assume_unique=True).shape
    c_ij = c_ji = n_matches / 2
    metric_1 = (c_ij + c_ji) / math.sqrt(n1 * n2)
    return metric_1

def function1_iterations(self, a1, a2, original_metric1, repeat=1000):
    list_metric1 = []
    a1_copy = copy.deepcopy(a1)
    for i in range(repeat):
        np.random.shuffle(a1_copy)  # shuffle bits in array and recalculate 1000 times
        metric_1 = self.function_1(a1=a1_copy, a2=a2)
        list_metric1.append(metric_1)
    list_metric1 = np.array(list_metric1)
    significance_val = len(np.where(list_metric1 >= original_metric1)[0]) / repeat
    return significance_val
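As an aside on the shuffling loop: NumPy's Generator.permuted shuffles each row of a matrix independently, so all `repeat` permutations of a1 can be drawn in a single call instead of a Python loop. A minimal sketch (the function name and seed handling are illustrative, not from the original code):

```python
import numpy as np

def function1_iterations_vec(a1, a2, original_metric1, repeat=1000, seed=0):
    # Vectorised permutation test: all `repeat` shuffles of a1 at once.
    rng = np.random.default_rng(seed)
    n1 = np.count_nonzero(a1 == 0)
    n2 = np.count_nonzero(a2 == 0)
    if n1 == 0 or n2 == 0:
        return 0.0
    # Build a (repeat, size) matrix; rng.permuted with axis=1 shuffles
    # the values within each row independently.
    shuffled = rng.permuted(np.tile(a1, (repeat, 1)), axis=1)
    # Matches per row: positions where both the shuffled a1 and a2 are 0.
    n_matches = np.count_nonzero((shuffled == 0) & (a2 == 0), axis=1)
    metrics = n_matches / np.sqrt(n1 * n2)  # n1 is unchanged by shuffling
    return np.count_nonzero(metrics >= original_metric1) / repeat
```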
2 Answers
Ignoring your function1_iterations: function_1 should call neither where nor intersect1d. The numerator of your metric expression just evaluates to n_matches. A faster vectorised approach is to perform the equivalent of sum(!a1 & !a2), in either the integer or the boolean domain:
import math
from timeit import timeit

import numpy as np


def function_1_old(a1, a2):
    event_index1, = np.where(a1 == 0)
    event_index2, = np.where(a2 == 0)
    n1, = event_index1.shape
    n2, = event_index2.shape
    if n1 == 0 or n2 == 0:
        return 0
    n_matches, = np.intersect1d(event_index1, event_index2, assume_unique=True).shape
    c_ij = c_ji = n_matches / 2
    metric_1 = (c_ij + c_ji) / math.sqrt(n1 * n2)
    return metric_1


def function_1_arith(a1: np.ndarray, a2: np.ndarray) -> float:
    zeros1 = 1 - a1
    zeros2 = 1 - a2
    n1 = np.sum(zeros1)
    n2 = np.sum(zeros2)
    if n1 == 0 or n2 == 0:
        return 0
    n_matches = np.sum(zeros1 * zeros2)
    return n_matches / np.sqrt(n1 * n2)


def function_1_bool(a1: np.ndarray, a2: np.ndarray) -> float:
    zeros1 = np.logical_not(a1)
    zeros2 = np.logical_not(a2)
    n1 = np.sum(zeros1)
    n2 = np.sum(zeros2)
    if n1 == 0 or n2 == 0:
        return 0
    n_matches = np.sum(np.logical_and(zeros1, zeros2))
    return n_matches / np.sqrt(n1 * n2)


def function_1_demorgan(a1: np.ndarray, a2: np.ndarray) -> float:
    n1 = a1.size - np.sum(a1)
    n2 = a2.size - np.sum(a2)
    if n1 == 0 or n2 == 0:
        return 0
    n_matches = np.sum(np.logical_not(np.logical_or(a1, a2)))
    return n_matches / np.sqrt(n1 * n2)


def function_1_count(a1: np.ndarray, a2: np.ndarray) -> float:
    n1 = a1.size - np.count_nonzero(a1)
    n2 = a2.size - np.count_nonzero(a2)
    if n1 == 0 or n2 == 0:
        return 0
    n_matches = np.count_nonzero(np.logical_not(np.logical_or(a1, a2)))
    return n_matches / np.sqrt(n1 * n2)


def function_1_binary(a1: np.ndarray, a2: np.ndarray) -> float:
    n1 = a1.size - np.count_nonzero(a1)
    n2 = a2.size - np.count_nonzero(a2)
    if n1 == 0 or n2 == 0:
        return 0
    n_matches = np.count_nonzero(~(a1 | a2))
    return n_matches / np.sqrt(n1 * n2)


def test() -> None:
    methods = (function_1_old, function_1_arith, function_1_bool,
               function_1_demorgan, function_1_count, function_1_binary)

    a1 = np.array((0, 1, 1, 0, 1, 0, 1), dtype=bool)
    a2 = np.array((1, 1, 0, 0, 1, 1, 0), dtype=bool)
    # Testing against expected results, with OP's original sample arrays
    for method in methods:
        assert np.isclose(1/3, method(a1, a2), rtol=0, atol=1e-12)

    rand = np.random.default_rng(seed=0)  # for reproducible results

    def sample():
        return rand.integers(low=0, high=2, size=(10_000,), dtype=bool)

    a1, a2 = sample(), sample()

    # Test for regression with bigger sample arrays
    reference = None
    for method in methods:
        result = method(a1, a2)
        if method is function_1_old:
            reference = result
        else:
            assert np.isclose(reference, result, rtol=0, atol=1e-12)

    # Performance measurement with those arrays
    n = 1_000
    for method in methods:
        dur = timeit(lambda: method(a1, a2), number=n)
        print(f'{method.__name__}: {dur/n*1e6:.1f} us')


'''
function_1_old: 386.6 us
function_1_arith: 54.8 us
function_1_bool: 44.2 us
function_1_demorgan: 43.1 us
function_1_count: 6.9 us
function_1_binary: 6.7 us
'''

if __name__ == '__main__':
    test()
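One further direction, given the billions of pairs mentioned in the question (a hedged sketch of a standard NumPy technique, not part of the code measured above): batch many pairs into 2-D boolean arrays and compute every metric in a single pass, which amortises the per-call NumPy overhead that dominates at this array size:

```python
import numpy as np

def function_1_batched(A1, A2):
    # A1, A2: bool arrays of shape (n_pairs, n_bits); row i of A1 is
    # paired with row i of A2. Returns an array of n_pairs metrics.
    z1 = ~A1
    z2 = ~A2
    n1 = np.count_nonzero(z1, axis=1)
    n2 = np.count_nonzero(z2, axis=1)
    n_matches = np.count_nonzero(z1 & z2, axis=1)
    denom = np.sqrt(n1 * n2)
    out = np.zeros(len(A1))
    # Rows with no zeros in either array get metric 0, as in the originals.
    np.divide(n_matches, denom, out=out, where=denom > 0)
    return out
```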
- skynaive (commented Jan 9, 2022): \$\begingroup\$ Thanks for this comparison. I went ahead with the binary bool array version. I was wondering how to approach such efficiency problems in the future, e.g. knowing that the binary version is much faster than the one I was naively using. \$\endgroup\$
- Reinderien (commented Jan 9, 2022): \$\begingroup\$ A mix of knowledge (full-matrix vectorisation is generally faster than indexing) and guessing (are bitwise operators faster than boolean operators?) backed up by measurement. Always, always measure. \$\endgroup\$
I'm not 100% confident that I understand your intent, but looking at your code it seems like all we care about is the number of zeroes in each list and the number of times those zeroes line up by index. The actual indexes themselves don't seem to matter.
I found that just ignoring the arrays, and treating the whole thing as a simple list problem, was more than twice as fast.
def function_1_new(a1, a2):
    n_matches = 0
    n_zeroes_a1 = 0
    n_zeroes_a2 = 0
    for sub_a1, sub_a2 in zip(a1, a2):
        if sub_a1 == sub_a2 == 0:
            n_matches += 1
            n_zeroes_a1 += 1
            n_zeroes_a2 += 1
        else:
            if sub_a1 == 0:
                n_zeroes_a1 += 1
            if sub_a2 == 0:
                n_zeroes_a2 += 1
    if n_zeroes_a1 == 0 or n_zeroes_a2 == 0:  # avoid division by zero
        return 0
    c_ij = c_ji = n_matches / 2
    metric_1 = (c_ij + c_ji) / math.sqrt(n_zeroes_a1 * n_zeroes_a2)
    return metric_1
Running your original function 1,000,000 times took me 26.5 seconds on average. The more naïve approach above averaged around 11 seconds. And here's how I tested to make sure the new code produces the same results as the old code.
def function1_iterations_sameness(a1, a2, original_metric1, repeat=1000000):
    a1_copy = copy.deepcopy(a1)
    for i in range(repeat):
        np.random.shuffle(a1_copy)  # shuffle bits in array and recalculate
        metric_1_old = function_1(a1=a1_copy, a2=a2)
        metric_1_new = function_1_new(a1_copy, a2)
        assert metric_1_old == metric_1_new
- Reinderien (commented Jan 7, 2022): \$\begingroup\$ This implementation is logically correct but in terms of performance it's a disaster. I encourage you to revisit your performance check. Where the fastest method I evaluated completes in 7.8 microseconds, yours took almost four thousand times (!) as long. \$\endgroup\$
- \$\begingroup\$ …bool. Otherwise, bitwise operations will produce different results. \$\endgroup\$