I'm trying to create an efficient function for re-sampling time-series data.
Assumption: Both sets of time-series data have the same start and end time. (I do this in a separate step.)
Resample function (inefficient)
import numpy as np

def resample(desired_time_sequence, data_sequence):
    downsampling_indices = np.linspace(0, len(data_sequence) - 1, len(desired_time_sequence)).round().astype(int)
    downsampled_array = [data_sequence[ind] for ind in downsampling_indices]
    return downsampled_array
Speed testing
import timeit

def test_speed():
    resample([1, 2, 3], [.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6])

print(timeit.timeit(test_speed, number=100000))
# 1.5003695999998854
Interested to hear any suggestions.
1 Answer
The function takes around \41ドル\,\mu s\$ on average per run on my machine. About three quarters of that (around \32ドル\,\mu s\$) is spent in `downsampling_indices = np.linspace(...)`. Add another \1ドル.5\,\mu s\$ for `.round().astype(int)`, about \1ドル\,\mu s\$ for the actual sampling, plus some calling overhead, and you're there.
So if you need to use the function several times, it would be best to precompute or cache/memoize the sampling indices. If I understood your implementation correctly, the downsampling index computation is data independent: it only depends on the lengths of the two sequences, so that should actually be viable.
For example you could have
import functools
...

@functools.lru_cache()
def compute_downsampling_indices_cached(n_samples, data_sequence_len):
    """Compute n_samples downsampling indices for data sequences of a given length"""
    return np.linspace(0, data_sequence_len - 1, n_samples).round().astype(int)
and then do
def resample_cache(n_samples, data_sequence):
    downsampling_indices = compute_downsampling_indices_cached(n_samples, len(data_sequence))
    return [data_sequence[ind] for ind in downsampling_indices]
Note that I replaced `desired_time_sequence` by `n_samples`, which would then have to be set to `len(desired_time_sequence)`, since you don't care about the actual values in `desired_time_sequence`.
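Putting the cached variant together into a self-contained sketch (the functions are repeated here so the snippet runs on its own; the sample data mirrors the input from the question):

```python
import functools
import numpy as np

@functools.lru_cache()
def compute_downsampling_indices_cached(n_samples, data_sequence_len):
    """Compute n_samples downsampling indices for data sequences of a given length."""
    return np.linspace(0, data_sequence_len - 1, n_samples).round().astype(int)

def resample_cache(n_samples, data_sequence):
    # Index computation is served from the cache after the first call
    # for a given (n_samples, len(data_sequence)) pair.
    downsampling_indices = compute_downsampling_indices_cached(n_samples, len(data_sequence))
    return [data_sequence[ind] for ind in downsampling_indices]

data = [.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
print(resample_cache(3, data))  # -> [0.5, 3.5, 6]
```

Since the cache key is just the pair of lengths, repeated calls with equally sized inputs skip the `np.linspace` work entirely, which is where most of the per-call cost was.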
It might also be possible to benefit from NumPy's indexing and use `return np.array(data_sequence)[downsampling_indices]` for larger inputs. You will have to check that yourself.
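A fully vectorized variant might look like the sketch below (`resample_numpy` is a hypothetical name; note it returns an ndarray rather than a list, and it mainly pays off when `data_sequence` is already an ndarray, since `np.asarray` then avoids a copy):

```python
import numpy as np

def resample_numpy(n_samples, data_sequence):
    """Downsample via NumPy fancy indexing; returns an ndarray, not a list."""
    data = np.asarray(data_sequence)  # no copy if already an ndarray
    indices = np.linspace(0, len(data) - 1, n_samples).round().astype(int)
    return data[indices]

data = [.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
print(resample_numpy(3, data).tolist())  # -> [0.5, 3.5, 6.0]
```

Whether this beats the list comprehension depends on input size and type, so it is worth benchmarking both on representative data.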
On my machine `resample_cache(...)` takes \1ドル.7\,\mu s\$, which is a decent roughly 20x speed-up.