3
\$\begingroup\$

I'm creating a small library of Python utilities, and I'd like feedback on a function which allows iterating over an arbitrary iterable in a sliding-window fashion.

Relevant parts of iteration.py:

import collections
import itertools
def sliding_window_iter(iterable, size):
 """Iterate through iterable using a sliding window of several elements.
 Creates an iterable where each element is a tuple of `size`
 consecutive elements from `iterable`, advancing by 1 element each
 time. For example:
 >>> list(sliding_window_iter([1, 2, 3, 4], 2))
 [(1, 2), (2, 3), (3, 4)]
 """
 iterable = iter(iterable)
 window = collections.deque(
 itertools.islice(iterable, size-1),
 maxlen=size
 )
 for item in iterable:
 window.append(item)
 yield tuple(window)

Relevant parts of the test file iteration_test.py:

import doctest
import unittest
import iteration
from iteration import *
class TestSlidingWindowIter(unittest.TestCase):
 def test_good(self):
 self.assertSequenceEqual(
 list(sliding_window_iter([1, 2, 3, 4], 2)),
 [(1, 2), (2, 3), (3, 4)]
 )
 def test_exact_length(self):
 self.assertSequenceEqual(
 list(sliding_window_iter(["c", "b", "a"], 3)),
 [("c", "b", "a")]
 )
 def test_short(self):
 self.assertSequenceEqual(
 list(sliding_window_iter([1, 2], 3)),
 []
 )
 def test_size_one(self):
 self.assertSequenceEqual(
 list(sliding_window_iter([1, 2, 3, 4], 1)),
 [(1,), (2,), (3,), (4,)]
 )
 def test_bad_size(self):
 with self.assertRaises(ValueError):
 list(sliding_window_iter([1, 2], 0))
def run():
 if not doctest.testmod(iteration)[0]:
 print("doctest: OK")
 unittest.main()
if __name__ == "__main__":
 run()

I'm primarily looking for feedback in these areas:

  • Is the code Pythonic? I'm primarily a C++ developer, so Python idioms don't come naturally to me; that's one thing I'm constantly trying to improve.
  • Are there any potential performance problems?
  • Am I reinventing a wheel and something like this already exists?

I'll welcome any other feedback too, of course.

asked Mar 24, 2020 at 11:36
\$\endgroup\$
1
  • 1
    \$\begingroup\$ As for the last question, of course you are. See rolling or more-itertools on PyPI. \$\endgroup\$ Commented Mar 24, 2020 at 21:21

2 Answers 2

6
\$\begingroup\$

Looks pretty good to me. You are taking advantage of the standard library, are following the relevant style guides and you even have tests, good job!

Personally, I am more in favor of using import from, at least for well-known standard library functions and maybe for utility functions that are used everywhere throughout a project. This way you can more often fit the code on one line easily, making it more readable.

from itertools import islice
from collections import deque

One small improvement is that your initial slice can be one larger, the argument behaves just like a range or slice argument and is exclusive of the end (i.e islice(x, 4) takes the first four elements since it is analogous to x[0:4], x[:4] and range(0, 4)). However, this way you need to deal with the last element in a special way, so YMMV:

def sliding_window_iter(iterable, size):
 """..."""
 iterable = iter(iterable)
 window = deque(islice(iterable, size), maxlen=size)
 for item in iterable:
 yield tuple(window)
 window.append(item)
 if window: 
 # needed because if iterable was already empty before the `for`,
 # then the window would be yielded twice.
 yield tuple(window)

This also reveals one behavior which you should at least be aware of (which seems to be the case, since you have a test for it). Arguable, sliding_window_iter([1, 2], 3) could yield (1, 2), i.e. the function maybe shouldn't swallow any data. In any case, the chosen behavior should be documented in the docstring.

answered Mar 24, 2020 at 12:34
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Thanks! I actually consider "swallow data if too short" a feature, as my original motivation for this utility was computing differences of adjacent elements etc., and passing a further function a tuple shorter than it expects didn't sound healthy. On the other hand, I can imagine applications where yielding the shorter one would be preferable. Do you think adding an optional parameter like , allow_smaller=False would be a good idea? \$\endgroup\$ Commented Mar 24, 2020 at 13:00
  • \$\begingroup\$ @AngewisnolongerproudofSO: Yeah, that might be a good compromise. Whether you add it or not, the (default) behaviour should be documented in the docstring. \$\endgroup\$ Commented Mar 24, 2020 at 13:08
4
\$\begingroup\$

There are a couple of ways to do this. I thought using deque() like the original code would be slow, but it is actually faster than most I tried. For reasonable window sizes the code below runs in 2/3 the time. In some tests with large windows (>= 100), it was slower, but not always.

from itertools import islice, tee
def sliding_window_iter4(iterable, size):
 iterables = tee(iter(iterable), size)
 window = zip(*(islice(t, n, None) for n,t in enumerate(iterables)))
 yield from window
answered Mar 24, 2020 at 23:38
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.