I need a function to iterate through a Python iterable in chunks. That is, it takes an iterable and a chunk size n and yields generators, each iterating through one chunk of size n. After some experimentation, I wrote this stupid hack, because there seems to be no easy way to check in advance whether an iterator has been exhausted. How can I improve this code?
def iterchunks(it, n):
    def next_n(it_, n_, return_first = None):
        if return_first is not None:
            n_ -= 1
            yield return_first
        for _ in range(n_):
            yield next(it_)

    # check if the iterator is exhausted by advancing the iterator,
    # if not return the value returned by advancing the iterator along with the boolean result
    def exhausted(it_):
        res = next(it_, None)
        return res is None, res

    while True:
        exhsted, rf = exhausted(it)
        if exhsted:
            return
        else:
            # if the iterator is not exhausted, yield the returned value along with the next chunk
            yield next_n(it, n, rf)
Hmm, not an exact solution, but... do you know the StopIteration exception? try: while True: yield [next(it) for _ in range(n)] except StopIteration: pass – enedil, Apr 26, 2017 at 22:20
@enedil Yes, if I just pack the generators into lists or tuples, there would be no problem, since the StopIteration exception would be triggered upon calling any empty generators. That would sacrifice some flexibility and laziness though. – charlieh_7, Apr 26, 2017 at 22:39
1 Answer
Your code has a significant bug. If the iterable contains None as the first item of any chunk (the (c * n + 1)-th item, i.e. index c * n), the code mistakes that None for exhaustion and does not return the rest of the list:
xs = list(range(75, 90))
xs[5] = None
print([list(c) for c in iterchunks(iter(xs), 5)])
# Outputs [[75, 76, 77, 78, 79]]
# Expected [[75, 76, 77, 78, 79], [None, 81, 82, 83, 84], [85, 86, 87, 88, 89]]
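The cause can be seen in isolation. As a minimal sketch (the variable names are only for illustration), next(it, None) returns None both when the iterator is exhausted and when the next item really is None, so the check cannot tell the two cases apart:

```python
it = iter([None, 1])

value = next(it, None)
looks_exhausted = value is None  # True, although one item is still waiting
remaining = list(it)             # the item the check would have dropped

print(looks_exhausted, remaining)  # True [1]
```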
To resolve this, use the standard practice of trying something and asking for forgiveness later: call next and catch StopIteration rather than testing for exhaustion with a value that can also appear in the data. I would suggest either building each chunk up in a buffer, as in chunk below, or adapting the grouper recipe. Still, this seems like a case of reinventing the wheel; it is unfortunate that Python doesn't have this built into the itertools library. The itertools docs do define grouper, which is roughly what you want, except that it pads the end of the last chunk with a fill value.
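def chunk(it, n):
    it = iter(it)  # Accept any iterable, not only iterators
    try:
        while True:
            xs = []  # The buffer to hold the next n items
            for _ in range(n):
                xs.append(next(it))
            yield xs
    except StopIteration:
        if xs:  # Yield the final partial chunk, but never an empty one
            yield xs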
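Since chunk builds a list for every chunk, it gives up the laziness asked for in the comments. If that matters, here is a rough sketch of a lazy variant, not a definitive implementation: it keeps your peek-ahead idea but swaps None for a private sentinel object, so no value in the data can be mistaken for exhaustion. The names iterchunks_lazy and _MISSING are made up for this example.

```python
from itertools import chain, islice

_MISSING = object()  # hypothetical private sentinel; cannot collide with real data


def iterchunks_lazy(iterable, n):
    """Yield lazy chunks of up to n items each from iterable."""
    it = iter(iterable)
    while True:
        first = next(it, _MISSING)  # peek one item to detect exhaustion
        if first is _MISSING:
            return
        # Re-attach the peeked item in front of the next n - 1 items.
        yield chain([first], islice(it, n - 1))


xs = list(range(75, 90))
xs[5] = None
print([list(c) for c in iterchunks_lazy(xs, 5)])
# [[75, 76, 77, 78, 79], [None, 81, 82, 83, 84], [85, 86, 87, 88, 89]]
```

All chunks share the same underlying iterator, so each chunk has to be consumed completely before the next one is requested; otherwise the chunk boundaries shift.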
This is the grouper recipe from the itertools docs, with one amendment: it uses yield from instead of returning the iterable it creates.
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    yield from zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)
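If the padding is the only thing that bothers you about grouper, one possible adaptation is to pad with a private object and filter it out again. This is only a sketch: grouper_nopad and pad are names invented here, not part of the itertools recipes, and each chunk comes back as a tuple rather than a lazy generator.

```python
from itertools import zip_longest


def grouper_nopad(iterable, n):
    """Like grouper, but drop the padding from the final chunk."""
    pad = object()  # hypothetical private fill value; cannot collide with real data
    for group in zip_longest(*[iter(iterable)] * n, fillvalue=pad):
        yield tuple(x for x in group if x is not pad)


print(list(grouper_nopad('ABCDEFG', 3)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```

Because the fill value is a fresh object, None values in the data pass through untouched.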