6
\$\begingroup\$

As part of my implementation of cross-validation, I find myself needing to split a list into chunks of roughly equal size.

import random
def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    ylen = len(ys)
    size = int(ylen / n)
    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]
    leftover = ylen - size*n
    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])
    return chunks

This works as intended:

>>> chunk(range(10), 3)
[[4, 1, 2, 7], [5, 3, 6], [9, 8, 0]]

But it seems rather long and boring. Is there a library function that could perform this operation? Are there pythonic improvements that can be made to my code?

asked Sep 16, 2011 at 19:46
\$\endgroup\$

3 Answers

5
\$\begingroup\$
import random
def chunk(xs, n):
    ys = list(xs)

Copies of lists are usually taken using xs[:]

    random.shuffle(ys)
    ylen = len(ys)

I don't think storing the length in a variable actually helps your code much

    size = int(ylen / n)

Use size = ylen // n instead; // is the integer division operator.

    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]

Why the 0+?

    leftover = ylen - size*n

Actually, you can find size and leftover using size, leftover = divmod(ylen, n)

    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])

leftover is always less than n, so the i%n is unnecessary. You can pair each chunk with one of the remaining items instead:

    for chunk, value in zip(chunks, ys[edge:]):
        chunk.append(value)
    return chunks
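
Putting the suggestions in this answer together might look roughly like the sketch below. It is written for Python 3 (range instead of xrange), and the rng parameter is a hypothetical addition, not part of the original code, included only so the shuffle can be seeded when testing.

```python
import random

def chunk(xs, n, rng=random):
    """Shuffle xs and split it into n chunks whose sizes differ by at most one."""
    ys = list(xs)
    rng.shuffle(ys)  # rng is an assumed hook; defaults to the random module
    size, leftover = divmod(len(ys), n)
    edge = size * n
    chunks = [ys[size * i : size * (i + 1)] for i in range(n)]
    # leftover < n always holds, so zip gives each spare item its own chunk.
    for part, value in zip(chunks, ys[edge:]):
        part.append(value)
    return chunks
```

Calling chunk(range(10), 3) should give one chunk of four items and two of three, in shuffled order.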

Some more improvement could be had if you used numpy. If this is part of a number crunching code you should look into it.
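
As one possible illustration of the numpy route, numpy.array_split splits an array into n parts of nearly equal length, which covers the "leftover" handling for you. This is only a sketch that assumes numpy (1.17 or later, for default_rng) is installed; the seed parameter is a hypothetical addition for reproducibility.

```python
import numpy as np

def chunk(xs, n, seed=None):
    """Shuffle xs and split it into n nearly equal numpy arrays."""
    rng = np.random.default_rng(seed)  # seed is an assumed convenience parameter
    ys = rng.permutation(np.asarray(list(xs)))  # shuffled copy of the data
    # array_split allows uneven splits: the first len(ys) % n parts get one extra item.
    return np.array_split(ys, n)
```

Note that this returns a list of arrays rather than a list of lists, which is usually fine if the rest of the pipeline is numpy-based.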

answered Sep 16, 2011 at 22:27
\$\endgroup\$
2
\$\begingroup\$

Is there a library function that could perform this operation?

No.

Are there pythonic improvements that can be made to my code?

A few.

Sorry it seems boring, but there's not much better you can do.

The biggest change might be to make this into a generator function, which may be a tiny bit neater.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    for c in xrange(n):
        if leftovers:
            extra = [leftovers.pop()]
        else:
            extra = []
        yield ys[c*size:(c+1)*size] + extra

The use case changes slightly, depending on what you're doing. For example, if you need an actual list rather than a generator:

chunk_list = list(chunk(range(10), 3))
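
Since the question mentions cross-validation, here is a rough sketch of how the generator might be consumed for that. It ports the generator to Python 3, and folds is a hypothetical helper name invented for this example, not something from the question.

```python
import random

def chunk(xs, n):
    """Python 3 port of the generator: yield n shuffled, nearly equal chunks."""
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size * n:]
    for c in range(n):
        extra = [leftovers.pop()] if leftovers else []
        yield ys[c * size:(c + 1) * size] + extra

def folds(xs, n):
    """Hypothetical helper: yield (train, test) pairs for n-fold cross-validation."""
    # Materialize once; a generator can only be iterated a single time.
    chunks = list(chunk(xs, n))
    for i, test in enumerate(chunks):
        train = [x for j, c in enumerate(chunks) if j != i for x in c]
        yield train, test
```

Each chunk serves as the test set exactly once, with the remaining chunks concatenated into the training set.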

The if statement can also be removed, since it's really two loops: one for the chunks that get an extra item and one for the rest. But that's being really fussy about performance.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    for c, xtra in enumerate(leftovers):
        yield ys[c*size:(c+1)*size] + [xtra]
    # Start at len(leftovers) rather than c+1: c is undefined when there are no leftovers.
    for c in xrange(len(leftovers), n):
        yield ys[c*size:(c+1)*size]
answered Sep 16, 2011 at 20:51
\$\endgroup\$
1
\$\begingroup\$

Make it a generator. You could then simplify the logic.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    chunk_length = len(ys) // n
    needs_extra = len(ys) % n
    start = 0
    for i in xrange(n):
        if i < needs_extra:
            end = start + chunk_length + 1
        else:
            end = start + chunk_length
        yield ys[start:end]
        start = end
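
If contiguous slices are not a requirement, a different technique, extended slicing with a step, can shorten this further. The sketch below (Python 3) is an alternative to the boundary arithmetic above rather than a rewrite of it: because the list is shuffled first, taking every n-th item is just as random as taking consecutive runs.

```python
import random

def chunk(xs, n):
    """Shuffle xs and deal it into n chunks whose sizes differ by at most one."""
    ys = list(xs)
    random.shuffle(ys)
    # ys[i::n] takes every n-th item starting at index i, like dealing cards.
    return [ys[i::n] for i in range(n)]
```

As with the generator version, the first len(ys) % n chunks end up one item longer than the rest.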
answered Sep 16, 2011 at 21:13
\$\endgroup\$
