Pythonic split list into n random chunks of roughly equal size

Question 1

As part of my implementation of cross-validation, I find myself needing to split a list into chunks of roughly equal size.

import random
def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    ylen = len(ys)
    size = int(ylen / n)
    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]
    leftover = ylen - size*n
    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])
    return chunks

This works as intended:

>>> chunk(range(10), 3)
[[4, 1, 2, 7], [5, 3, 6], [9, 8, 0]]

But it seems rather long and boring. Is there a library function that could perform this operation? Are there pythonic improvements that can be made to my code?

Question 2

import random
def chunk(xs, n):
    ys = list(xs)

Copies of lists are usually taken using xs[:] (though list(xs) also works when xs is not a list).

    random.shuffle(ys)
    ylen = len(ys)

I don't think storing the length in a variable actually helps your code much.

    size = int(ylen / n)

Use size = ylen // n instead; // is the integer division operator.

    chunks = [ys[0+size*i : size*(i+1)] for i in xrange(n)]

Why the 0+?

    leftover = ylen - size*n

Actually, you can find size and leftover in one step using size, leftover = divmod(ylen, n).

    edge = size*n
    for i in xrange(leftover):
        chunks[i%n].append(ys[edge+i])

The number of leftover items is always less than n, so the i%n is unnecessary. You can pair each leftover item with a chunk instead:

    for c, value in zip(chunks, ys[edge:]):
        c.append(value)
    return chunks

Some more improvement could be had if you used NumPy. If this is part of number-crunching code, you should look into it.
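
Putting the suggestions from this review together gives something like the following. This is only a sketch: it assumes Python 3 (range instead of xrange), and it reads the leftover items from the tail of the shuffled list, which is one reasonable interpretation of the zip suggestion.

```python
import random

def chunk(xs, n):
    """Shuffle xs and split it into n lists whose lengths differ by at most one."""
    ys = list(xs)              # copy, so the caller's data is not shuffled in place
    random.shuffle(ys)
    size, leftover = divmod(len(ys), n)
    # The first size*n items fill n equally sized chunks.
    chunks = [ys[size * i : size * (i + 1)] for i in range(n)]
    # The remaining `leftover` items (always fewer than n) go one per chunk.
    for c, value in zip(chunks, ys[size * n:]):
        c.append(value)
    return chunks
```

Calling chunk(range(10), 3) returns three lists, one of length 4 and two of length 3, with the contents varying from run to run because of the shuffle.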

Question 3

Is there a library function that could perform this operation?

No.

Are there pythonic improvements that can be made to my code?

A few.

Sorry it seems boring, but there's not much better you can do.

The biggest change might be to make this into a generator function, which may be a tiny bit neater.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    for c in xrange(n):
        if leftovers:
            extra = [leftovers.pop()]
        else:
            extra = []
        yield ys[c*size:(c+1)*size] + extra

The use case changes slightly, depending on what you're doing. If you need an actual list, wrap the call:

chunk_list = list(chunk(range(10), 3))

The if statement can also be removed, since it's really two loops: one for the chunks that receive a leftover item and one for the rest. But that's being really fussy about performance.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    for c, xtra in enumerate(leftovers):
        yield ys[c*size:(c+1)*size] + [xtra]
    # Start from len(leftovers) rather than c+1, so this also works when there are no leftovers.
    for c in xrange(len(leftovers), n):
        yield ys[c*size:(c+1)*size]
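
Because the list is shuffled first, the chunks do not have to be contiguous slices. As an alternative that is not taken from any of the answers here, an extended slice with step n gives the same "sizes differ by at most one" property in a single expression. The name chunk_strided is made up for this sketch, and it assumes Python 3 (range instead of xrange).

```python
import random

def chunk_strided(xs, n):
    """Shuffle xs, then deal its items into n lists round-robin style."""
    ys = list(xs)
    random.shuffle(ys)
    # ys[i::n] takes items i, i+n, i+2n, ...; the first len(ys) % n
    # chunks therefore end up one item longer than the rest.
    return [ys[i::n] for i in range(n)]
```

This trades the explicit leftover handling for a strided copy per chunk, which may or may not matter depending on the size of the input.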

Question 4

Make it a generator. You could then simplify the logic.

def chunk(xs, n):
    ys = list(xs)
    random.shuffle(ys)
    chunk_length = len(ys) // n
    needs_extra = len(ys) % n
    start = 0
    for i in xrange(n):
        if i < needs_extra:
            end = start + chunk_length + 1
        else:
            end = start + chunk_length
        yield ys[start:end]
        start = end
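
Since the question mentions cross-validation, here is a rough sketch of how a generator like this might be consumed for that purpose. cross_validation_splits is a hypothetical helper that does not appear in the original posts, and the chunk shown is a Python 3 restatement of the generator (range instead of xrange, divmod for the sizes).

```python
import random

def chunk(xs, n):
    """Yield n shuffled chunks of xs whose sizes differ by at most one."""
    ys = list(xs)
    random.shuffle(ys)
    size, needs_extra = divmod(len(ys), n)
    start = 0
    for i in range(n):
        # The first `needs_extra` chunks take one additional item.
        end = start + size + (1 if i < needs_extra else 0)
        yield ys[start:end]
        start = end

def cross_validation_splits(xs, n):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    folds = list(chunk(xs, n))   # materialise: every fold is reused n times
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

The folds are materialised into a list here because each one is needed several times, once as the test set and n - 1 times as part of a training set; a one-shot generator would be exhausted after the first pass.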

Winston Ewert · Accepted Answer (the line-by-line review under Question 2) · 2011-09-16 22:27:21Z
