Problem
Given a list and an integer chunk size, divide the list into sublists, where each sublist is of chunk size.
Codes
Just for practicing, I've coded to solve the problem using JavaScript and Python. There is also a built-in PHP function for that. If you'd like to review the codes and provide any change/improvement recommendations, please do so and I'd really appreciate that.
Python
"""
# Problem
# Given a list and chunk size, divide the list into sublists
# where each sublist is of chunk size
# --- Examples: (Input List, Chunk Size) => Output List
# ([1, 2, 3, 4], 2) => [[ 1, 2], [3, 4]]
# ([1, 2, 3, 4, 5], 2) => [[ 1, 2], [3, 4], [5]]
# ([1, 2, 3, 4, 5, 6, 7, 8], 3) => [[ 1, 2, 3], [4, 5, 6], [7, 8]]
# ([1, 2, 3, 4, 5], 4) => [[ 1, 2, 3, 4], [5]]
# ([1, 2, 3, 4, 5], 10) => [[ 1, 2, 3, 4, 5]]
"""
from typing import TypeVar, List
import math
T = TypeVar('T')
def chunk_with_slice(input_list: List['T'], chunk_size: int) -> List['T']:
if chunk_size <= 0 or not isinstance(input_list, list):
return False
start_index = 0
end_index = chunk_size
iteration = len(input_list) // chunk_size
chunked_list = input_list[start_index:end_index]
output_list = []
output_list.append(chunked_list)
if (len(input_list) % chunk_size != 0):
iteration += 1
for i in range(1, iteration):
start_index, end_index = end_index, chunk_size * (i + 1)
chunked_list = input_list[start_index:end_index]
output_list.append(chunked_list)
return output_list
def chunk_slice_push(input_list: List['T'], chunk_size: int) -> List['T']:
if chunk_size <= 0 or not isinstance(input_list, list):
return False
output_list = []
start_index = 0
while start_index < len(input_list):
chunked_list = input_list[start_index: int(start_index + chunk_size)]
output_list.append(chunked_list)
start_index += chunk_size
return output_list
if __name__ == '__main__':
# ---------------------------- TEST ---------------------------
DIVIDER_DASH = '-' * 50
GREEN_APPLE = '\U0001F34F'
RED_APPLE = '\U0001F34E'
test_input_lists = [
[10, 2, 3, 40, 5, 6, 7, 88, 9, 10],
[-1, -22, 34],
[-1.3, 2.54, math.pi, 4, '5'],
[math.pi, 2, math.pi, 4, 5, 'foo', 7, 8,
'bar', math.tau, math.inf, 12, math.e]
]
test_expected_outputs = [
[
[10, 2],
[3, 40],
[5, 6],
[7, 88],
[9, 10]
],
[
[-1],
[-22],
[34]
],
[
[-1.3, 2.54, math.pi],
[4, '5']
],
[
[math.pi, 2, math.pi, 4, 5],
['foo', 7, 8, 'bar', math.tau],
[math.inf, 12, math.e]
]
]
test_chunk_sizes = [2, 1, 3, 5]
count = 0
for test_input_list in test_input_lists:
test_output_form_chunk_slice = chunk_with_slice(
test_input_list, test_chunk_sizes[count])
test_output_form_chunk_slice_push = chunk_slice_push(
test_input_list, test_chunk_sizes[count])
print(DIVIDER_DASH)
if test_expected_outputs[count] == test_output_form_chunk_slice and test_expected_outputs[count] == test_output_form_chunk_slice_push:
print(f'{GREEN_APPLE} Test {int(count + 1)} was successful.')
else:
print(f'{RED_APPLE} Test {int(count + 1)} was successful.')
count += 1
JavaScript
function chunk_with_push(input_list, chunk_size) {
const output_list = [];
for (let element of input_list) {
const chunked_list = output_list[output_list.length - 1];
if (!chunked_list || chunked_list.length === chunk_size) {
output_list.push([element]);
} else {
chunked_list.push(element);
}
}
return output_list;
}
function chunk_with_slice(input_list, chunk_size) {
var start_index, end_index, chunked_list;
const output_list = [];
start_index = 0;
end_index = chunk_size;
chunked_list = input_list.slice(start_index, end_index);
output_list.push(chunked_list);
iteration = Math.floor(input_list.length / chunk_size);
if (input_list.length % chunk_size != 0) {
iteration += 1;
}
for (i = 1; i < iteration; i++) {
start_index = end_index;
end_index = chunk_size * (i + 1);
chunked_list = input_list.slice(start_index, end_index);
output_list.push(chunked_list);
}
return output_list;
}
function chunk_slice_push(input_list, chunk_size) {
const output_list = [];
let start_index = 0;
while (start_index < input_list.length) {
output_list.push(input_list.slice(start_index, parseInt(start_index + chunk_size)));
start_index += chunk_size;
}
return output_list;
}
// ---------------------------- TEST ---------------------------
divider_dash = '-----------------'
green_apple = 'π'
red_apple = 'π'
const test_input_lists = [
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[1, 2, 3],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
];
const test_expected_outputs = [
[
[1, 2],
[3, 4],
[5, 6],
[7, 8],
[9, 10]
],
[
[1],
[2],
[3]
],
[
[1, 2, 3],
[4, 5]
],
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13]
]
];
const test_chunk_sizes = [2, 1, 3, 5];
var count = 0;
for (let test_input_list of test_input_lists) {
const test_output_form_chunk_push = chunk_with_push(test_input_list, test_chunk_sizes[count]);
const test_output_form_chunk_slice = chunk_with_slice(test_input_list, test_chunk_sizes[count]);
const test_output_form_chunk_slice_push = chunk_slice_push(test_input_list, test_chunk_sizes[count]);
console.log(divider_dash);
if (JSON.stringify(test_expected_outputs[count]) === JSON.stringify(test_output_form_chunk_push) &&
JSON.stringify(test_expected_outputs[count]) === JSON.stringify(test_output_form_chunk_slice) &&
JSON.stringify(test_expected_outputs[count]) === JSON.stringify(test_output_form_chunk_slice_push)
) {
console.log(green_apple + ' Test ' + parseInt(count + 1) + ' was successful.');
} else {
console.log(red_apple + ' Test ' + parseInt(count + 1) + ' was successful');
}
count++;
}
PHP
// Problem
// Given a list and chunk size, divide the list into sublists
// where each sublist is of chunk size
// --- Examples: (Input List, Chunk Size) => Output List
// ([1, 2, 3, 4], 2) => [[ 1, 2], [3, 4]]
// ([1, 2, 3, 4, 5], 2) => [[ 1, 2], [3, 4], [5]]
// ([1, 2, 3, 4, 5, 6, 7, 8], 3) => [[ 1, 2, 3], [4, 5, 6], [7, 8]]
// ([1, 2, 3, 4, 5], 4) => [[ 1, 2, 3, 4], [5]]
// ([1, 2, 3, 4, 5], 10) => [[ 1, 2, 3, 4, 5]]
//
function builtInChunk($input_list, $input_size)
{
if ($input_size <= 0 or !is_array($input_list)) {
return false;
}
return array_chunk($input_list, $input_size);
}
// ---------------------------- TEST ---------------------------
$divider_dash = PHP_EOL . '-----------------' . PHP_EOL;
$green_apple = 'π';
$red_apple = 'π';
$test_input_lists = [
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[1, 2, 3],
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
];
$test_expected_outputs = [
[
[1, 2],
[3, 4],
[5, 6],
[7, 8],
[9, 10],
],
[
[1],
[2],
[3],
],
[
[1, 2, 3],
[4, 5],
],
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13],
],
];
$test_chunk_sizes = [2, 1, 3, 5];
foreach ($test_input_lists as $key => $test_input_list) {
$test_output_form_chunk_slice = builtInChunk($test_input_list, $test_chunk_sizes[$key]);
print($divider_dash);
if ($test_expected_outputs[$key] === $test_output_form_chunk_slice) {
print($green_apple . ' Test ' . intval($key + 1) . ' was successful.');
} else {
print($red_apple . ' Test ' . intval($key + 1) . ' was successful');
}
}
Output
--------------------------------------------------
π Test 1 was successful.
--------------------------------------------------
π Test 2 was successful.
--------------------------------------------------
π Test 3 was successful.
--------------------------------------------------
π Test 4 was successful.
-
1\$\begingroup\$ Rather than posting redundant advice for the php snippet, apply AJNeufeld's advice about throwing and catching. An off-topic question: Why not merge this account with your Stack Overflow account? \$\endgroup\$mickmackusa– mickmackusa2019εΉ΄10ζ16ζ₯ 05:26:17 +00:00Commented Oct 16, 2019 at 5:26
3 Answers 3
Argument Checking and Duck-typing
Two problems with this code:
if chunk_size <= 0 or not isinstance(input_list, list):
return False
If the arguments are the correct type, the function returns a list of lists. If not, it returns False
? This is going to complicate the code which uses the function. It is better to raise an exception:
if chunk_size <= 0:
raise ValueError("Chunk size must be positive")
Secondly, you're requiring the input list to actually be a list. There are many Python types which can behave like a list, but are not instance of list
. Tuples and strings are two easy examples. It is often better to just accept the type provided, and try to execute the function with the data given. If the caller provides a something that looks like a duck and quacks like a duck, don't complain that a drake is not a duck.
Oddity
In this statement ...
chunked_list = input_list[start_index: int(start_index + chunk_size)]
... what is that int()
for? It seems unnecessary.
Return Type
You are breaking the list into multiple lists. Your return is not just a simple list; it is a list of lists. List[List[T]]
.
Itertools
Python comes with many tools builtin. Check out the itertools
modules. In particular, look at the grouper()
recipe. It reduces your "chunking" code into 2 statements.
>>> from itertools import zip_longest
>>>
>>> input_list= [10, 2, 3, 40, 5, 6, 7, 88, 9, 10, 11]
>>> chunk_size = 2
>>>
>>> args = [iter(input_list)] * chunk_size
>>> chunks = list(zip_longest(*args))
>>>
>>> chunks
[(10, 2), (3, 40), (5, 6), (7, 88), (9, 10), (11, None)]
>>>
As demonstrated above, it produces slightly different output. The last chunk is padded with None
values until it is the proper length. You could easily filter out these extra values.
>>> chunks[-1] = [item for item in chunks[-1] if item is not None]
>>> chunks
[(10, 2), (3, 40), (5, 6), (7, 88), (9, 10), [11]]
>>>
If your list can contain None
values, then a sentinel object could be used instead. Revised code:
from itertools import zip_longest
def chunk_itertools(input_list: List[T], chunk_size: int) -> List[List[T]]:
sentinel = object()
args = [iter(input_list)] * chunk_size
chunks = list(zip_longest(*args), fillvalue=sentinel)
chunks[-1] = [item for item in chunks[-1] if item is not sentinel]
return chunks
JavaScript & PHP reviews left to others
You have some odd stuff going on in chunk_with_slice
; specifically these lines:
def chunk_with_slice(input_list: List['T'], chunk_size: int) -> List['T']:
if chunk_size <= 0 or not isinstance(input_list, list):
return False
First, you're using strings to represent type vars (List['T']
). As mentioned in the comments, strings are used to reference types that can't be referred to directly, such as needing a forward reference, or referencing something in a separate file that can't be imported. You don't need a forward reference here though. I'd just use T
:
def chunk_with_slice(input_list: List[T], chunk_size: int) -> List[T]:
Second, think about those first two lines. You're basically saying "input_list
is a list. But if it's not a list, return false". If you say that input_list
is a list, you shouldn't be allowing other types in. You've told the type checker to help you catch type errors, so ideally you shouldn't also be trying to recover from bad data at runtime. If you expect that other types may be passed in, specify that using Union
(such as Union[List, str]
to allow a List
and str
in).
I also agree with @AJ's "It is often better to just accept the type provided, and try to execute the function with the data given." comment. I'd specify input_list
to be a Sequence
instead. A Sequence
is a container capable of indexing/slicing and len
. This covers lists, tuples, strings, and others; including user-defined types. Forcing your caller to use a specific container type is unnecessary in most circumstances.
-
\$\begingroup\$ PEP484 allows type-hints to be given as strings, to allow forward references. \$\endgroup\$AJNeufeld– AJNeufeld2019εΉ΄10ζ16ζ₯ 13:32:44 +00:00Commented Oct 16, 2019 at 13:32
-
\$\begingroup\$ @AJNeufeld Ahh, thanks. A forward reference isn't necessary here though. I'll update my answer. \$\endgroup\$Carcigenicate– Carcigenicate2019εΉ΄10ζ16ζ₯ 14:01:39 +00:00Commented Oct 16, 2019 at 14:01
For sequences (list, str, tuple, etc), this is rather straight forward:
def chunks(sequence, size):
return [sequence[index:index+size] for index in range(0, len(sequence), size)]
Explore related questions
See similar questions with these tags.