Discussion
I've been trying to increase my knowledge of Java Streams and for practice, I devised up the requirement of partitioning a Stream
of values into a Stream
of a List
of values.
Implementation
package data.structures.streams.utils;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
public class Partitioner {
static class PartitionSpliterator<T> extends Spliterators.AbstractSpliterator<List<T>> {
private final Spliterator<T> values;
private final int partitionSize;
private List<T> currentPartition;
public PartitionSpliterator(final Spliterator<T> values, final int partitionSize) {
super(
Double.valueOf(Math.ceil((double) (values.estimateSize() / partitionSize)))
.intValue(),
ORDERED | IMMUTABLE | SIZED | SUBSIZED
);
if (1 > partitionSize) {
throw new IllegalArgumentException("Size must be a positive integer");
}
this.values = values;
this.partitionSize = partitionSize;
this.currentPartition = null;
}
@Override
public boolean tryAdvance(Consumer<? super List<T>> action) {
if (null == currentPartition) {
currentPartition = new ArrayList<>();
}
while (partitionSize > currentPartition.size() && values.tryAdvance(currentPartition::add)) {
}
if (currentPartition.isEmpty()) {
return false;
}
action.accept(currentPartition);
currentPartition = null;
return true;
}
}
public static <Value> Stream<List<Value>> partition(final Stream<Value> values, final int size) {
if (1 > size) {
throw new IllegalArgumentException("Size must be a positive integer");
}
return StreamSupport.stream(
new PartitionSpliterator<>(values.spliterator(), size),
values.isParallel()
);
}
}
1 Answer 1
Avoid naming generic types in such a way as they look like classes. This is confusing to readers of the code. The official convention would suggest V
rather than Value
. Other answers to that question provide other conventions.
The code has every comparison between two values done in the opposite way as the vast majority of programmers would expect. This makes the code harder to read.
- When comparing a constant to a variable, by convention the variable is first.
- When comparing two variables, if one is iterated upwards and one is the fixed upper bound, by convention the iterated value is first.
- When comparing a variable to
null
, by convention the variable is first.
currentPartition
is scoped incorrectly. It is only used in tryAdvance
, and does not persist beyond a single call. It should be moved into that method. There is also no value in setting it to null
.
The while
loop in tryAdvance
is technically correct, but it is harder on readers to understand the important bit is the side effect of the tryAdvance
call. Strongly consider rewriting it to just repeatedly call tryAdvance
. This is very easy to read and should be highly performant. If, later, performance testing shows this is a bottleneck (it won't), put the tryAdvance
call into the while
block and break if the boolean comes back false.
Since you know the size of the partition, you can presize the ArrayList
. This will usually not matter, but in some cases it might be somewhat more performant, and there's no readability impact. Note that you may still see one resize depending on the JDK's implementation.
PartitionSpliterator
's size check calls the variable size
, but the parameter name is partitionSize
.
PartitionSpliterator
should be private and final.
By convention, public members appear before private members in a class, and also methods appear before nested classes.
Partitioner
is not designed for extension and should be final.
Partitioner
is not designed to be instantiated, and should have a private
constructor to prevent instantiation. (By convention, constructors appear before other methods, even if the constructor is private and the method is public).
The first argument (est
) to super
in PartitionSpliterator()
is very hard to read and also incorrect. If the passed-in spliterator returns MAX_VALUE, you will assign a very large and probably very bad estimate. Either use MAX_VALUE always here, or add a private static method which correctly computes the estimate.
Then note that it only needs an estimated value, and being off by 1 is fine. Just divide the sizes and add 1. This will be right in every case except where they divide evenly, in which case it will be 1 too large.
If you made all these changes, your code might look more like:
public final class Partitioner {
private Partitioner() {}
public static <V> Stream<List<V>> partition(final Stream<V> values, final int size) {
if (size < 1) {
throw new IllegalArgumentException("Size must be a positive integer");
}
return StreamSupport.stream(
new PartitionSpliterator<>(values.spliterator(), size),
values.isParallel()
);
}
private static final class PartitionSpliterator<T> extends Spliterators.AbstractSpliterator<List<T>> {
private final Spliterator<T> values;
private final int partitionSize;
public PartitionSpliterator(final Spliterator<T> values, final int partitionSize) {
super(computeEstimatedSize(values.estimateSize(), partitionSize),
ORDERED | IMMUTABLE | SIZED | SUBSIZED
);
if (partitionSize < 1) {
throw new IllegalArgumentException("Size must be a positive integer");
}
this.values = values;
this.partitionSize = partitionSize;
}
@Override
public boolean tryAdvance(Consumer<? super List<T>> action) {
List<T> currentPartition = new ArrayList<>(partitionSize);
while (currentPartition.size() < partitionSize) {
values.tryAdvance(currentPartition::add);
/*
boolean addedValue = values.tryAdvance(currentPartition::add));
if (!addedValue) {
break;
}
*/
}
if (currentPartition.isEmpty()) {
return false;
}
action.accept(currentPartition);
return true;
}
private static long computeEstimatedSize(long estimatedValuesSize, int partitionSize) {
if (estimatedValuesSize == Long.MAX_VALUE) {
return Long.MAX_VALUE;
}
return (estimatedValuesSize / partitionSize) + 1;
}
}
}
-
\$\begingroup\$ Ah, you either need the
break
approach in thetryAdvance
while
loop, or you need to change to a for loop from 0 to partitionSize. It was late, sorry. :( \$\endgroup\$Eric Stein– Eric Stein2022年04月08日 16:58:44 +00:00Commented Apr 8, 2022 at 16:58
IntStream
approach? \$\endgroup\$split()
, and a basicIntStream
should be able. \$\endgroup\$split()
? I definitely agree that it looks like this is a solved problem - again, my attempt at implementing this solved problem was to learn more aboutStream
s, in general. \$\endgroup\$