I have a simple list of results, and I need to return only one value depending on content, for example:
public static void main(String[] args) {
List<String> results = new ArrayList<>();
results.add("ERROR");
results.add("ERROR");
results.add("OK");
System.out.println(search(results));
}
public static String search(List src) {
if (searchValueInStream(src, "ERROR")) {
return "ERROR";
} else if (searchValueInStream(src, "OK")) {
return "OK";
}
return "NA";
}
public static boolean searchValueInStream(List src, String str) {
return src.stream().anyMatch(v -> v.equals(str));
}
This code works fine, but it does not look good, and I want to use it in same stream like:
String result = results.stream()
.someFilter(v->v.equals(ERROR)? return "ERROR")
.someFilter(v->v.equals(OK)? return "OK")
.orElse("NA");
without "if then else" statement. Is this possible?
6 Answers 6
There is absolutely no point in using streams for this or integers to represent the values. Don't complicate your life with streams just because you think they're cool. Streams are a hammer that make all your problems look like nails. Don't use streams unless they make sense. Don't try to prepare for more response types than you have (which is why people suggest the integers) unless you actually know you are going to need them.
Keep it simple and you will be able to maintain the code in the future.
boolean okFound = false;
for (String result: results) {
if (ERROR.equals(result) {
return ERROR;
}
if (OK.equals(result)) {
okFound = true;
}
}
return okFound ? OK : UNKNOWN;
There is nothing wrong with writing a plain old for loop in 2024. They are still a completely valid programming structure. Don't let the fashionable new tools cloud your vision.
This loop will also out-perform any stream based solution. Every dot in a stream is an object creation. Every -> is an additional method invocation. Every autoboxed Integer is a throwaway object creation. All of these cost memory and processor time. Once you start working with large data, you will have to pay attention what the nice and fancy libraries actually do under the covers.
This for-loop will short circuit on error, so it's worst case scenario is O(N).
Other people have brought up the possibility of assigning a numerical priority to each possible value. I think this is an intuitive approach.
One simple way to find the most "important" value in a stream, given we can define "importance" as an ordering, is collecting the stream using Collectors.maxBy
(or minBy
I suppose - in this case it's pretty arbitrary).
For deciding what the order is, we can either write a comparison function of our own, or we can do the simpler thing by using Comparator.comparingInt
. Since all possible messages are distinct and known in advance, the int
s we compare can simply be kept in a Map
.
With all that, we might end up writing something like:
private static final Map<String, Integer> OK_ERROR_PRIORITIES = Map.of("OK", 1, "ERROR", 2);
public static String search(Collection<? extends String> src) {
return search(src, OK_ERROR_PRIORITIES, "NA");
}
public static T search(Collection<? extends T> src, Map<? super T, Integer> priorities, T defaultValue) {
Collector<T, ?, Optional<T>> highestPriority = Collectors.maxBy(Comparator.comparingInt(priorities::get));
Optional<T> result = src.stream()
.filter(priorities::containsKey)
.collect(highestPriority);
return result.orElse(defaultValue);
}
You can convert every value into an integer, with higher integers taking priority over lower integers. Then call max
and map the highest integer to its value. I went with ERROR = 1, OK = 0, NA = -1
. Note that List.indexOf
conveniently returns -1
if no match is found, which is why I chose to have higher integers have higher priority (rather than higher integers having lower priority).
// note: ordered in ascending priority
// which means last element has highest priority
public static final List<String> codes = Arrays.asList("OK", "ERROR");
public static int codeToInt(Object code) {
return codes.indexOf(code); // -1 means NA
}
public static String intToCode(int i) {
if (i < 0) // -1 means NA
return "NA";
return codes.get(i);
}
public static String search(List src) {
// note: if the class is Test, use Test::code_to_int
int i = src.stream().mapToInt(Test::codeToInt).max().orElse(-1);
return intToCode(i);
}
With this approach, the number of passes over the stream does not increase with the number of distinct values. This would improve efficiency if you make codes
an unordered map for O(1)
indexing. If you do so, consider changing the value type into something with faster hashing.
In cases where the highest-priority code appears early, this code is inefficient in that max
will not return early. Since worst-case scenarios require you to search the whole stream (e.g. all values are OK
until the last value which is ERROR
), I think this is workable.
With respect to style, I prefer to declare intToCode
and codeToInt
separately, but you could use lambda functions if you like. Hope this helps - I appreciate your question clarity!
-
1\$\begingroup\$ Thanks for the catch. I made the appropriate edits to replace
codes[i]
withcodes.get(i)
. Also fixed some other bugs:map
tomapToInt
,orelse
toorElse
, and generalized toObject
streams. \$\endgroup\$Justin Chang– Justin Chang2024年08月10日 00:08:30 +00:00Commented Aug 10, 2024 at 0:08 -
1\$\begingroup\$ This seems overly complicated. It would be much more readable and easy to understand if you loop over the results once, maintain a boolean flag for "OK was found" and return immediately if "ERROR" occurs. No need to make a detour via integers. I feel like you went this route because you expect there to be more values than just ok/error? No need to complicate the code when the need was not specified. \$\endgroup\$TorbenPutkonen– TorbenPutkonen2024年08月13日 05:54:07 +00:00Commented Aug 13, 2024 at 5:54
-
\$\begingroup\$ Your method
search()
is not type-safe due to raw-type use and will raise a compilation warning. Also, usingjava.lang.Object
as a parameter type incodeToInt()
isn't a good idea, because calling this method by passing anything but aString
is likely a mistake and should result in a compile time error instead of being hidden. \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年10月13日 04:52:00 +00:00Commented Oct 13, 2024 at 4:52 -
\$\begingroup\$ "would improve efficiency if you make codes an unordered map for O(1) indexing" - if you really think so, take a look at the implementations of
ArrayList
andHashMap
and figure out how many hops over the Heap addresses will happen in the worst case scenario with a two-element ArrayList and with a two-entry HashMap. Using a Map will have no impact on performance in this case, but it'll improve readability of the code (see the answer by Sara J). \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年10月13日 05:02:09 +00:00Commented Oct 13, 2024 at 5:02
A simple and short definition of search makes use of List.indexOf
from a list of the three states you want to return:
public static String search(List<String> src) {
List<String> vals = List.of("NA", "OK", "ERROR");
return vals.get(Math.max(0, src.stream().mapToInt(vals::indexOf).max().orElse(0)));
}
The stream over src items maps to a matching value in list ["NA", "OK", "ERROR"], and then returns the highest matching value.
Note that this does one scan through the list and doesn't bail out of the search on first discovery of "ERROR" - so a regular for loop may be better on huge lists, or try ".parallelStream()" in place of "stream()".
I might have misunderstood what the code of @zond should produce, but the principle stays the same. So this code might reverse conditions (returns "OK" first and then "ERROR" instead of the other way round).
if (searchValueInStream(src, "ERROR")) {
return "ERROR";
} else if (searchValueInStream(src, "OK")) {
return "OK";
}
return "NA";
The problem with your code is that you're running through the list twice, once for the "ERROR" search, once for the "OK" search. This can be a costly operation (though, it most likely is not).
I'm not on good terms with streams (I believe they are overhyped and rarely reduce complexity/cognitive load) so I can only provide you with an example without them. What you want is to loop once through the list and keep state for this time. The most simple solution would be something like this:
boolean errorEncountered = false;
for (String item : items) {
if (item.equals("OK")) {
return "OK";
} else if (item.equals("ERROR")) {
errorEncountered = true;
}
}
if (errorEncountered) {
return "ERROR";
} else {
return "NA";
}
In the best case the loop exits at the first item, as it is "OK", the worst case is that it loops through the whole list once.
However, this approach has two problems:
- If you want the
item
itself, you need to store that. - With more items this becomes quite cumbersome.
The first point is rather easily solved:
Item firstEncounteredErrorItem = null;
for (Item item : items) {
if (item.getState().equals("OK")) {
return item;
} else if (firstEncounteredErrorItem == null && item.getState().equals("ERROR")) {
firstEncounteredErrorItem = item;
}
}
if (firstEncounteredErrorItem != null) {
return firstEncounteredErrorItem;
} else {
return Item.NA;
}
A little bit more complex, but still quite manageable.
Now, the second point, extensibility, is harder to solve, as we may need to "stack" differently prioritized items. For that, we'd require something like "Tuple", which we most likely need to write ourselves. The basic idea would be keeping a List
of encountered items which match their "priorization".
List<String> prioritziedItemStates = Item.PRIORITIZED_ITEM_STATES;
Item[] encounteredItems = new Item[Item.MAX_VALUE_COUNT];
for (Item item : items) {
if (item.getState().equals("OK")) {
return item;
} else {
int itemPrioritizedIndex = prioritziedItemStates.indexOf(item.getState());
if (encounteredItems[itemPrioritizedIndex] == null) {
encounteredItems[itemPrioritizedIndex] = item;
}
}
}
for (Item encounteredItem : encounteredItems) {
if (encounteredItem != null) {
return encounteredItem;
}
}
return Item.NA;
This has still the upside that it runs through the list only once, but does a lookup for every item in another (short?) list that is not "OK". Whether that trade is worth it or not is depending on how large the original list is and how many states you need to keep.
String result = results.stream()
.someFilter(v->v.equals(ERROR)? return "ERROR")
.someFilter(v->v.equals(OK)? return "OK")
.orElse("NA");
I'm not sure that's possible with streams, at least not without wrapping them multiple times. There is findFirst
but that terminates the stream, but you want to keep going. Basically what you want would be something like this:
String result = results.stream()
.takeFirstOrElseContinue(result -> result.equals("OK"))
.takeFirstOrElseContinue(result -> result.equals("ERROR"))
.orElse("NA");
I'm not aware that streams would work like that.
There's already a considerable number of answers, but none of them offers a truly declarative solution to the problem despite the use of functional programming features.
The code that follows declarative programming paradigm outlines the high-level logic without explicitly specifying the underlying algorithms or the concrete steps of the execution flow. In other words, declarative code focuses on what it should accomplish, rather than how to accomplish it.
And this problem is not about finding the max
or min
.
Sara J has already pointed out in their answer that decision to consider "OK" to be greater than "ERROR" is absolutely arbitrary
In this case, Stream.max()
does not reveal the real intent of the computation. The essence of declarative programming is that code should describe the problem, but instead max()
dictates how to solve it. That's an arbitrary low-level instruction unrelated to the actual problem.
Applying Stream.max()
to this problem produces an imperative implementation, despite the use of functional programming constructs.
Surprise! On its own, the usage of Streams and Lambdas doesn't automatically make the code declarative. It matters how these functional constructs are used (similarly, code doesn't automatically become object-oriented, just because we're using an object-oriented language). As Venkat Subramaniam said, functional code should read like a problem statement.
For that reason, a plane for
-loop shown in the answer by Torben Putkonen bears less cognitive load for the code reader than stream-based solutions provided so far. But I don't share Torben's sentiment that streams are not the right tool for this problem.
But before I elaborate on that, there's a couple of quick things to address.
Code should Communicate its Intention
Decades ago, the father of TDD and Extreme Programming Kent Beck outlined the criteria of Simple Design, one of which code should be communicative (Martin Fowler refers to it as code "reveals intention" in this article).
Communicative
Every idea that needs to be communicated is represented in the system. Like words in a vocabulary, the elements of the system communicate to future readers.
A quote from "Extreme Programming Explained: Embrace Change", 2nd Edition. Emphasis added
The names of code should clearly communicate their responsibilities to the code reader.
Your method name search()
is misleading because this method doesn't actually perform a search, but rather aggregates the results (you just happen to implement it as a sequence of searches).
Consider such names as combine()
or aggregate()
instead.
Avoid writing Stringly-typed code
Never use plain strings to represent concepts in your business logic when working with a strongly typed language. It's error-prone, impedes refactoring, and you're robbing yourself because strings can offer behavior that will be at your disposal if you use proper types.
I suppose, each "OK" and "ERROR" describes an execution outcome of some sort, but it will be better if you provided some context on where these strings are coming from, and what exactly do they represent. Because it's crucial for modeling a type to substitute these strings, and also important in order to make a conclusion on whether you're doing the right thing to begin with.
Since you gave no context regarding these string, I'll opt for the simplest option and define an enum
:
public enum CompletionStatus { ERROR, OK }
One might ask what about "NA", why it's not represented in this enum
? The answer: it should not be there because in original code it doesn't describe data, but rather the absence of data.
And when it comes to indicating that data is not present, we have three options.
Return
null
. Although, it's a nasty way of telling the caller that there's no data it was requesting, this option is still used because we have lots of old contracts.Use
Optional
. An empty option is a functional way of representing the absence of data.Throw an exception if a result is guaranteed to be present and there's no way to proceed with the normal execution flow.
In this case, I'll choose Optional
.
Refactoring using Fold operation
First, let's discuss how results are combined in the original code. In a nutshell, aggregation logic is similar to how logical AND operator works:
When all results are successful, the aggregated result is success.
When there's at least one error, the aggregated result is error.
Now, to solve this problem in a declarative way, we can make use of the idiomatic functional operation fold left, which is represented by the reduce()
method in the Stream API.
public static Optional<CompletionStatus> combineAll(List<CompletionStatus> results) {
return results.stream()
.reduce((acc, next) -> next == OK ? OK : ERROR);
}
That's it.
This method clearly communicates its intention. It's succinct and fairly easy to comprehend.
If we assume that there's only one meaningfull way to aggregate the results (if it's not the case then use the solution shown above, as I said there should be more context regarding the purpose of these OK and ERROR), then it deserves to be described as behavior specific to CompletionStatus
type. And we can go one step further and take advantage of the fact that Java enum
is a specialized type of class
, and thus can have behavior.
public static Optional<CompletionStatus> combineAll(Collection<CompletionStatus> results) {
return results.stream()
.reduce(CompletionStatus::combine);
}
It's also a good practice to relax the parameter types. In this case, since we're only traversing results
using a stream, it doesn't need to be a List
, this code will work with any Collection
Enum:
public enum CompletionStatus {
ERROR, OK;
public CompletionStatus combine(CompletionStatus other) {
return switch (other) {
case ERROR -> other;
case OK -> this;
};
}
}
Short-circuiting
It's also possible to add short-circuiting to this implementation, so that the stream will terminate after encountering the first ERROR
, instead of processing all elements.
This can be done either by employing third-party tools, or by using the standard Stream API alone. The latter is a bit more complicated with the current LTS version (JDK 21 at the time of writing).
So, I'll show how to achieve this with StreamEx:
return StreamEx.of(results)
.takeWhileInclusive(s -> s == OK) // take all OK elements until the first ERROR + the ERROR element
.reduce(CompletionStatus::combine);
However, I wouldn't recommend complicating the code with this change unless you're certain it provides some tangible benefit. Read the next section on performance to understand what I mean.
Performance
A couple of answers were mentioning performance, especially Torben was very vocal saying:
loop will also out-perform any stream based solution. Every dot in a stream is an object creation.
That's a bit overexaggerated, because:
JIT-compiler optimizes streams quite well performing inlining, reordering, redundancy elimination, etc. JIT is a beating heart of the JVM, compiled code might drastically differ from the code produced by the interpreter. For example, after profiling the method with enhanced
for
-loop from the Torben's answer JIT will throw away anIterator
that it uses under the hood so that it will perform like an indexed-based loop. The point is that we don't need to sacrifice readability implementing manually optimizations which JIT-compiler does.Stream API was designed as a tool for writing expressive declarative code, boosting performance wasn't the goal of introducing streams into JDK (parallel streams is a niche case, they are not used very often).
In real-life applications, only a tiny fraction of code is performance critical. Maintainability is a way more important concern than performance for the bulk of the code base. We want to make it easier delivering new features, and we want our code to be less prone to bugs. For that reason, clean and simple implementations are more valuable than marginal performance gains.
As the rule of thumb, from two solutions with comparatively similar performance, you should choose the one which is cleaner and simpler.
Here's what Donald Knuth, a legend of Computer Science who studded algorithms and performance all his life, said about premature optimization:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
In order to determine these critical 3% of the code, we profile applications in order to identify performance bottlenecks. Instead of prematurely optimizing everything.
Today, Java is predominantly used today for writing backend services. Normally, the amount of data associated with each request is limited, and collections are very likely to be small or at least moderate in size. We, usually, avoid fetching humungous collections from the database (that's where filtering, grouping and aggregation of massive amounts of data should happen). A client request is unlikely to bring you an enormous collection either (if you allow your clients sending massive payloads which translate into tons of objects in your service, you should probably learn about Denial-of-service attack).
But, please, don't interpret my words as "use Stream everywhere". It's worth emphasising that an ettempt to replace each and every loop with a stream is not a bright idea. Loops and streams are tools with different semantics. By design you might need to perform mutation of elements or side-effects, which is not the proper settings for using streams.
And from the practical standpoint, if you already have a clean, easy to comprehand implementation, then there's little value in trying to make it perfect. Improving tests and implementing new features is better investment of time.
In my opinion Torben's implementation and even OP's version are good to go (if OP will fix the issues with stringly-typing and missleading names)
Avoid using Raw Types
Never drop generic types for no good reason. This safety net can catch at compile time bugs that otherwise might lead to ClassCastException
or misbehavior of the application at runtime.
Your method search()
is happy to accept a list of any type because you're using a raw type list as a parameter:
public static String search(List src)
You can pass a list of completely unrelated type, such as a List<BigDecimal>
, or a List<Cat>
. The compiler will allow these calls and at runtime your implementation will return "NA"
.
But, normally, instead we prefer the compiler to produce a compilation error when a method is invoked with a data type which is different from the one the method is intended to operate with.
-
\$\begingroup\$ An excellent answer! This was the first time I have seen an enum used as a BinaryOperator. The reduce method has a fairly limited use in everyday programming compared to the other stream operations so I'm not surprised nobody (me included) before you thought of it. I should point out that the CompletionStatus::combine is in contradiction to the Fowler/Beck quotes. The name describes a generic combining algorithm and the reader has to look at it's implementation to figure out what it does. \$\endgroup\$TorbenPutkonen– TorbenPutkonen2024年10月14日 11:00:40 +00:00Commented Oct 14, 2024 at 11:00
-
1\$\begingroup\$ @TorbenPutkonen Thanks, Torben. In my opinion, the way the question title is framed steers the reader towards thinking in terms of min and max. Regarding
CompletionStatus::combine
, the method namecombine()
communicates the idea of producing a single result by combining two statuses, but sure it doesn't tell how exactly it's done. I made an assumption that there's only one meaningful way of aggregating results, hence should be described as behavior specific toCompletionStatus
. \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年10月14日 16:45:14 +00:00Commented Oct 14, 2024 at 16:45 -
1\$\begingroup\$ @TorbenPutkonen It's like
Either.merge()
in Scala, without knowing the semantics of the type, method name is not completely clear. If this assumption is incorrect, the method has no sense, I made an edit to clarify that (note that OP left us in the dark by giving no context regarding these strings, andCompletionStatus
is a hypothetical beast invented). And I expect that the code reader should have an understanding of whatCompletionStatus
is, when examining a method returning this type. \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年10月14日 16:46:10 +00:00Commented Oct 14, 2024 at 16:46