Search values by priority in a Stream

Question 1

I have a simple list of results, and I need to return only one value depending on content, for example:

public static void main(String[] args) {
 List<String> results = new ArrayList<>();
 results.add("ERROR");
 results.add("ERROR");
 results.add("OK");
 System.out.println(search(results));
}
public static String search(List src) {
 if (searchValueInStream(src, "ERROR")) {
 return "ERROR";
 } else if (searchValueInStream(src, "OK")) {
 return "OK";
 }
 return "NA";
}
public static boolean searchValueInStream(List src, String str) {
 return src.stream().anyMatch(v -> v.equals(str));
}

This code works fine, but it does not look good, and I want to use it in same stream like:

String result = results.stream()
 .someFilter(v->v.equals(ERROR)? return "ERROR")
 .someFilter(v->v.equals(OK)? return "OK")
 .orElse("NA");

without "if then else" statement. Is this possible?

Question 2

There is absolutely no point in using streams for this or integers to represent the values. Don't complicate your life with streams just because you think they're cool. Streams are a hammer that make all your problems look like nails. Don't use streams unless they make sense. Don't try to prepare for more response types than you have (which is why people suggest the integers) unless you actually know you are going to need them.

Keep it simple and you will be able to maintain the code in the future.

boolean okFound = false;
for (String result: results) {
 if (ERROR.equals(result) {
 return ERROR;
 }
 if (OK.equals(result)) {
 okFound = true;
 }
}
return okFound ? OK : UNKNOWN;

There is nothing wrong with writing a plain old for loop in 2024. They are still a completely valid programming structure. Don't let the fashionable new tools cloud your vision.

This loop will also out-perform any stream based solution. Every dot in a stream is an object creation. Every -> is an additional method invocation. Every autoboxed Integer is a throwaway object creation. All of these cost memory and processor time. Once you start working with large data, you will have to pay attention what the nice and fancy libraries actually do under the covers.

This for-loop will short circuit on error, so it's worst case scenario is O(N).

Question 3

Other people have brought up the possibility of assigning a numerical priority to each possible value. I think this is an intuitive approach.

One simple way to find the most "important" value in a stream, given we can define "importance" as an ordering, is collecting the stream using Collectors.maxBy (or minBy I suppose - in this case it's pretty arbitrary).

For deciding what the order is, we can either write a comparison function of our own, or we can do the simpler thing by using Comparator.comparingInt. Since all possible messages are distinct and known in advance, the ints we compare can simply be kept in a Map.

With all that, we might end up writing something like:

private static final Map<String, Integer> OK_ERROR_PRIORITIES = Map.of("OK", 1, "ERROR", 2);
public static String search(Collection<? extends String> src) {
 return search(src, OK_ERROR_PRIORITIES, "NA");
}
public static T search(Collection<? extends T> src, Map<? super T, Integer> priorities, T defaultValue) {
 Collector<T, ?, Optional<T>> highestPriority = Collectors.maxBy(Comparator.comparingInt(priorities::get));
 Optional<T> result = src.stream()
 .filter(priorities::containsKey)
 .collect(highestPriority);
 return result.orElse(defaultValue);
}

Question 4

You can convert every value into an integer, with higher integers taking priority over lower integers. Then call max and map the highest integer to its value. I went with ERROR = 1, OK = 0, NA = -1. Note that List.indexOf conveniently returns -1 if no match is found, which is why I chose to have higher integers have higher priority (rather than higher integers having lower priority).

// note: ordered in ascending priority
// which means last element has highest priority
public static final List<String> codes = Arrays.asList("OK", "ERROR");
public static int codeToInt(Object code) {
 return codes.indexOf(code); // -1 means NA
}
public static String intToCode(int i) {
 if (i < 0) // -1 means NA
 return "NA";
 return codes.get(i);
}
public static String search(List src) {
 // note: if the class is Test, use Test::code_to_int
 int i = src.stream().mapToInt(Test::codeToInt).max().orElse(-1);
 return intToCode(i);
}

With this approach, the number of passes over the stream does not increase with the number of distinct values. This would improve efficiency if you make codes an unordered map for O(1) indexing. If you do so, consider changing the value type into something with faster hashing.

In cases where the highest-priority code appears early, this code is inefficient in that max will not return early. Since worst-case scenarios require you to search the whole stream (e.g. all values are OK until the last value which is ERROR), I think this is workable.

With respect to style, I prefer to declare intToCode and codeToInt separately, but you could use lambda functions if you like. Hope this helps - I appreciate your question clarity!

Question 5

Thanks for the catch. I made the appropriate edits to replace codes[i] with codes.get(i). Also fixed some other bugs: map to mapToInt, orelse to orElse, and generalized to Object streams.

Question 6

This seems overly complicated. It would be much more readable and easy to understand if you loop over the results once, maintain a boolean flag for "OK was found" and return immediately if "ERROR" occurs. No need to make a detour via integers. I feel like you went this route because you expect there to be more values than just ok/error? No need to complicate the code when the need was not specified.

Question 7

Your method search() is not type-safe due to raw-type use and will raise a compilation warning. Also, using java.lang.Object as a parameter type in codeToInt() isn't a good idea, because calling this method by passing anything but a String is likely a mistake and should result in a compile time error instead of being hidden.

Question 8

"would improve efficiency if you make codes an unordered map for O(1) indexing" - if you really think so, take a look at the implementations of ArrayList and HashMap and figure out how many hops over the Heap addresses will happen in the worst case scenario with a two-element ArrayList and with a two-entry HashMap. Using a Map will have no impact on performance in this case, but it'll improve readability of the code (see the answer by Sara J).

Question 9

A simple and short definition of search makes use of List.indexOf from a list of the three states you want to return:

public static String search(List<String> src) {
 List<String> vals = List.of("NA", "OK", "ERROR");
 return vals.get(Math.max(0, src.stream().mapToInt(vals::indexOf).max().orElse(0)));
}

The stream over src items maps to a matching value in list ["NA", "OK", "ERROR"], and then returns the highest matching value.

Note that this does one scan through the list and doesn't bail out of the search on first discovery of "ERROR" - so a regular for loop may be better on huge lists, or try ".parallelStream()" in place of "stream()".

Question 10

I might have misunderstood what the code of @zond should produce, but the principle stays the same. So this code might reverse conditions (returns "OK" first and then "ERROR" instead of the other way round).

 if (searchValueInStream(src, "ERROR")) {
 return "ERROR";
 } else if (searchValueInStream(src, "OK")) {
 return "OK";
 }
 return "NA";

The problem with your code is that you're running through the list twice, once for the "ERROR" search, once for the "OK" search. This can be a costly operation (though, it most likely is not).

I'm not on good terms with streams (I believe they are overhyped and rarely reduce complexity/cognitive load) so I can only provide you with an example without them. What you want is to loop once through the list and keep state for this time. The most simple solution would be something like this:

boolean errorEncountered = false;
for (String item : items) {
 if (item.equals("OK")) {
 return "OK";
 } else if (item.equals("ERROR")) {
 errorEncountered = true;
 }
}
if (errorEncountered) {
 return "ERROR";
} else {
 return "NA";
}

In the best case the loop exits at the first item, as it is "OK", the worst case is that it loops through the whole list once.

However, this approach has two problems:

If you want the item itself, you need to store that.
With more items this becomes quite cumbersome.

The first point is rather easily solved:

Item firstEncounteredErrorItem = null;
for (Item item : items) {
 if (item.getState().equals("OK")) {
 return item;
 } else if (firstEncounteredErrorItem == null && item.getState().equals("ERROR")) {
 firstEncounteredErrorItem = item;
 }
}
if (firstEncounteredErrorItem != null) {
 return firstEncounteredErrorItem;
} else {
 return Item.NA;
}

A little bit more complex, but still quite manageable.

Now, the second point, extensibility, is harder to solve, as we may need to "stack" differently prioritized items. For that, we'd require something like "Tuple", which we most likely need to write ourselves. The basic idea would be keeping a List of encountered items which match their "priorization".

List<String> prioritziedItemStates = Item.PRIORITIZED_ITEM_STATES;
Item[] encounteredItems = new Item[Item.MAX_VALUE_COUNT];
for (Item item : items) {
 if (item.getState().equals("OK")) {
 return item;
 } else {
 int itemPrioritizedIndex = prioritziedItemStates.indexOf(item.getState());
 
 if (encounteredItems[itemPrioritizedIndex] == null) {
 encounteredItems[itemPrioritizedIndex] = item;
 }
 }
}
for (Item encounteredItem : encounteredItems) {
 if (encounteredItem != null) {
 return encounteredItem;
 }
}
return Item.NA;

This has still the upside that it runs through the list only once, but does a lookup for every item in another (short?) list that is not "OK". Whether that trade is worth it or not is depending on how large the original list is and how many states you need to keep.

String result = results.stream()
 .someFilter(v->v.equals(ERROR)? return "ERROR")
 .someFilter(v->v.equals(OK)? return "OK")
 .orElse("NA");

I'm not sure that's possible with streams, at least not without wrapping them multiple times. There is findFirst but that terminates the stream, but you want to keep going. Basically what you want would be something like this:

String result = results.stream()
 .takeFirstOrElseContinue(result -> result.equals("OK"))
 .takeFirstOrElseContinue(result -> result.equals("ERROR"))
 .orElse("NA");

I'm not aware that streams would work like that.

Question 11

There's already a considerable number of answers, but none of them offers a truly declarative solution to the problem despite the use of functional programming features.

The code that follows declarative programming paradigm outlines the high-level logic without explicitly specifying the underlying algorithms or the concrete steps of the execution flow. In other words, declarative code focuses on what it should accomplish, rather than how to accomplish it.

And this problem is not about finding the max or min.

^{Sara J has already pointed out in their answer that decision to consider "OK" to be greater than "ERROR" is absolutely arbitrary}

In this case, Stream.max() does not reveal the real intent of the computation. The essence of declarative programming is that code should describe the problem, but instead max() dictates how to solve it. That's an arbitrary low-level instruction unrelated to the actual problem.

Applying Stream.max() to this problem produces an imperative implementation, despite the use of functional programming constructs.

^{Surprise! On its own, the usage of Streams and Lambdas doesn't automatically make the code declarative. It matters how these functional constructs are used (similarly, code doesn't automatically become object-oriented, just because we're using an object-oriented language). As Venkat Subramaniam said, functional code should read like a problem statement.}

For that reason, a plane for-loop shown in the answer by Torben Putkonen bears less cognitive load for the code reader than stream-based solutions provided so far. But I don't share Torben's sentiment that streams are not the right tool for this problem.

But before I elaborate on that, there's a couple of quick things to address.

Code should Communicate its Intention

Decades ago, the father of TDD and Extreme Programming Kent Beck outlined the criteria of Simple Design, one of which code should be communicative (Martin Fowler refers to it as code "reveals intention" in this article).

Communicative

Every idea that needs to be communicated is represented in the system. Like words in a vocabulary, the elements of the system communicate to future readers.

^{A quote from "Extreme Programming Explained: Embrace Change", 2nd Edition. Emphasis added}

The names of code should clearly communicate their responsibilities to the code reader.

Your method name search() is misleading because this method doesn't actually perform a search, but rather aggregates the results (you just happen to implement it as a sequence of searches).

Consider such names as combine() or aggregate() instead.

Avoid writing Stringly-typed code

Never use plain strings to represent concepts in your business logic when working with a strongly typed language. It's error-prone, impedes refactoring, and you're robbing yourself because strings can offer behavior that will be at your disposal if you use proper types.

^{I suppose, each "OK" and "ERROR" describes an execution outcome of some sort, but it will be better if you provided some context on where these strings are coming from, and what exactly do they represent. Because it's crucial for modeling a type to substitute these strings, and also important in order to make a conclusion on whether you're doing the right thing to begin with.}

Since you gave no context regarding these string, I'll opt for the simplest option and define an enum:

public enum CompletionStatus { ERROR, OK }

One might ask what about "NA", why it's not represented in this enum? The answer: it should not be there because in original code it doesn't describe data, but rather the absence of data.

And when it comes to indicating that data is not present, we have three options.

Return null. Although, it's a nasty way of telling the caller that there's no data it was requesting, this option is still used because we have lots of old contracts.
Use Optional. An empty option is a functional way of representing the absence of data.
Throw an exception if a result is guaranteed to be present and there's no way to proceed with the normal execution flow.

In this case, I'll choose Optional.

Refactoring using Fold operation

First, let's discuss how results are combined in the original code. In a nutshell, aggregation logic is similar to how logical AND operator works:

When all results are successful, the aggregated result is success.
When there's at least one error, the aggregated result is error.

Now, to solve this problem in a declarative way, we can make use of the idiomatic functional operation fold left, which is represented by the reduce() method in the Stream API.

public static Optional<CompletionStatus> combineAll(List<CompletionStatus> results) {
 return results.stream()
 .reduce((acc, next) -> next == OK ? OK : ERROR);
}

That's it.

This method clearly communicates its intention. It's succinct and fairly easy to comprehend.

If we assume that there's only one meaningfull way to aggregate the results (if it's not the case then use the solution shown above, as I said there should be more context regarding the purpose of these OK and ERROR), then it deserves to be described as behavior specific to CompletionStatus type. And we can go one step further and take advantage of the fact that Java enum is a specialized type of class, and thus can have behavior.

public static Optional<CompletionStatus> combineAll(Collection<CompletionStatus> results) {
 return results.stream()
 .reduce(CompletionStatus::combine);
}

^{It's also a good practice to relax the parameter types. In this case, since we're only traversing results using a stream, it doesn't need to be a List, this code will work with any Collection}

Enum:

public enum CompletionStatus {
 ERROR, OK;
 
 public CompletionStatus combine(CompletionStatus other) {
 return switch (other) {
 case ERROR -> other;
 case OK -> this;
 };
 }
}

Short-circuiting

It's also possible to add short-circuiting to this implementation, so that the stream will terminate after encountering the first ERROR, instead of processing all elements.

This can be done either by employing third-party tools, or by using the standard Stream API alone. The latter is a bit more complicated with the current LTS version (JDK 21 at the time of writing).

So, I'll show how to achieve this with StreamEx:

return StreamEx.of(results)
 .takeWhileInclusive(s -> s == OK) // take all OK elements until the first ERROR + the ERROR element
 .reduce(CompletionStatus::combine);

^{However, I wouldn't recommend complicating the code with this change unless you're certain it provides some tangible benefit. Read the next section on performance to understand what I mean.}

Performance

A couple of answers were mentioning performance, especially Torben was very vocal saying:

loop will also out-perform any stream based solution. Every dot in a stream is an object creation.

That's a bit overexaggerated, because:

JIT-compiler optimizes streams quite well performing inlining, reordering, redundancy elimination, etc. JIT is a beating heart of the JVM, compiled code might drastically differ from the code produced by the interpreter. For example, after profiling the method with enhanced for-loop from the Torben's answer JIT will throw away an Iterator that it uses under the hood so that it will perform like an indexed-based loop. The point is that we don't need to sacrifice readability implementing manually optimizations which JIT-compiler does.
Stream API was designed as a tool for writing expressive declarative code, boosting performance wasn't the goal of introducing streams into JDK (parallel streams is a niche case, they are not used very often).
In real-life applications, only a tiny fraction of code is performance critical. Maintainability is a way more important concern than performance for the bulk of the code base. We want to make it easier delivering new features, and we want our code to be less prone to bugs. For that reason, clean and simple implementations are more valuable than marginal performance gains.

As the rule of thumb, from two solutions with comparatively similar performance, you should choose the one which is cleaner and simpler.

Here's what Donald Knuth, a legend of Computer Science who studded algorithms and performance all his life, said about premature optimization:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

In order to determine these critical 3% of the code, we profile applications in order to identify performance bottlenecks. Instead of prematurely optimizing everything.
Today, Java is predominantly used today for writing backend services. Normally, the amount of data associated with each request is limited, and collections are very likely to be small or at least moderate in size. We, usually, avoid fetching humungous collections from the database (that's where filtering, grouping and aggregation of massive amounts of data should happen). A client request is unlikely to bring you an enormous collection either (if you allow your clients sending massive payloads which translate into tons of objects in your service, you should probably learn about Denial-of-service attack).

But, please, don't interpret my words as "use Stream everywhere". It's worth emphasising that an ettempt to replace each and every loop with a stream is not a bright idea. Loops and streams are tools with different semantics. By design you might need to perform mutation of elements or side-effects, which is not the proper settings for using streams.

And from the practical standpoint, if you already have a clean, easy to comprehand implementation, then there's little value in trying to make it perfect. Improving tests and implementing new features is better investment of time.

^{In my opinion Torben's implementation and even OP's version are good to go (if OP will fix the issues with stringly-typing and missleading names)}

Avoid using Raw Types

Never drop generic types for no good reason. This safety net can catch at compile time bugs that otherwise might lead to ClassCastException or misbehavior of the application at runtime.

Your method search() is happy to accept a list of any type because you're using a raw type list as a parameter:

public static String search(List src)

You can pass a list of completely unrelated type, such as a List<BigDecimal>, or a List<Cat>. The compiler will allow these calls and at runtime your implementation will return "NA".

But, normally, instead we prefer the compiler to produce a compilation error when a method is invoked with a data type which is different from the one the method is intended to operate with.

Question 12

An excellent answer! This was the first time I have seen an enum used as a BinaryOperator. The reduce method has a fairly limited use in everyday programming compared to the other stream operations so I'm not surprised nobody (me included) before you thought of it. I should point out that the CompletionStatus::combine is in contradiction to the Fowler/Beck quotes. The name describes a generic combining algorithm and the reader has to look at it's implementation to figure out what it does.

Question 13

@TorbenPutkonen Thanks, Torben. In my opinion, the way the question title is framed steers the reader towards thinking in terms of min and max. Regarding CompletionStatus::combine, the method name combine() communicates the idea of producing a single result by combining two statuses, but sure it doesn't tell how exactly it's done. I made an assumption that there's only one meaningful way of aggregating results, hence should be described as behavior specific to CompletionStatus.

Question 14

@TorbenPutkonen It's like Either.merge() in Scala, without knowing the semantics of the type, method name is not completely clear. If this assumption is incorrect, the method has no sense, I made an edit to clarify that (note that OP left us in the dark by giving no context regarding these strings, and CompletionStatus is a hypothetical beast invented). And I expect that the code reader should have an understanding of what CompletionStatus is, when examining a method returning this type.

TorbenPutkonen TorbenPutkonen 9,23921 silver badges29 bronze badges · Answer 1 · 2024-08-13 05:58:04Z