I'm trying to learn the Java 8 Stream API, and while converting some existing code to practice, I ran into a problem. How can I convert the following code to stream style?
/*
 * input example:
 * [
 *   { "k1": {"kk1": 1,  "kk2": 2},  "k2": {"kk1": 3,  "kk2": 4} },
 *   { "k1": {"kk1": 10, "kk2": 20}, "k2": {"kk1": 30, "kk2": 40} }
 * ]
 * output:
 * { "k1": {"kk1": 11, "kk2": 22}, "k2": {"kk1": 33, "kk2": 44} }
 */
private static Map<String, Map<String, Long>> mergeMapsValue(List<Map<String, Map<String, Long>>> valueList) {
    Set<String> keys_1 = valueList.get(0).keySet();
    Set<String> keys_2 = valueList.get(0).entrySet().iterator().next().getValue().keySet();
    Map<String, Map<String, Long>> result = new HashMap<>();
    for (String k1 : keys_1) {
        result.put(k1, new HashMap<>());
        for (String k2 : keys_2) {
            long total = 0;
            for (Map<String, Map<String, Long>> mmap : valueList) {
                Map<String, Long> m = mmap.get(k1);
                if (m != null && m.get(k2) != null) {
                    total += m.get(k2);
                }
            }
            result.get(k1).put(k2, total);
        }
    }
    return result;
}
Comments:
- So all maps are the same, i.e. have the same keys at both levels? – Boris the Spider, Jun 28, 2016 at 9:19
- This is not a code translation service. You need to show us what you have already tried so we can tell you what you are doing wrong. – explv, Jun 28, 2016 at 9:32
- You should rethink your original approach first, i.e. what to iterate over in the outer loop and what in the inner loop. – Holger, Jun 28, 2016 at 9:46
- @Holger I know I should think first, but I totally can't find a way to finish it. – yunfan, Jun 28, 2016 at 10:06
3 Answers
The trick here is to collect the inner maps correctly. The workflow is:
- Flat-map the List<Map<String, Map<String, Long>>> into a stream of map entries, Stream<Map.Entry<String, Map<String, Long>>>.
- Group by the key of each of those entries and, for the values mapped to the same key, merge the inner maps together.

Collecting maps by merging them would ideally warrant a flatMapping collector, which unfortunately doesn't exist in Java 8, although it will exist in Java 9 (see JDK-8071600). For Java 8, it is possible to use the one provided by the StreamEx library (use MoreCollectors.flatMapping in the following code).
private static Map<String, Map<String, Long>> mergeMapsValue(List<Map<String, Map<String, Long>>> valueList) {
    return valueList.stream()
        .flatMap(e -> e.entrySet().stream())
        .collect(Collectors.groupingBy(
            Map.Entry::getKey,
            Collectors.flatMapping(
                e -> e.getValue().entrySet().stream(),
                Collectors.<Map.Entry<String,Long>,String,Long>toMap(Map.Entry::getKey, Map.Entry::getValue, Long::sum)
            )
        ));
}
Without using this convenient collector, we can still build our own with equivalent semantics:
private static Map<String, Map<String, Long>> mergeMapsValue2(List<Map<String, Map<String, Long>>> valueList) {
    return valueList.stream()
        .flatMap(e -> e.entrySet().stream())
        .collect(Collectors.groupingBy(
            Map.Entry::getKey,
            Collector.of(
                HashMap::new,
                (r, t) -> t.getValue().forEach((k, v) -> r.merge(k, v, Long::sum)),
                (r1, r2) -> { r2.forEach((k, v) -> r1.merge(k, v, Long::sum)); return r1; }
            )
        ));
}
4 Comments
- Long::sum - I keep forgetting that exists. I think your Stream approach is better overall as it doesn't produce intermediate List results - although both yours and mine are completely illegible. I would advocate for the foreach with Java 8 Map methods approach...
- The last function passed to the Collector is the "combiner" - this will only be used if the stream is run in parallel for sufficiently large datasets, and will combine the results from different threads.

As a starting point, converting to use computeIfAbsent and merge gives us the following:
private static <K1, K2> Map<K1, Map<K2, Long>> mergeMapsValue(List<Map<K1, Map<K2, Long>>> valueList) {
    final Map<K1, Map<K2, Long>> result = new HashMap<>();
    for (final Map<K1, Map<K2, Long>> map : valueList) {
        for (final Map.Entry<K1, Map<K2, Long>> sub : map.entrySet()) {
            for (final Map.Entry<K2, Long> subsub : sub.getValue().entrySet()) {
                result.computeIfAbsent(sub.getKey(), k1 -> new HashMap<>())
                      .merge(subsub.getKey(), subsub.getValue(), Long::sum);
            }
        }
    }
    return result;
}
This removes much of the logic from your inner loop.
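To make the behavior concrete, here is a minimal, self-contained sketch (class and helper names are illustrative, not from the original posts) that runs the computeIfAbsent/merge version against the sample data from the question:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeDemo {

    // Same merge as in the answer above: for every (k1, k2) pair found in any
    // map of the list, sum the innermost Long values into the result.
    static <K1, K2> Map<K1, Map<K2, Long>> mergeMapsValue(List<Map<K1, Map<K2, Long>>> valueList) {
        final Map<K1, Map<K2, Long>> result = new HashMap<>();
        for (final Map<K1, Map<K2, Long>> map : valueList) {
            for (final Map.Entry<K1, Map<K2, Long>> sub : map.entrySet()) {
                for (final Map.Entry<K2, Long> subsub : sub.getValue().entrySet()) {
                    result.computeIfAbsent(sub.getKey(), k1 -> new HashMap<>())
                          .merge(subsub.getKey(), subsub.getValue(), Long::sum);
                }
            }
        }
        return result;
    }

    // Builds the question's sample input.
    static List<Map<String, Map<String, Long>>> sampleInput() {
        Map<String, Map<String, Long>> first = new HashMap<>();
        first.put("k1", inner(1, 2));
        first.put("k2", inner(3, 4));
        Map<String, Map<String, Long>> second = new HashMap<>();
        second.put("k1", inner(10, 20));
        second.put("k2", inner(30, 40));
        return Arrays.asList(first, second);
    }

    static Map<String, Long> inner(long kk1, long kk2) {
        Map<String, Long> m = new HashMap<>();
        m.put("kk1", kk1);
        m.put("kk2", kk2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> merged = mergeMapsValue(sampleInput());
        System.out.println(merged.get("k1").get("kk1")); // 11
        System.out.println(merged.get("k2").get("kk2")); // 44
    }
}
```

This matches the expected output in the question: k1 becomes {kk1=11, kk2=22} and k2 becomes {kk1=33, kk2=44}.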
The code below is wrong; I leave it here for reference.
Converting to the Stream API is not going to make it neater, but let's give it a go.
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;
private static <K1, K2> Map<K1, Map<K2, Long>> mergeMapsValue(List<Map<K1, Map<K2, Long>>> valueList) {
    return valueList.stream()
            .flatMap(v -> v.entrySet().stream())
            .collect(groupingBy(Entry::getKey, collectingAndThen(mapping(Entry::getValue, toList()), l -> l.stream()
                    .reduce(new HashMap<>(), (l2, r2) -> {
                        r2.forEach((k, v) -> l2.merge(k, v, Long::sum));
                        return l2;
                    }))));
}
This is what I've managed to come up with - it's horrible. The problem is that with the foreach approach, you have a reference to each level of the iteration - this makes the logic simple. With the functional approach, you need to consider each folding operation separately.
How does it work?
We first stream() our List<Map<K1, Map<K2, Long>>>, giving a Stream<Map<K1, Map<K2, Long>>>. Next we flatMap each element, giving a Stream<Entry<K1, Map<K2, Long>>> - so we flatten the first dimension. But we cannot flatten further, as we need the K1 value.
So we then use collect(groupingBy) on the K1 value, giving us a Map<K1, SOMETHING> - what is something?
Well, first we use mapping(Entry::getValue, toList()) to give us a Map<K1, List<Map<K2, Long>>>. We then use collectingAndThen to take that List<Map<K2, Long>> and reduce it. Note that this means we produce an intermediate List, which is wasteful - you could get around this by using a custom Collector.
For this we use List.stream().reduce(a, b), where a is the initial value and b is the "fold" operation. a is set to new HashMap<>(), and b takes two values: either the initial value or the result of the previous application of the function, and the current item in the List. So, for each item in the List, we use Map.merge to combine the values.
I would say that this approach is more or less illegible - you won't be able to decipher it in a few hours time, let alone a few days.
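The reduce above is what makes the code wrong, not just ugly: reduce assumes its functions are non-mutating and associative, while this one mutates the identity HashMap. The Stream API's sanctioned way to fold into a mutable container is the three-argument collect(supplier, accumulator, combiner). A sketch of that form applied to merging inner maps (class and method names here are illustrative, not from the answer):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableReductionDemo {

    // collect(supplier, accumulator, combiner) creates a fresh container per
    // thread, so mutating it is safe even when the stream runs in parallel.
    static Map<String, Long> mergeInners(List<Map<String, Long>> inners) {
        return inners.stream().collect(
                HashMap::new,                                                 // supplier: fresh accumulator
                (acc, m) -> m.forEach((k, v) -> acc.merge(k, v, Long::sum)),  // accumulator: fold one map in
                (a, b) -> b.forEach((k, v) -> a.merge(k, v, Long::sum)));     // combiner: merge partial results
    }

    public static void main(String[] args) {
        Map<String, Long> m1 = new HashMap<>();
        m1.put("kk1", 1L);
        m1.put("kk2", 2L);
        Map<String, Long> m2 = new HashMap<>();
        m2.put("kk1", 10L);
        m2.put("kk2", 20L);

        Map<String, Long> merged = mergeInners(Arrays.asList(m1, m2));
        System.out.println(merged.get("kk1")); // 11
        System.out.println(merged.get("kk2")); // 22
    }
}
```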
6 Comments
- Don't use reduce when you are modifying the arguments of the function.
- The reduce modifies the map l2 by invoking merge on it for each mapping of r2.
- reduce is meant for immutable items, returning a result combining the two inputs. I mutate the LHS Map given as an input to the combiner - this breaks the contract.

I took the flatMap(e -> e.entrySet().stream()) part from Tunaki, but used a shorter variant for the collector:
Map<String, Integer> merged = maps.stream()
.flatMap(map -> map.entrySet().stream())
.collect(Collectors.toMap(
Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
More elaborate example:
Map<String, Integer> a = new HashMap<String, Integer>() {{
put("a", 2);
put("b", 5);
}};
Map<String, Integer> b = new HashMap<String, Integer>() {{
put("a", 7);
}};
List<Map<String, Integer>> maps = Arrays.asList(a, b);
Map<String, Integer> merged = maps.stream()
.flatMap(map -> map.entrySet().stream())
.collect(Collectors.toMap(
Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
assert merged.get("a") == 9;
assert merged.get("b") == 5;
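This collector merges a single level of maps; for the question's two-level structure, the same toMap pattern still works if the merge function itself merges the colliding inner maps. A sketch of that extension (class and helper names are illustrative, not from the answer):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedMergeDemo {

    static Map<String, Map<String, Long>> mergeMapsValue(List<Map<String, Map<String, Long>>> valueList) {
        return valueList.stream()
                .flatMap(map -> map.entrySet().stream())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> new HashMap<>(e.getValue()),   // copy, so the merge never mutates an input map
                        NestedMergeDemo::mergeInner));
    }

    // Merge function for colliding outer keys: sum the inner values key by key.
    static Map<String, Long> mergeInner(Map<String, Long> m1, Map<String, Long> m2) {
        m2.forEach((k, v) -> m1.merge(k, v, Long::sum));
        return m1;
    }

    static Map<String, Long> inner(long kk1, long kk2) {
        Map<String, Long> m = new HashMap<>();
        m.put("kk1", kk1);
        m.put("kk2", kk2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> a = new HashMap<>();
        a.put("k1", inner(1, 2));
        a.put("k2", inner(3, 4));
        Map<String, Map<String, Long>> b = new HashMap<>();
        b.put("k1", inner(10, 20));
        b.put("k2", inner(30, 40));

        Map<String, Map<String, Long>> merged = mergeMapsValue(Arrays.asList(a, b));
        System.out.println(merged.get("k1").get("kk1")); // 11
        System.out.println(merged.get("k2").get("kk2")); // 44
    }
}
```

Copying the inner map in the value mapper keeps the input maps untouched, which the shorter one-level variant above gets for free because Integer values are immutable.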