Suppose I have a list of hashmaps (Map<String, Object>). Values can be either scalar types or a nested HashMap of the same type. I need to merge this list of maps into a final HashMap, where each map in the list has higher precedence than the one before it. The merge also has to be deep, i.e.:
map1 = {root: {a: a, b: {c: c, d: d}}}
map2 = {root: {a: a, b: {c: x}}}
result = {root: {a: a, b: {c: x, d: d}}}
In the previous example, the subtree b in map2 does not override the whole subtree in map1; it just overrides the leaves that are duplicated.
The code I have right now looks like this and works ok:
    public static Map<String, Object> merge(List<Map<String, Object>> configs) {
        return configs.stream()
                .filter(Objects::nonNull)
                .reduce(new LinkedHashMap<>(), ConfigMerger::mergeMaps);
    }

    public static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
        Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
        result.putAll(map);
        for (Map.Entry<String, Object> entry : other.entrySet()) {
            String key = entry.getKey();
            Object val2 = entry.getValue();
            Object val1 = result.get(key);
            if (val1 instanceof Map && val2 instanceof Map) {
                Map<String, Object> nestedResult = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
                result.put(key, nestedResult);
            } else {
                result.put(key, val2);
            }
        }
        return result;
    }
The only issue is that result.putAll(map); is a bit wasteful: I do that for every nested map in each of the original maps. I tried a slightly more complicated approach, which is around 20% faster, and yet the most expensive method is HashSet.addAll:
    // (assumes the list in reversed order)
    public static Map<String, Object> merge2(List<Map<String, Object>> configs) {
        Map<String, Object> result = new HashMap<>();
        Set<String> allKeys = new HashSet<>();
        for (Map<String, Object> config : configs) {
            allKeys.addAll(config.keySet());
        }
        for (String key : allKeys) {
            if (configs.get(0).get(key) instanceof Map<?, ?>) {
                List<Map<String, Object>> nestedConfigs = new ArrayList<>();
                for (Map<String, Object> config : configs) {
                    if (config.get(key) instanceof Map<?, ?>) {
                        nestedConfigs.add((Map<String, Object>) config.get(key));
                    }
                }
                result.put(key, merge2(nestedConfigs));
            } else {
                for (Map<String, Object> config : configs) {
                    if (config.containsKey(key)) {
                        result.put(key, config.get(key));
                        break;
                    }
                }
            }
        }
        return result;
    }
Is there any other approach I could try that could help me squeeze a bit more performance out of this?
For context, the maps are usually huge (hundreds of subtrees) and the nesting level is sometimes up to fifty.
- greybeard (Jun 8, 2023): On SO: How to efficiently merge a list of deeply nested HashMaps?
- pacmaninbw ♦ (Jun 9, 2023): Please do not edit the question, especially the code, after an answer has been posted. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. What to do after the question has been answered.
2 Answers
any other approach ... that could help me squeeze a bit more performance?
Choose a nice output format for your maps, and serialize each one to a text file (or to an RDBMS). For example, the original result might come out as:

    a: a
    b.c: x
    b.d: d
Choose a stable sort, and perform an external merge sort of all files. For example, /usr/bin/sort -s springs to mind. Uniquify duplicate keys. Deserialize the result into a Java Map.
Certainly there are ways to accomplish this in RAM, without ever leaving the JVM.
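As an illustration of the flattening step described above, a nested map can be turned into "dotted" keys before serializing. The Flatten class and its helper below are my sketch, not part of the answer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Flatten {
    /** Flattens nested maps into dotted keys, e.g. {b={c=x}} becomes {b.c=x}. */
    static Map<String, Object> flatten(Map<String, Object> map) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", map, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> map,
                                Map<String, Object> flat) {
        for (Map.Entry<String, Object> e : map.entrySet()) {
            // Build the dotted path for this entry
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map)
                flatten(key, (Map<String, Object>) e.getValue(), flat);
            else
                flat.put(key, e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> nested = Map.of("a", "a",
                "b", Map.of("c", "x", "d", "d"));
        System.out.println(flatten(nested));
    }
}
```

Each flattened entry then becomes one line of the text file to be externally sorted.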
- Cristian (Jun 8, 2023): Yeah, ideally I would want to do something where I do not leave the JVM. For context, we run these config merges at a rate of around 5 thousand per second.
- J_H (Jun 8, 2023): You explain that you wish to process maps that are "huge" in < 200 μsec. That suggests an incremental approach. Yet you're not offering a Δ map of changed keys. Seems like a disconnect. We desire O(Δ) effort yet the input requires that we scan O(N) entries. Oh, well.
You present uncommented code. While I might succeed in figuring out what mergeMaps() achieves, it may be intended to do something slightly different.

Document your code. In the code. Use "Javadoc style" comments to document what everything intended for external use is there for.
I don't think merge2() even works as intended: with map1 = {root: {a: a, b: {c: c, d: d}}}, map2 = {root: {b: {c: x}}}, map3 = {root: {a: a, d: e}}, wouldn't b go un-merged due to not appearing in map3 (reversed order: configs.get(0))?

If val2 isn't a Map, you never use val1:

    Object val1 = null;
    if (val2 instanceof Map && (val1 = result.get(key)) instanceof Map)
        val2 = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
    result.put(key, val2);
In the end, every result value that is a Map will "need to have been merged", so where does mergeMaps() invest futile effort? I think it is in merging Maps that don't end up in the result, as well as in instantiating a new Map + copy for every Map merged in. Trying to avoid both, using Nat's class Reversed:
    // When merging maps in a list with later maps taking precedence
    // and a non-map value replacing any resulting map so far,
    // there is one catch to processing in reverse:
    // one needs to tell *should be merged* from isCompleted
    static class CompletableMap<K, V> extends LinkedHashMap<K, V> {
        private boolean completed = false;

        void complete() { completed = true; }
        boolean isCompleted() { return completed; }

        CompletableMap() { super(); }
        CompletableMap(int initialCapacity) { super(initialCapacity); }
        CompletableMap(Map<? extends K, ? extends V> m) { super(m); }
    }
    /** Merges <code>configurations</code>, later entries taking precedence. */
    public static Map<String, Object>
    merge(List<Map<String, Object>> configurations) {
        Map<String, Object> configuration = new LinkedHashMap<>();
        if (null != configurations) {
            // for (ListIterator<Map<String, Object>> configs = configurations.listIterator(configurations.size()); configs.hasPrevious(); )
            //     Map<String, Object> previous = configs.previous();
            for (Map<String, Object> previous : Reversed.reversed(configurations))
                mergeMaps(configuration, previous);
        }
        return configuration;
    }
    /** Merges <code>accumulating</code>
     *  and <code>previousConfiguration</code>,
     *  entries in <code>accumulating</code> taking precedence.
     */ //public
    static void mergeMaps(Map<String, Object> accumulating,
                          Map<String, Object> previousConfiguration) {
        for (Map.Entry<String, Object>
                 previous : previousConfiguration.entrySet()) {
            String key = previous.getKey();
            Object
                accumulated = accumulating.get(key),
                previousValue = previous.getValue();
            if (previousValue instanceof Map) {
                Map<String, Object>
                    previousMap = (Map<String, Object>) previousValue;
                if (null == accumulated)
                    accumulating.put(key,
                        new CompletableMap<String, Object>(previousMap));
                else if (accumulated instanceof Map
                         && !((CompletableMap<String, Object>) accumulated)
                             .isCompleted())
                    mergeMaps((Map<String, Object>) accumulated, previousMap);
            } else if (null == accumulated)
                accumulating.put(key, previousValue);
            else if (accumulated instanceof CompletableMap)
                ((CompletableMap<String, Object>) accumulated).complete();
        }
    }
    public static void main(String[] args) {
        List<Map<String, Object>> c = List.of(
                Map.of("a", "1"),
                Map.of("b", Map.of("c", "2", "d", "3")),
                Map.of("b", "4"),
                Map.of("b", Map.of("c", "5", "e", "6")),
                Map.of("b", Map.of("c", "7")),
                Map.of("b", Map.of("c", "8")));
        System.out.println(merge(c));
    }
Caveat: Untested, if tried out.
Caught unawares: top level entries are in reverse order.
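Nat's class Reversed, referenced above, is not included in the answer; a minimal stand-in for what the code assumes (a view that iterates a List back to front without copying it) could look like:

```java
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

/** Minimal sketch of a reversed-iteration view over a List (no copy made). */
public class Reversed<T> implements Iterable<T> {
    private final List<T> original;

    private Reversed(List<T> original) { this.original = original; }

    public static <T> Reversed<T> reversed(List<T> original) {
        return new Reversed<>(original);
    }

    @Override
    public Iterator<T> iterator() {
        // Start a ListIterator at the end and walk it backwards
        ListIterator<T> it = original.listIterator(original.size());
        return new Iterator<T>() {
            public boolean hasNext() { return it.hasPrevious(); }
            public T next() { return it.previous(); }
        };
    }
}
```

This lets the enhanced for loop in merge() visit configurations last-to-first without reversing the list in place.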
- greybeard (Jun 8, 2023): From Java 19, CompletableMap(int initialCapacity) should probably call LinkedHashMap.newLinkedHashMap(initialCapacity).
\$\begingroup\$ First of all, thanks for taking the time to write a respond. Let me answer some of your small questions here, but expand on the original post after I edit it: 1. The code I pasted does not have comments because I removed them, as they are not that helpful. I will not only add them in the edit, but include the original code (which is Kotlin and not Java) 2. I know merge2 does not work as-is. That's why I added a comment explaining that a reversal of the list must be in place. 3. About the reversed thing... yeah, I get it and I tried something similar but the performance gains are minimal. \$\endgroup\$Cristian– Cristian2023年06月08日 20:12:49 +00:00Commented Jun 8, 2023 at 20:12
- greybeard (Jun 8, 2023): "expand on the original post after I edit it" (Heeding What should I not do when someone answers my question?, hopefully.)
\$\begingroup\$ Fair. In my defense, the code I pasted, even though it's "better"... it still does not satisfy the original premise: how to efficiently merge a list of deeply nested maps. \$\endgroup\$Cristian– Cristian2023年06月08日 20:21:04 +00:00Commented Jun 8, 2023 at 20:21
- greybeard (Jun 9, 2023): When asking a question about a Kotlin implementation of a solution to the same problem, provide more context. What are those 5000 configurations per second used for? There may be an approach less resource-hungry than creating a merged Map. Are you aware of the concept of a Map with a backup Map, delegating lookups to another Map for unmapped keys, possibly caching results?