4
\$\begingroup\$

Suppose I have a list of hashmaps (<String, Object>). Values can be either scalar types or a nested HashMap with the same type.

I need to merge a list of this maps into a final HashMap, each map in the list has higher precedence than the one before. Also the merge has to be deep: i.e.:

map1 = {root: {a: a, b: {c: c, d: d}}}
map2 = {root: {a: a, b: {c: x}}
result = {root: {a: a, b: {c: x, d: d}}}

In the previous example, the sub tree b in map2 does not override the whole subtree in map1. It just overrides the leaves that are duplicated.

The code I have right now looks like this and works ok:

public static Map<String, Object> merge(List<Map<String, Object>> configs) {
 return configs.stream()
 .filter(Objects::nonNull)
 .reduce(new LinkedHashMap<>(), ConfigMerger::mergeMaps);
}
public static Map<String, Object> mergeMaps(Map<String, Object> map, Map<String, Object> other) {
 Map<String, Object> result = new LinkedHashMap<>(map.size() + other.size());
 result.putAll(map);
 for (Map.Entry<String, Object> entry : other.entrySet()) {
 String key = entry.getKey();
 Object val2 = entry.getValue();
 Object val1 = result.get(key);
 if (val1 instanceof Map && val2 instanceof Map) {
 Map<String, Object> nestedResult = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
 result.put(key, nestedResult);
 } else {
 result.put(key, val2);
 }
 }
 return result;
}

The only issue is that result.putAll(map); is a bit wasteful. I do that for every nested map in each one of the original maps.

I tried a slightly more complicated approach, which is around 20% faster, and yet the most expensive method is the HashSet.addAll:

// (assumes the list in reversed order)
public static Map<String, Object> merge2(List<Map<String, Object>> configs) {
 Map<String, Object> result = new HashMap<>();
 Set<String> allKeys = new HashSet<>();
 for (Map<String, Object> config : configs) {
 allKeys.addAll(config.keySet());
 }
 for (String key : allKeys) {
 if (configs.get(0).get(key) instanceof Map<?, ?>) {
 List<Map<String, Object>> nestedConfigs = new ArrayList<>();
 for (Map<String, Object> config : configs) {
 if (config.get(key) instanceof Map<?, ?>) {
 nestedConfigs.add((Map<String, Object>) config.get(key));
 }
 }
 result.put(key, merge2(nestedConfigs));
 } else {
 for (Map<String, Object> config : configs) {
 if (config.containsKey(key)) {
 result.put(key, config.get(key));
 break;
 }
 }
 }
 }
 return result;
}

Is there any other approach I could try that could help me squeeze a bit more of performance out of this?

For context, the maps are usually huge (hundreds of subtrees) and the nesting level up to fifty sometimes.

pacmaninbw
26.2k13 gold badges47 silver badges113 bronze badges
asked Jun 7, 2023 at 23:57
\$\endgroup\$
2

2 Answers 2

2
\$\begingroup\$

any other approach ... that could help me squeeze a bit more performance ?

Choose a nice output format for your maps, and serialize each one to a text file (or to an RDBMS). For example the original result might come out as:

a: a
b.c: x
b.d: d

Choose a stable sort, and perform an external merge sort of all files. For example, /usr/bin/sort -s springs to mind.

Uniquify duplicate keys. Deserialize the result into a java Map.


Certainly there are ways to accomplish this in RAM, without ever leaving the JVM.

answered Jun 8, 2023 at 0:56
\$\endgroup\$
2
  • \$\begingroup\$ Yeah, ideally I would want to do something where I do not leave the JVM. For context, we run these config merges in the order of around 5 thousand per second. \$\endgroup\$ Commented Jun 8, 2023 at 6:10
  • \$\begingroup\$ You explain that you wish to process maps that are "huge" in < 200 μsec. That suggests an incremental approach. Yet you're not offering a Δ map of changed keys. Seems like a disconnect. We desire O(Δ) effort yet the input requires that we scan O(N) entries. Oh, well. \$\endgroup\$ Commented Jun 8, 2023 at 6:46
1
\$\begingroup\$
  • You present uncommented code.
    While I might succeed figuring out what mergeMaps() achieves, it may be intended to do something slightly different:

    • Document you code. In the code.
      Use "Javadoc style" comments to document what everything intended for external use is there for.
  • I don't think merge2() even works as intended: with

    map1 = {root: {a: a, b: {c: c, d: d}}},
    map2 = {root: {b: {c: x}},
    map3 = {root: {a: a, d: e}
    

    , wouldn't b go un-merged due to not appearing in map3 (reversed order configs.get(0))?

  • if val2 isn't a Map, you never use val1:

     Object val1 = null;
     if (val2 instanceof Map && (val1 = result.get(key)) instanceof Map)
     val2 = mergeMaps((Map<String, Object>) val1, (Map<String, Object>) val2);
     result.put(key, val2);
    

In the end, every result value that is a Map will "need to have been merged", so where does mergeMaps() invest futile effort?
I think it is in merging Maps that don't end up in the result as well as in
instantiating a new Map+copy for every Map merged in.
Trying to avoid both using Nat's class Reversed:

// When merging maps in a list with later maps taking precedence
// and a non-map value replacing any resulting map so far,
// there is one catch to processing in reverse:
// one needs to tell *should be merged* from isCompleted
static class CompletableMap<K, V> extends LinkedHashMap<K, V> {
 private boolean completed = false;
 void complete() { completed = true; }
 boolean isCompleted() { return completed; }
 CompletableMap() { super(); }
 CompletableMap(int initialCapacity) { super(initialCapacity); }
 CompletableMap(Map<? extends K, ? extends V> m) { super(m); }
}
/** merges <code>configurations</code>, later entries taking precedence. */
public static Map<String, Object> 
merge(List<Map<String, Object>> configurations) {
 Map<String, Object> configuration = new LinkedHashMap<>();
 if (null != configurations) {
 // for (ListIterator<Object> configs = configurations.listIterator(configurations.size()) ; configs.hasPrevious() ; )
 // Object previous = configs.previous();
 for (Object previous : Reversed.reversed(configurations))
 mergeMaps(configuration, previous);
 }
 return configuration;
}
/** merges <code>accumulating</code>
 * and <code>previousConfiguration</code>,
 * entries in <code>accumulating</code> taking precedence.
 *///public 
static void mergeMaps(Map<String, Object> accumulating, 
 Map<String, Object> previousConfiguration) {
 for (Map.Entry<String, Object> 
 previous : previousConfiguration.entrySet()) {
 String key = previous.getKey();
 Object
 accumulated = accumulating.get(key),
 previousValue = previous.getValue();
 if (previousValue instanceof Map) {
 Map<String, Object> 
 previousMap = (Map<String, Object>) previousValue;
 if (null == accumulated)
 accumulating.put(key,
 new CompletableMap<String, Object>(previousMap));
 else if (accumulated instanceof Map
 && !((CompletableMap<String, Object>)accumulated)
 .isCompleted())
 mergeMaps((Map<String, Object>) accumulated, previousMap);
 } else if (null == accumulated)
 accumulating.put(key, previousValue);
 else if (accumulated instanceof CompletableMap)
 ((CompletableMap<String, Object>)accumulated).complete();
 }
}
public static void main(String []args) {
 List<Map<String, Object>> c = List.of(
 Map.of("a", "1"),
 Map.of("b", Map.of("c", "2", "d", "3")),
 Map.of("b", "4"),
 Map.of("b", Map.of("c", "5", "e", "6")),
 Map.of("b", Map.of("c", "7")),
 Map.of("b", Map.of("c", "8")));
 System.out.println(merge(c));
}

Caveat: Untested, if tried out.
Caught unawares: top level entries are in reverse order.

answered Jun 8, 2023 at 9:54
\$\endgroup\$
6
  • \$\begingroup\$ From Java 19, CompletableMap(int initialCapacity) should probably call LinkedHashMap.newLinkedHashMap(initialCapacity). \$\endgroup\$ Commented Jun 8, 2023 at 10:09
  • \$\begingroup\$ First of all, thanks for taking the time to write a respond. Let me answer some of your small questions here, but expand on the original post after I edit it: 1. The code I pasted does not have comments because I removed them, as they are not that helpful. I will not only add them in the edit, but include the original code (which is Kotlin and not Java) 2. I know merge2 does not work as-is. That's why I added a comment explaining that a reversal of the list must be in place. 3. About the reversed thing... yeah, I get it and I tried something similar but the performance gains are minimal. \$\endgroup\$ Commented Jun 8, 2023 at 20:12
  • \$\begingroup\$ expand on the original post after I edit it (Heeding What should I not do when someone answers my question?, hopefully.) \$\endgroup\$ Commented Jun 8, 2023 at 20:16
  • \$\begingroup\$ Fair. In my defense, the code I pasted, even though it's "better"... it still does not satisfy the original premise: how to efficiently merge a list of deeply nested maps. \$\endgroup\$ Commented Jun 8, 2023 at 20:21
  • \$\begingroup\$ When asking a question about a Kotlin implementation of a solution to the same problem, provide more context. What are those 5000 configurations per second used for? There may be an approach less resource hungry than creating a merged Map. Are you aware of the concept of a Map with backup-Map delegating lookups to another Map for unmapped keys, possibly caching results? \$\endgroup\$ Commented Jun 9, 2023 at 7:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.