2
\$\begingroup\$

System/Language: I'm using plain Javascript to write this algorithm. Environment: Node 17.4.0 with a system that has ~7000 MiB of Memory and ~400GB of Storage.

Input Data

  • An (AoP) Array of Pairs (a Pair is an array of two elements) where the first element of each Pair is a String and the second element is a Number.
  • The program can ASSUME that the input AoP is sorted in ascending order by the first element of each Pair - the String.

Example of the Input Data

[ ["A", 3], ["A", 2], ["B", 9], ["C", 1], ["C", 2], ["C", 3], ["D", 2], ["E", 0], ["E", 1] ] 

Algorithm

Reduce the input AOP such that all Pairs that have the same String element are merged into one pair with the same string. The Number element should be the sum of all Number elements of the Pairs merged.

Example of the Output Data

[ ["A", 5], ["B", 9], ["C", 6], ["D", 2], ["E", 1] ]

Initial Algorithm

I'm more comfortable writing functions in recursive form (coming from a Scheme background) so I quickly came up with the following:

function sumSameKeysSortedRecursive(arr) {
 if (arr.length === 0) {
 return [];
 } else {
 let [fst, ...rst] = arr;
 let recurAns = sumSameKeysSortedRecursive(rst);
 if (recurAns.length === 0) {
 return [fst];
 } else {
 let [f, ...r] = recurAns;
 return f[0] === fst[0]
 ? [[f[0], f[1] + fst[1]], ...r]
 : [fst, ...recurAns];
 }
 }
}

However, my input data has 11 million lines and won't scale with the recursive version (even a tail-recursive version won't work on Node). The second version uses loops:

function sumSameKeysSortedLoop(arr) {
 let res = [];
 let i = 0; // tracks first occurrence a pair with a unique String
 let j = 0; // tracks all subsequent occurrences of Pairs with the same String
 let len = arr.length;
 while (i < len) {
 let sum = 0;
 let iname = arr[i][0];
 let jname = iname;
 while (jname === iname) {
 sum = sum + arr[j][1];
 j++;
 if (j === len) { // have reached beyond the array bound
 break;
 } else {
 jname = arr[j][0];
 }
 }
 res.push([iname, sum]);
 i = j; // jump over to the new unique element
 if (i === len) {
 break;
 } else {
 iname = arr[i][0];
 }
 }
 return res;
}

Possible Improvements?

I think v2 has space for improvement in at least 2 areas:

  • Readability: In v1, I can read and understand exactly what is happening (can improve it further by adding helper functions), while in v2 a future reader of the code might take more time understanding it. (Could be subjective)
  • while + break;: Is there an easy way to translate the program such that its functionality stays the same but uses a for-loop instead? Or where it does not use break; (a more general version)? Or where it does not have to keep track of so many variables (i, j, sum, iname, jname)?
asked Feb 9, 2022 at 15:53
\$\endgroup\$
2
  • \$\begingroup\$ Of the 11m lines, do you have a sense for how often the string element repeats? Is your output going to be closer to 10m or to 100? \$\endgroup\$ Commented Feb 9, 2022 at 21:27
  • \$\begingroup\$ Expecting my output to be closer to 550k. I'm processing certain kind of transaction on a blockchain history and am assuming 20 such transaction per wallet. \$\endgroup\$ Commented Feb 9, 2022 at 21:40

2 Answers 2

2
\$\begingroup\$

I read your iterative (loop) solution. It's complicated; it's hard to read.


Here's a simplified version of your code.

function merge(aop) {
 const sums = [];
 if (aop.length === 0) {
 return sums;
 }
 const str = aop[0][0];
 let end = sums.push([str, 0]) - 1;
 for (const [str, num] of aop) {
 if (str === sums[end][0]) {
 sums[end][1] += num;
 } else {
 end = sums.push([str, num]) - 1;
 }
 }
 return sums;
}

const example = [ ["A", 3], ["A", 2], ["B", 9], ["C", 1], ["C", 2], ["C", 3], ["D", 2], ["E", 0], ["E", 1] ];
console.log(merge(example));
[ [ 'A', 8 ], [ 'B', 9 ], [ 'C', 6 ], [ 'D', 2 ], [ 'E', 1 ] ]
answered Feb 9, 2022 at 19:48
\$\endgroup\$
2
  • \$\begingroup\$ Thank you! This is way cleaner \$\endgroup\$ Commented Feb 9, 2022 at 21:26
  • 1
    \$\begingroup\$ The arr.push(x) -1 part was a little confusing but once I understood that it represents the index of the last element of the (updated) arr, it was easy to understand. \$\endgroup\$ Commented Feb 9, 2022 at 21:42
1
\$\begingroup\$

Why is it slow

Your first code tries to make a slice on the input array every iteration. It cost too much time to duplicate the list. And it runs in \$O(n^2)\$ complexity.

Suggested rewriting

Your second approach should work as expected. Although it is a bit complex since you need a nested loop and modify the loop variable in the loop with special logic. I would suggest avoiding modifying the loop variable so it becomes easy to understand. You just need to add the previous pair if the current one is equal to the previous one.

function sumSameKeysSortedRecursive(arr) {
 if (!arr.length) {
 return [];
 }
 let prev = [arr[0][0], 0];
 const output = [prev];
 for (let i = 0, l = arr.length; i < l; i++) {
 if (arr[i][0] === prev[0]) {
 prev[1] += arr[i][1];
 } else {
 prev = [...arr[i]];
 output.push(prev);
 }
 }
 return output;
}

Or using flatMap if you prefer:

function sumSameKeysSortedRecursive(arr) {
 let prev = null;
 return arr.flatMap(([code, value]) => {
 if (prev && prev[0] === code) {
 prev[1] += value;
 return [];
 } else {
 prev = [code, value];
 return [prev];
 }
 });
}
konijn
34.2k5 gold badges70 silver badges267 bronze badges
answered Feb 11, 2022 at 10:15
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.