I would like to merge multiple arrays that are already sorted, into one sorted array.
My arrays look like this :
{
"arrays": [
[
{
"type": "tag",
"position": "start",
"text": true,
"offset": 1280,
}
{
"type": "tag",
"position": "end",
"text": true,
"offset": 1350,
}
],
[
{
"type": "tag",
"position": "start",
"text": false,
"offset": 1121,
},
{
"type": "tag",
"position": "end",
"text": true,
"offset": 1300,
}
],
],
...
}
My current implementation looks like this :
function getMinFromArrays(arrays, state) {
let minIndex = -1;
for (let i = 0, l = arrays.length; i < l; i++) {
if (state[i] >= arrays[i].length) {
continue;
}
if (minIndex === -1) {
minIndex = i;
}
if (arrays[i][state[i]].offset < arrays[minIndex][state[minIndex]].offset) {
minIndex = i;
}
}
if (minIndex === -1) {
throw new Error("minIndex negative");
}
return minIndex;
}
module.exports = function (arrays) {
const totalLength = arrays.reduce(function (sum, array) {
return sum + array.length;
}, 0);
const resultArray = new Array(totalLength);
const state = arrays.map(function () { return 0; });
let i = 0;
while(i <= totalLength - 1) {
const arrayIndex = getMinFromArrays(arrays, state);
resultArray[i] = arrays[arrayIndex][state[arrayIndex]];
state[arrayIndex]++;
i++;
}
return resultArray;
};
I think It might be possible to optimize this code (sometimes, it seems to be quite slow (70ms for 10 arrays of size 5000 each).
1 Answer 1
The built-in facilities should take care of this for you.
You can just merge them with concat and then use Array's sort method, which takes an optional custom comparator (in your case, "compare by offset"). This should run in O(n log(n)):
function merge(arrays) {
return arrays.reduce((m,x) => m.concat(x), [])
.sort((a,b) => a.offset - b.offset);
}
NOTE: This doesn't take full advantage of the fact that the individual arrays are already sorted (which will give an O(n) solution), but it's so much simpler it's worth checking if it meets your needs before optimizing further.
UPDATE
I went ahead and did a rewrite with some guesses at micro-optimizations that might help (or might not). Try it on your sample data and see what happens.
The code is a bit more compact than your original, but still not what I consider highly readable. I had to sacrifice some readability to write this in an optimized, procedural way:
function merge(arrays) {
var indexes = Array(arrays.length).fill(0);
var result = [];
var minIndex, minValue, minObject; // for minByOffset method
const totalLength = arrays.reduce((sum, arr) => sum + arr.length, 0);
const infiniteOffset = {offset: Infinity};
const nextUnusedElm = (arr,i) => arr[indexes[i]] || infiniteOffset;
while (result.length < totalLength) {
var candidates = arrays.map(nextUnusedElm);
minByOffset(candidates);
indexes[minIndex]++;
result.push(minObject);
}
return result;
function minByOffset(arr) {
minValue = Infinity;
arr.forEach((x,i) => {
if (x.offset >= minValue) return;
minIndex = i;
minObject = x;
minValue = x.offset;
});
}
}
A final optimization you could make (again, it may or may not have much of an effect) is to remove an array once all of its elements have been used up. That is, restructure the code to avoid some unnecessary comparisons. In your code, it would avoid:
if (state[i] >= arrays[i].length) {
In my version, it would avoid:
arr[indexes[i]] || infiniteOffset;
That is, an extra array lookup and a comparison inside the loop. These are fast operations, so the optimization may not matter...
-
\$\begingroup\$ I ended up implementing it this way since the default sort was much slower (taking multiple seconds). Do you think my solution is O(n) ? \$\endgroup\$edi9999– edi99992016年06月17日 13:46:57 +00:00Commented Jun 17, 2016 at 13:46
-
\$\begingroup\$ yeah it appears to be.... the code can be cleaned up and you might squeeze out a bit more speed, but i think it will minor \$\endgroup\$Jonah– Jonah2016年06月17日 14:14:29 +00:00Commented Jun 17, 2016 at 14:14
-
\$\begingroup\$ @edi9999 Merging two sorted arrays is O(n) where n is the length of the longest array. Your solution would technically be O(nm) where m is the number of arrays to merge since the number of arrays you merge is variable. \$\endgroup\$Joshua Dawson– Joshua Dawson2016年06月17日 22:00:39 +00:00Commented Jun 17, 2016 at 22:00
-
\$\begingroup\$ True but for a fixed input that's the same as order N \$\endgroup\$Jonah– Jonah2016年06月17日 22:12:28 +00:00Commented Jun 17, 2016 at 22:12
offset
is the only property on which one would meaningfully sort. \$\endgroup\$