Optimize/Refactor Javascript Unique Array function

Question 1

This is one idea of implementation:

 Array.prototype.unique = function () {
 unique_array = [];
 for (var i = 0, l = this.length; i < l; i++) {
 if(unique_array.indexOf(this[i]) == -1){
 unique_array.push(this[i]);
 }
 }
 return unique_array;
 }

This version uses Object.keys which is a ECMAScript 5 only feature, as you can see on this website http://kangax.github.com/es5-compat-table/

Array.prototype.unique_e5 = function () {
 unique_object = {};
 for (var i = 0, l = this.length; i < l; i++) {
 unique_object[this[i]] = 1;
 }
 return Object.keys(unique_object);
}

And this is how is done in prototype.js https://github.com/sstephenson/prototype/blob/master/src/prototype/lang/array.js

 /**
 * Array#uniq([sorted = false]) -> Array
 * - sorted (Boolean): Whether the array has already been sorted. If `true`,
 * a less-costly algorithm will be used.
 *
 * Produces a duplicate-free version of an array. If no duplicates are
 * found, the original array is returned.
 *
 * On large arrays when `sorted` is `false`, this method has a potentially
 * large performance cost.
 *
 * ##### Examples
 *
 * [1, 3, 2, 1].uniq();
 * // -> [1, 2, 3]
 *
 * ['A', 'a'].uniq();
 * // -> ['A', 'a'] (because String comparison is case-sensitive)
 **/
 function uniq(sorted) {
 return this.inject([], function(array, value, index) {
 if (0 == index || (sorted ? array.last() != value : !array.include(value)))
 array.push(value);
 return array;
 });
 }

Also not that the prototype version uses the Prototype enumerable method include, which is:

 /**
 * Enumerable#include(object) -> Boolean
 * - object (?): The object to look for.
 *
 * Determines whether a given object is in the enumerable or not,
 * based on the `==` comparison operator (equality with implicit type
 * conversion).
 *
 * ##### Examples
 *
 * $R(1, 15).include(10);
 * // -> true
 *
 * ['hello', 'world'].include('HELLO');
 * // -> false ('hello' != 'HELLO')
 *
 * [1, 2, '3', '4', '5'].include(3);
 * // -> true ('3' == 3)
 **/
 function include(object) {
 if (Object.isFunction(this.indexOf))
 if (this.indexOf(object) != -1) return true;
 var found = false;
 this.each(function(value) {
 if (value == object) {
 found = true;
 throw $break;
 }
 });
 return found;
 }

Is there a better way to do it? faster or "better" cross browser compatible?

Question 2

I'm not entirely sure, but I think you're asking a similar question to mine, on SO. I was trying to get people to close-vote it, but as luck would have it, it hasn't been closed yet. It's not that good of a fit for SO, and I have my answer and a couple of ppl suggested it to be moved to this site, but I haven't gotten round to that either.

Question 3

The issue with using an object is that you won't be able to recognize "proto". (ES6 Map solves this issue.) But it's the best algorithm in term of big-o cost so it could be interesting if you're going to have laaaarge arrays.

Question 4

(I meant __proto__.)

Question 5

For me, I'd always avoid methods that require lots of includes to things get working - and in my mind, the more code used.. the slower things will be (unless some form of caching is used). Which is why I would opt for a simple JavaScript solution. Your idea will work, but I think this one is faster:

Array.prototype.unique = function () {
 var a = this, b = [], c, i = a.length;
 again: while ( i-- ) {
 c = a[i];
 k = i; while( k-- ){ if (a[k] == c){ continue again; } }
 b.unshift( a[i] );
 }
 return b;
}

There are probably other improvements that can be made, for example it might be faster to find a way to use .push() rather than .unshift().

I haven't tested the above excessively, but it seems to work in all my tests so far. The reason why it gets a speed increase is because it reduces the area it is checking each time; it is also using subtle other speed increases like a decrementing while loop (means there are less conditional statements to check on each iteration), and creating shortcut vars that cut down access time.

As proof here is a jsPerf... albeit only tested on my set-up so far ;)

http://jsperf.com/compare-array-unique-versions

side note: -- the downside to my method is that it will only include the last found occurance of a duplicate (not the first as your's will). So if that ordering is important to you, then you'll have to refactor the code.

revision: -- after a few jsPerfs it seems clear that the while(i--) no longer holds a speed difference (at least not for FireFox 16 Mac OSX). Whilst on Chrome Mac OSX i--; seems slower than i++;

http://jsperf.com/compare-a-dec-while-against-a-for-loop

So taking in to account BillyBarry's comments the improved version should be:

Array.prototype.unique8 = function () {
 var a = this, b = [], c, i, l = a.length, j, k = 0;
 again: for ( i = 0; i < l; i++ ) {
 c = a[i];
 for ( j = 0; j < k; j++ ) { if (b[j] === c){ continue again; } }
 b[k++] = c;
 }
 return b;
}

Working from b, rather than a improves things quite a lot. Plus using k++; rather than .length for the internal loop makes quite a bit of difference for FireFox (Mac OSX) but has no affect on Chrome.

Question 6

It would be slightly faster to use b.push() than unshift, but it will be significantly faster to change the inner loop to loop over b instead of a if you know that it has many repetitions. see: jsperf.com/compare-array-unique-versions/2

Question 7

+1 Yep, I was pretty certain .push() would be better - and from looking at your jsPerf I'm surprised the for loop version seems to win out (I guess that while trick is no longer true in recent interpreters - need to refresh my assumptions ;). Good point about interating over b instead - I was trying to cut down on variable usage and length checks but it seems these don't impact so much.

Question 8

Ah, actually, have just added another revision not using .length (keeping track of the length by incrementing a var instead) and that made quite a bit of difference jsperf.com/compare-array-unique-versions/4

Question 9

Here's my version, however there are three major downsides.

Requires more memory.
Only works with primitive datatypes or objects with a string method that returns unique values. In other words, this doesn't work with objects or object literals.
Harder to read and maintain

Code:

Array.prototype.getUnique = function () {
 var arr = this;
 var newArr = [],
 i = 0,
 j = 0,
 obj = {},
 len = arr.length;
 while (len--) {
 if (!obj[arr[i]]) {
 obj[arr[i]] = 1;
 newArr[j] = arr[i];
 j++;
 }
 i++;
 }
 return newArr;
};

Demo here: http://jsperf.com/compare-array-unique-versions/3

Update

Here's the same code but revised to make it easier to read.

Array.prototype.getUnique_simple = function () {
 var arr = this, newArr = [], obj = {};
 for(var i = 0, len = arr.length; i < len; i++){
 if (obj[arr[i]]) {
 continue;
 }
 obj[arr[i]] = 1;
 newArr.push(arr[i]);
 }
 return newArr;
};

Pebbl Pebbl 1613 silver badges11 bronze badges · Answer 1 · 2012-09-12 13:26:02Z

For me, I'd always avoid methods that require lots of includes to things get working - and in my mind, the more code used.. the slower things will be (unless some form of caching is used). Which is why I would opt for a simple JavaScript solution. Your idea will work, but I think this one is faster:

Array.prototype.unique = function () {
 var a = this, b = [], c, i = a.length;
 again: while ( i-- ) {
 c = a[i];
 k = i; while( k-- ){ if (a[k] == c){ continue again; } }
 b.unshift( a[i] );
 }
 return b;
}

There are probably other improvements that can be made, for example it might be faster to find a way to use .push() rather than .unshift().

I haven't tested the above excessively, but it seems to work in all my tests so far. The reason why it gets a speed increase is because it reduces the area it is checking each time; it is also using subtle other speed increases like a decrementing while loop (means there are less conditional statements to check on each iteration), and creating shortcut vars that cut down access time.

As proof here is a jsPerf... albeit only tested on my set-up so far ;)

http://jsperf.com/compare-array-unique-versions

side note: -- the downside to my method is that it will only include the last found occurance of a duplicate (not the first as your's will). So if that ordering is important to you, then you'll have to refactor the code.

revision: -- after a few jsPerfs it seems clear that the while(i--) no longer holds a speed difference (at least not for FireFox 16 Mac OSX). Whilst on Chrome Mac OSX i--; seems slower than i++;

http://jsperf.com/compare-a-dec-while-against-a-for-loop

So taking in to account BillyBarry's comments the improved version should be:

Array.prototype.unique8 = function () {
 var a = this, b = [], c, i, l = a.length, j, k = 0;
 again: for ( i = 0; i < l; i++ ) {
 c = a[i];
 for ( j = 0; j < k; j++ ) { if (b[j] === c){ continue again; } }
 b[k++] = c;
 }
 return b;
}

Working from b, rather than a improves things quite a lot. Plus using k++; rather than .length for the internal loop makes quite a bit of difference for FireFox (Mac OSX) but has no affect on Chrome.

It would be slightly faster to use b.push() than unshift, but it will be significantly faster to change the inner loop to loop over b instead of a if you know that it has many repetitions. see: jsperf.com/compare-array-unique-versions/2
+1 Yep, I was pretty certain .push() would be better - and from looking at your jsPerf I'm surprised the for loop version seems to win out (I guess that while trick is no longer true in recent interpreters - need to refresh my assumptions ;). Good point about interating over b instead - I was trying to cut down on variable usage and length checks but it seems these don't impact so much.
Ah, actually, have just added another revision not using .length (keeping track of the length by incrementing a var instead) and that made quite a bit of difference jsperf.com/compare-array-unique-versions/4

Larry Battle Larry Battle 2,17811 silver badges19 bronze badges · Answer 2 · 2012-09-12 19:27:28Z

Here's my version, however there are three major downsides.

Requires more memory.
Only works with primitive datatypes or objects with a string method that returns unique values. In other words, this doesn't work with objects or object literals.
Harder to read and maintain

Code:

Array.prototype.getUnique = function () {
 var arr = this;
 var newArr = [],
 i = 0,
 j = 0,
 obj = {},
 len = arr.length;
 while (len--) {
 if (!obj[arr[i]]) {
 obj[arr[i]] = 1;
 newArr[j] = arr[i];
 j++;
 }
 i++;
 }
 return newArr;
};

Demo here: http://jsperf.com/compare-array-unique-versions/3

Update

Here's the same code but revised to make it easier to read.

Array.prototype.getUnique_simple = function () {
 var arr = this, newArr = [], obj = {};
 for(var i = 0, len = arr.length; i < len; i++){
 if (obj[arr[i]]) {
 continue;
 }
 obj[arr[i]] = 1;
 newArr.push(arr[i]);
 }
 return newArr;
};

Stack Exchange Network

Optimize/Refactor Javascript Unique Array function

2 Answers 2

Code:

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Optimize/Refactor Javascript Unique Array function

2 Answers 2

Code:

Update

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions