Is there a way to add unique items to an array without doing a ton of comparisons?

Question 1

Please bare with me, I want this to be as language agnostic as possible becuase of the languages I am working with (One of which is a language called PowerOn). However, most languanges support for loops and arrays.

Say I have the following list in an aray:

0x 0 Foo
1x 1 Bar
2x 0 Widget
3x 1 Whatsit
4x 0 Foo
5x 1 Bar

Anything with a 1 should be uniqely added to another array with the following result:

0x 1 Bar
1x 1 Whatsit

Keep in mind this is a very elementry example. In reality, I am dealing with 10's of thousands of elements on the old list. Here is what I have so far.

Pseudo Code:

For each element in oldlist
 For each element in newlist
 Compare
 If values oldlist.element equals newlist.element, break new list loop
 If reached end of newlist with with nothing equal from oldlist, add value from old list to new list
 End
End

Is there a better way of doing this? Algorithmicly, is there any room for improvement? And as a bonus qeustion, what is the O notation for this type of algorithm (if there is one)?

Question 2

What's tons comparisons to you? Assuming your array is of size N. Is N comparisons a bad thing? How about N^2 or N^N? A single nested loop is N^2, which might not be too bad. N*log(N) would be good too, but not N^N.

Question 3

@codework probably closer to (M*N)^2 where M is a subset of N

Question 4

If you keep your newlist sorted, then you can use binary search to determine whether a duplicate exists. This will only make a difference when your newlist starts to get large, so if it never gets longer than (say) 64 entries you may not see any improvement. It may also not work if it's expensive to insert into the middle of an array -- shifting all the elements may eat up a lot of your time savings. You could mitigate this somewhat by keeping multiple lists, maybe your starting list and a separate list of the elements you've added. You'll have to search each separately but with binary search that should be manageable. Then, after your loops have finished, merge the new elements into the second list.

Another improvement might be to traverse your oldlist after inserting a new value into newlist and removing any duplicates. Hard to say, though, without knowing anything about the data.

Question 5

Sorting will have its own overhead.

Question 6

@CodeWorks: But you're not sorting newlist, because it's empty at the start. You're adding elements in a sorted way. Therefore, it doesn't have it's own overhead.

Question 7

Hi, but how would you keep your new list sorted? Each time you add an item to that list, you need to find out where it should go.

Question 8

When you do a binary search, you determine where the element should go. Either it's there and the element is a duplicate or it isn't, in which case it should be inserted into the current position in the list (either before or after the element in the current position).

Question 9

Algorithmically, this is still O(n^2), because of insertions. kevin cline's answer is the only hashless one which gets you to O(n*log n). You can then make an array from the tree after the whole operation and you're done.

Question 10

Yes, it's called a hash table or map - don't know if your language has them builtin but it's easy to code. This allows you to check if an entry is already there in the same time no matter how big the list (almost).

If you need to preserve the order then you would use a sorted linked-list. Then it's easy to find (by searching) if the entry is new and a linked list lets you insert it without moving all the other entries.

Question 11

a hash table would be nice... But don't have that as an option :(. Still +1 for keeping this in mind going forward for languages that do

Question 12

Can you not write your own hash table implementation?

Question 13

@Ian you do have it as an option, if your language is turing complete you can write a hash table

Question 14

@ian I'm pretty sure that PowerOn is not turing complete. Think SQL

Question 15

Yes hash collisions are normal. In the simpel case - if the index already has an entry (ie you clash) then you just increment up the table until there is an empty slot. If the table is full you resize the whole thing. When reading a hash you check if the next slot is empty - if it isn't then you also have to check that the returned value matches. ps translate pre-hash values into index IS the hash function.

Question 16

Organize the second array as a binary tree. It's simpler than implementing a hash table, and will run in O(n log n) time.

Question 17

Always found hashes conceptually easier than trees - but this is probably better soln.

Question 18

Maybe I'm being dense, but -- are you sure "binary heap" is really the data-structure you have in mind? Because searching a binary heap for a given element seems like it would require O(n) time and O(log n) space, hardly an improvement over an array. No? I wonder if you're actually thinking of a binary search tree, or more specifically, something like a red-black tree.

Question 19

@MartinBeckett: they are easier but I thought a hash table would be awkward in a language that only offered arrays

Question 20

@ruakh: you are right.

TMN TMN 11.4k1 gold badge23 silver badges32 bronze badges · Accepted Answer · 2012-11-01 16:42:33Z

If you keep your newlist sorted, then you can use binary search to determine whether a duplicate exists. This will only make a difference when your newlist starts to get large, so if it never gets longer than (say) 64 entries you may not see any improvement. It may also not work if it's expensive to insert into the middle of an array -- shifting all the elements may eat up a lot of your time savings. You could mitigate this somewhat by keeping multiple lists, maybe your starting list and a separate list of the elements you've added. You'll have to search each separately but with binary search that should be manageable. Then, after your loops have finished, merge the new elements into the second list.

Another improvement might be to traverse your oldlist after inserting a new value into newlist and removing any duplicates. Hard to say, though, without knowing anything about the data.

@CodeWorks: But you're not sorting newlist, because it's empty at the start. You're adding elements in a sorted way. Therefore, it doesn't have it's own overhead.
Hi, but how would you keep your new list sorted? Each time you add an item to that list, you need to find out where it should go.
When you do a binary search, you determine where the element should go. Either it's there and the element is a duplicate or it isn't, in which case it should be inserted into the current position in the list (either before or after the element in the current position).
Algorithmically, this is still O(n^2), because of insertions. kevin cline's answer is the only hashless one which gets you to O(n*log n). You can then make an array from the tree after the whole operation and you're done.

Stack Exchange Network

Is there a way to add unique items to an array without doing a ton of comparisons?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Is there a way to add unique items to an array without doing a ton of comparisons?

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions