Please bare with me, I want this to be as language agnostic as possible becuase of the languages I am working with (One of which is a language called PowerOn). However, most languanges support for loops and arrays.
Say I have the following list in an aray:
0x 0 Foo
1x 1 Bar
2x 0 Widget
3x 1 Whatsit
4x 0 Foo
5x 1 Bar
Anything with a 1 should be uniqely added to another array with the following result:
0x 1 Bar
1x 1 Whatsit
Keep in mind this is a very elementry example. In reality, I am dealing with 10's of thousands of elements on the old list. Here is what I have so far.
Pseudo Code:
For each element in oldlist
For each element in newlist
Compare
If values oldlist.element equals newlist.element, break new list loop
If reached end of newlist with with nothing equal from oldlist, add value from old list to new list
End
End
Is there a better way of doing this? Algorithmicly, is there any room for improvement? And as a bonus qeustion, what is the O notation for this type of algorithm (if there is one)?
-
What's tons comparisons to you? Assuming your array is of size N. Is N comparisons a bad thing? How about N^2 or N^N? A single nested loop is N^2, which might not be too bad. N*log(N) would be good too, but not N^N.CodeART– CodeART2012年11月01日 21:34:41 +00:00Commented Nov 1, 2012 at 21:34
-
@codework probably closer to (M*N)^2 where M is a subset of NChad Harrison– Chad Harrison2012年11月02日 12:43:23 +00:00Commented Nov 2, 2012 at 12:43
3 Answers 3
If you keep your newlist
sorted, then you can use binary search to determine whether a duplicate exists. This will only make a difference when your newlist
starts to get large, so if it never gets longer than (say) 64 entries you may not see any improvement. It may also not work if it's expensive to insert into the middle of an array -- shifting all the elements may eat up a lot of your time savings. You could mitigate this somewhat by keeping multiple lists, maybe your starting list and a separate list of the elements you've added. You'll have to search each separately but with binary search that should be manageable. Then, after your loops have finished, merge the new elements into the second list.
Another improvement might be to traverse your oldlist
after inserting a new value into newlist
and removing any duplicates. Hard to say, though, without knowing anything about the data.
-
Sorting will have its own overhead.CodeART– CodeART2012年11月01日 21:33:52 +00:00Commented Nov 1, 2012 at 21:33
-
1@CodeWorks: But you're not sorting
newlist
, because it's empty at the start. You're adding elements in a sorted way. Therefore, it doesn't have it's own overhead.AlbeyAmakiir– AlbeyAmakiir2012年11月01日 21:43:54 +00:00Commented Nov 1, 2012 at 21:43 -
Hi, but how would you keep your new list sorted? Each time you add an item to that list, you need to find out where it should go.CodeART– CodeART2012年11月01日 22:06:00 +00:00Commented Nov 1, 2012 at 22:06
-
When you do a binary search, you determine where the element should go. Either it's there and the element is a duplicate or it isn't, in which case it should be inserted into the current position in the list (either before or after the element in the current position).TMN– TMN2012年11月02日 00:00:39 +00:00Commented Nov 2, 2012 at 0:00
-
Algorithmically, this is still O(n^2), because of insertions. kevin cline's answer is the only hashless one which gets you to O(n*log n). You can then make an array from the tree after the whole operation and you're done.herby– herby2012年11月03日 10:37:04 +00:00Commented Nov 3, 2012 at 10:37
Yes, it's called a hash table or map - don't know if your language has them builtin but it's easy to code. This allows you to check if an entry is already there in the same time no matter how big the list (almost).
If you need to preserve the order then you would use a sorted linked-list. Then it's easy to find (by searching) if the entry is new and a linked list lets you insert it without moving all the other entries.
-
a hash table would be nice... But don't have that as an option :(. Still +1 for keeping this in mind going forward for languages that doChad Harrison– Chad Harrison2012年11月01日 15:58:29 +00:00Commented Nov 1, 2012 at 15:58
-
3Can you not write your own hash table implementation?Ian– Ian2012年11月01日 16:16:41 +00:00Commented Nov 1, 2012 at 16:16
-
@Ian you do have it as an option, if your language is turing complete you can write a hash tableJimmy Hoffa– Jimmy Hoffa2012年11月01日 16:41:26 +00:00Commented Nov 1, 2012 at 16:41
-
@ian I'm pretty sure that PowerOn is not turing complete. Think SQLChad Harrison– Chad Harrison2012年11月01日 16:59:00 +00:00Commented Nov 1, 2012 at 16:59
-
1Yes hash collisions are normal. In the simpel case - if the index already has an entry (ie you clash) then you just increment up the table until there is an empty slot. If the table is full you resize the whole thing. When reading a hash you check if the next slot is empty - if it isn't then you also have to check that the returned value matches. ps translate pre-hash values into index IS the hash function.Martin Beckett– Martin Beckett2012年11月01日 17:44:57 +00:00Commented Nov 1, 2012 at 17:44
Organize the second array as a binary tree. It's simpler than implementing a hash table, and will run in O(n log n) time.
-
Always found hashes conceptually easier than trees - but this is probably better soln.Martin Beckett– Martin Beckett2012年11月01日 17:04:55 +00:00Commented Nov 1, 2012 at 17:04
-
1Maybe I'm being dense, but -- are you sure "binary heap" is really the data-structure you have in mind? Because searching a binary heap for a given element seems like it would require O(n) time and O(log n) space, hardly an improvement over an array. No? I wonder if you're actually thinking of a binary search tree, or more specifically, something like a red-black tree.ruakh– ruakh2012年11月01日 18:42:44 +00:00Commented Nov 1, 2012 at 18:42
-
@MartinBeckett: they are easier but I thought a hash table would be awkward in a language that only offered arrayskevin cline– kevin cline2012年11月01日 20:03:18 +00:00Commented Nov 1, 2012 at 20:03
-
@ruakh: you are right.kevin cline– kevin cline2012年11月01日 20:26:56 +00:00Commented Nov 1, 2012 at 20:26