I have a set of numbers say : 1 1 2 8 5 6 6 7 8 8 4 2...
I want to detect the duplicate element in subsets(of given size say k) of the above numbers... For example : Consider the increasing subsets(for example consider k=3)
Subset 1 :{1,1,2}
Subset 2 :{1,2,8}
Subset 3 :{2,8,5}
Subset 4 :{8,5,6}
Subset 5 :{5,6,6}
Subset 6 :{6,6,7}
....
.... So my algorithm should detect that subset 1,5,6 contains duplicates..
My approach : 1)Copy the 1st k elements to a temporary array(vector) 2) using #include file in C++ STL...using unique() I would determine if there's any change in size of vector.. Any other clue how to approach this problem..
-
What's your approach so far?yannis– yannis2012年10月14日 14:24:50 +00:00Commented Oct 14, 2012 at 14:24
-
My approach : 1)Copy the 1st k elements to a temporary array(vector) 2) using #include<algorithm> file in C++ STL...using unique() I would determine if there's any change in size of vector..Abhinav Shrivastava– Abhinav Shrivastava2012年10月14日 14:28:00 +00:00Commented Oct 14, 2012 at 14:28
-
Ah, so you have a clue on how to approach the problem after all! ;) Please add that to the question itself, comments aren't that visible.yannis– yannis2012年10月14日 14:29:38 +00:00Commented Oct 14, 2012 at 14:29
2 Answers 2
Take a subset, add each element into a hash table. If the hash table already contains a value - print out saying there is a duplicate.
A hash table will only allocate 1 memory block per entry. Hence its easy to find if the number already exists. Since these are just regular entries your table will be something like this:
struct Hashtable{
int number;
};
static struct Hashtable hashTable[10];
int getHash(int x){ return x; }
void addHash(int number)
{
if(hashTable[number].number == -1)
{ /*OK*/
hashTable[number].number = number;
}else{
printf("DUPLICATE FOUND! BAILING OUT!");
}
}
void initTable(){ for(int i = 0; i < 10; i++){hashTable[i].number = -1;}}
-
Your "pseudo-code" isn't really good. If it's supposed to be C or C++, relying on "some garbage value" for anything is just wrong. Using a global, fixed-size static thing looks inappropriate. Simpler pseudo-code would be more interesting than what you have up there IMO.Mat– Mat2012年10月14日 15:01:10 +00:00Commented Oct 14, 2012 at 15:01
-
1The psuedo code was suppose to be for explanation. You can use all the techniques you want to make the code cleaner.Aniket Inge– Aniket Inge2012年10月14日 15:03:01 +00:00Commented Oct 14, 2012 at 15:03
-
I know it's not supposed to be working code, but as it is it's close to being compilable but really bad C or C++. A simple
if (hash.contains(value)) return DUPLICATE; else hash.put(value);
even if it's not proper C or C++ or Java would illustrate what you're trying to describe better.Mat– Mat2012年10月14日 15:04:48 +00:00Commented Oct 14, 2012 at 15:04 -
1I would suggest setting all the values of the table to -1 and then if(hashTable[number].number == -1){/*OK NOW*/ ...} else { duplicateFound}Aniket Inge– Aniket Inge2012年10月14日 15:07:30 +00:00Commented Oct 14, 2012 at 15:07
If they're all sets then by definition they don't contain duplicates. If they're eg. lists then you could just make sets of the sublists. In Python 2.x:
items = [1, 1, 2, 8, 5, 6, 6, 7, 8, 8, 4, 2]
for i in xrange(len(items) - 2):
sublist = items[i:i + 3]
if len(set(sublist)) < 3:
print sublist, 'contains duplicates.'
else:
print sublist, 'contains no duplicates.'
Result:
[1, 1, 2] contains duplicates.
[1, 2, 8] contains no duplicates.
[2, 8, 5] contains no duplicates.
[8, 5, 6] contains no duplicates.
[5, 6, 6] contains duplicates.
[6, 6, 7] contains duplicates.
[6, 7, 8] contains no duplicates.
[7, 8, 8] contains duplicates.
[8, 8, 4] contains duplicates.
[8, 4, 2] contains no duplicates.
-
Similar to my approach in C++..thanx for the help... 1)Copy the 1st k elements to a temporary array(vector) 2) using #include<algorithm> file in C++ STL...using unique() I would determine if there's any change in size of vectorAbhinav Shrivastava– Abhinav Shrivastava2012年10月14日 14:29:38 +00:00Commented Oct 14, 2012 at 14:29