Detect duplicate in a subset from a set of elements

Question 1

I have a set of numbers say : 1 1 2 8 5 6 6 7 8 8 4 2...

I want to detect the duplicate element in subsets(of given size say k) of the above numbers... For example : Consider the increasing subsets(for example consider k=3)

Subset 1 :{1,1,2}

Subset 2 :{1,2,8}

Subset 3 :{2,8,5}

Subset 4 :{8,5,6}

Subset 5 :{5,6,6}

Subset 6 :{6,6,7}

....

.... So my algorithm should detect that subset 1,5,6 contains duplicates..

My approach : 1)Copy the 1st k elements to a temporary array(vector) 2) using #include file in C++ STL...using unique() I would determine if there's any change in size of vector.. Any other clue how to approach this problem..

Question 2

What's your approach so far?

Question 3

My approach : 1)Copy the 1st k elements to a temporary array(vector) 2) using #include<algorithm> file in C++ STL...using unique() I would determine if there's any change in size of vector..

Question 4

Ah, so you have a clue on how to approach the problem after all! ;) Please add that to the question itself, comments aren't that visible.

Question 5

Take a subset, add each element into a hash table. If the hash table already contains a value - print out saying there is a duplicate.

A hash table will only allocate 1 memory block per entry. Hence its easy to find if the number already exists. Since these are just regular entries your table will be something like this:

struct Hashtable{
 int number;
};
static struct Hashtable hashTable[10];
int getHash(int x){ return x; }
void addHash(int number)
{
 if(hashTable[number].number == -1)
 { /*OK*/
 hashTable[number].number = number;
 }else{
 printf("DUPLICATE FOUND! BAILING OUT!");
 }
}
void initTable(){ for(int i = 0; i < 10; i++){hashTable[i].number = -1;}}

Question 6

Your "pseudo-code" isn't really good. If it's supposed to be C or C++, relying on "some garbage value" for anything is just wrong. Using a global, fixed-size static thing looks inappropriate. Simpler pseudo-code would be more interesting than what you have up there IMO.

Question 7

The psuedo code was suppose to be for explanation. You can use all the techniques you want to make the code cleaner.

Question 8

I know it's not supposed to be working code, but as it is it's close to being compilable but really bad C or C++. A simple if (hash.contains(value)) return DUPLICATE; else hash.put(value); even if it's not proper C or C++ or Java would illustrate what you're trying to describe better.

Question 9

I would suggest setting all the values of the table to -1 and then if(hashTable[number].number == -1){/*OK NOW*/ ...} else { duplicateFound}

Question 10

If they're all sets then by definition they don't contain duplicates. If they're eg. lists then you could just make sets of the sublists. In Python 2.x:

items = [1, 1, 2, 8, 5, 6, 6, 7, 8, 8, 4, 2]
for i in xrange(len(items) - 2):
 sublist = items[i:i + 3]
 if len(set(sublist)) < 3:
 print sublist, 'contains duplicates.'
 else:
 print sublist, 'contains no duplicates.'

Result:

[1, 1, 2] contains duplicates.
[1, 2, 8] contains no duplicates.
[2, 8, 5] contains no duplicates.
[8, 5, 6] contains no duplicates.
[5, 6, 6] contains duplicates.
[6, 6, 7] contains duplicates.
[6, 7, 8] contains no duplicates.
[7, 8, 8] contains duplicates.
[8, 8, 4] contains duplicates.
[8, 4, 2] contains no duplicates.

Question 11

Similar to my approach in C++..thanx for the help... 1)Copy the 1st k elements to a temporary array(vector) 2) using #include<algorithm> file in C++ STL...using unique() I would determine if there's any change in size of vector

Aniket Inge Aniket Inge 9167 silver badges14 bronze badges · Accepted Answer · 2012-10-14 14:32:50Z

1

Take a subset, add each element into a hash table. If the hash table already contains a value - print out saying there is a duplicate.

A hash table will only allocate 1 memory block per entry. Hence its easy to find if the number already exists. Since these are just regular entries your table will be something like this:

struct Hashtable{
 int number;
};
static struct Hashtable hashTable[10];
int getHash(int x){ return x; }
void addHash(int number)
{
 if(hashTable[number].number == -1)
 { /*OK*/
 hashTable[number].number = number;
 }else{
 printf("DUPLICATE FOUND! BAILING OUT!");
 }
}
void initTable(){ for(int i = 0; i < 10; i++){hashTable[i].number = -1;}}

Share

Improve this answer

edited Oct 14, 2012 at 15:09

answered Oct 14, 2012 at 14:32

Aniket Inge's user avatar

Aniket Inge Aniket Inge

9167 silver badges14 bronze badges

4

Your "pseudo-code" isn't really good. If it's supposed to be C or C++, relying on "some garbage value" for anything is just wrong. Using a global, fixed-size static thing looks inappropriate. Simpler pseudo-code would be more interesting than what you have up there IMO.

Mat
– Mat

2012年10月14日 15:01:10 +00:00
Commented Oct 14, 2012 at 15:01
1

The psuedo code was suppose to be for explanation. You can use all the techniques you want to make the code cleaner.

Aniket Inge
– Aniket Inge

2012年10月14日 15:03:01 +00:00
Commented Oct 14, 2012 at 15:03
I know it's not supposed to be working code, but as it is it's close to being compilable but really bad C or C++. A simple if (hash.contains(value)) return DUPLICATE; else hash.put(value); even if it's not proper C or C++ or Java would illustrate what you're trying to describe better.

Mat
– Mat

2012年10月14日 15:04:48 +00:00
Commented Oct 14, 2012 at 15:04
1

I would suggest setting all the values of the table to -1 and then if(hashTable[number].number == -1){/*OK NOW*/ ...} else { duplicateFound}

Aniket Inge
– Aniket Inge

2012年10月14日 15:07:30 +00:00
Commented Oct 14, 2012 at 15:07

Add a comment |

Stack Exchange Network

Detect duplicate in a subset from a set of elements

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Detect duplicate in a subset from a set of elements

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions