Remove duplicates in string algorithm

Question 1

My homework is remove duplicates in a random string. My idea is use 2 loops to solve the problem.

1st one will scan every character in the string. 2nd one will check that character is duplicated or not. If so, remove the character.

string content = "Blah blah..."
 for (int i = 0; i < content.size(); ++i) {
 char check = content.at(i);
 for (int j = i + 1; j < content.size() - 1; ++j) {
 if (check == content.at(j)) {
 content.erase(content.begin()+j);
 }
 }
 }

The problem is it doesn't work. It always removes the wrong character. Seems an indices problem but I don't understand why.

A temporary fix is change content.erase(content.begin()+j); to content.erase( remove(content.begin() + i+1, content.end(), check),content.end());

But I think trigger a "remove by value" scan isn't a nice way. I want to do it with 2 loops or fewer.

Any ideas will be appreciated :)

Question 2

One simple way is to use the double-loop brute-force variant you have, but copy non-duplicates to a new string.

Question 3

Potential duplicate: stackoverflow.com/questions/2286860/…

Question 4

@JoachimPileborg: I like your idea, but can you provide some pseudo code? I can't implement it right now, no idea :(

Question 5

@djikay:thank for the link, but can you show me what's wrong in my code? I can't understand why the indices are wrong.

Question 6

@Tiana987642 See my answer that to understand what is wrong in your code.

Question 7

Your loops could look the following way

#include <iostream>
#include <string>
int main() 
{
 std::string s = "Blah blah...";
 std::cout << '\"' << s << '\"' << std::endl;
 for ( std::string::size_type i = 0; i < s.size(); i++ )
 {
 std::string::size_type j = i + 1;
 while ( j < s.size() )
 {
 if ( s[i] == s[j] )
 {
 s.erase( j, 1 );
 }
 else
 {
 ++j;
 }
 }
 }
 std::cout << '\"' << s << '\"' << std::endl;
 return 0;
}

The output is

"Blah blah..."
"Blah b."

There are many other approaches using standard algorithms. For example

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
int main() 
{
 std::string s = "Blah blah...";
 std::cout << '\"' << s << '\"' << std::endl;
 auto last = s.end();
 for ( auto first = s.begin(); first != last; ++first )
 {
 last = std::remove( std::next( first ), last, *first );
 }
 s.erase( last, s.end() );
 std::cout << '\"' << s << '\"' << std::endl;
 return 0;
}

The output is the same as for the previous code example

"Blah blah..."
"Blah b."

Question 8

Thank you for your suggestion :) I learnt a few thing from your code :) If you have free time, please tell me my what's wrong in my code and workaround for it, like you mentioned in your comment :)

Question 9

@Tiana987642 I think that it is enough to compare your loops with my loops in my first code example. It is obvious that this loop for (int j = i + 1; j < content.size() - 1; ++j) is invalid. It is not clear why condition content.size() - 1 is used and secondly when a character was deleted you must not increment i.

Question 10

My idea is the first loop take a character (let's call it A) , the second will compare A with the rest and delete. So the 2nd will start after the position of A and is ended at the last of the string. At the time I wrote the code I didn't think carefully about this. Just don't want to out of bound. Thank you for point out my mistake :)

Question 11

If use of STL is a possible option, you could use an std::unordered_set to keep the characters seen so far and the erase-remove idiom with std::remove_if, like in the following example:

#include <iostream>
#include <string>
#include <unordered_set>
#include <algorithm>
int main() {
 std::string str("Hello World!");
 std::unordered_set<char> log;
 std::cout << "Before: " << str << std::endl;
 str.erase(std::remove_if(str.begin(), str.end(), [&] (char const c) { return !(log.insert(c).second); }), str.end());
 std::cout << "After: " << str << std::endl;
}

LIVE DEMO

Question 12

Nice idea :) Of course use STL is an option, but I want to improve my skill a bit so I try to think an algorithm first, use <algorithm> later :) Just my opinion

Question 13

@Tiana987642 if it's for educational purposes I'm with you 100%. But when you really want to do serious programming, STL is not only highly recommended, but IMHO its use is mandatory.

Question 14

Thank, I appreciate your advice :)

Question 15

I recommend a two pass approach. The first pass identifies the positions of the duplicated characters; the second pass removes them.

I recommend using a std::set and a std::vector<unsigned int>. The vector contains letters that are in the string. The vector contains the positions of the duplicated letters.

The first pass detects if a letter is present in the set. If the letter exists, the position is appended to the vector. Otherwise the letter is inserted into the set.

For the second pass, sort the vector in descending order.
Erase the character at the position in the vector, then remove the position from the vector.

By erasing characters from the end of the string towards the front, the positions of the remaining duplicates won't change when the character is erased from the string.

Question 16

If the answer is useful, please click on the Check Mark.

Question 17

I ain't forget who helped me :) I already voted up your answer :) I just want to be fair with people, who also answer this question :D

Question 18

I am not sure that this is what is causing your problem, but another problem that I see with your code is in your second for loop. Your j < content.size() - 1 statement should just be

j < content.size().

The reasoning for this is a little tricky to see at first, but in this case you are not just getting the size of your vector to act as the size, but to act as the ending indices of your string. You are shortening the last indices by one which means you wont hit the last char in your string. I don't know if this will help your initial problem, but who knows?

Question 19

Note: Your actual problem is maintaining a proper index to the next element in question:

If you do not erase a character, the next element is at the next position.
If you erase a character, the next element will move into the place of the current position (the position stays the same).

Also: There are more efficient solutions (eg.: utilizing a set)

Question 20

Now I understand. So can you give me a workaround to avoid situation like this?

Question 21

sort first

then unique move all unique chars to the beginning and return past-the-end iterator.

then erase the useless chars

string digits("1a2b3c3c2b1a");
sort(digits.begin(), digits.end());
digits.erase( unique( digits.begin(), digits.end() ), digits.end() );
cout << digits << endl;

output

123abc

score 5 · Accepted Answer · 2014-06-30 19:53:56Z

Your loops could look the following way

#include <iostream>
#include <string>
int main() 
{
 std::string s = "Blah blah...";
 std::cout << '\"' << s << '\"' << std::endl;
 for ( std::string::size_type i = 0; i < s.size(); i++ )
 {
 std::string::size_type j = i + 1;
 while ( j < s.size() )
 {
 if ( s[i] == s[j] )
 {
 s.erase( j, 1 );
 }
 else
 {
 ++j;
 }
 }
 }
 std::cout << '\"' << s << '\"' << std::endl;
 return 0;
}

The output is

"Blah blah..."
"Blah b."

There are many other approaches using standard algorithms. For example

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
int main() 
{
 std::string s = "Blah blah...";
 std::cout << '\"' << s << '\"' << std::endl;
 auto last = s.end();
 for ( auto first = s.begin(); first != last; ++first )
 {
 last = std::remove( std::next( first ), last, *first );
 }
 s.erase( last, s.end() );
 std::cout << '\"' << s << '\"' << std::endl;
 return 0;
}

The output is the same as for the previous code example

"Blah blah..."
"Blah b."

Thank you for your suggestion :) I learnt a few thing from your code :) If you have free time, please tell me my what's wrong in my code and workaround for it, like you mentioned in your comment :)
@Tiana987642 I think that it is enough to compare your loops with my loops in my first code example. It is obvious that this loop for (int j = i + 1; j < content.size() - 1; ++j) is invalid. It is not clear why condition content.size() - 1 is used and secondly when a character was deleted you must not increment i.
My idea is the first loop take a character (let's call it A) , the second will compare A with the rest and delete. So the 2nd will start after the position of A and is ended at the last of the string. At the time I wrote the code I didn't think carefully about this. Just don't want to out of bound. Thank you for point out my mistake :)

CollectivesTM on Stack Overflow

Remove duplicates in string algorithm

6 Answers 6

3 Comments

LIVE DEMO

3 Comments

2 Comments

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

6 Answers 6

3 Comments

3 Comments

2 Comments

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related