Finding duplicate char from a string

Question 1

I have written code to find a duplicate char from a string. Can the code be improved?

Input:

 Hello World

output :

 l = 3 its a dup
 o = 2 its a dup

My approach is

step 1: sort the string

step 2: remove all the whitespace

step 3: copy std::string to char [] and loop through the array until different char and count on every dup char if count is more than one its duplicate.

#include<iostream>
#include<algorithm>
#include<set>
#include<string>
#include<string.h>
#include<unistd.h>
void dupcheck(std::string usr)
{
 int count =1;
 char temp[1024]={0};
 strncpy(temp,usr.c_str(),sizeof(temp));
 for(int i=0;temp[i];++i)
 {
 for(int j=i+1;temp[j]==temp[i];++j)
 {
 if(temp[i]==temp[j])
 count++;
 i=j;
 }
 if(count>=2)
 {
 std::cout<<"dup char ="<<temp[i]<<" "<<count<<std::endl;
 count =1;
 }
 }
}
int main()
{
 std::string userInput;
 std::getline(std::cin,userInput); 
 userInput.erase(std::remove(userInput.begin(), userInput.end(),' '), userInput.end());
 std::sort(userInput.begin(),userInput.end());
 std::cout<<userInput<<std::endl;
 if(!userInput.empty())
 dupcheck(userInput);
}

Please help me to improve the code if possible.

Question 2

Copying the string in step 3 is unnecessary

You can iterate through the characters of a std::string s using std::string::operator[] and std::string::size():

for (std::string::size_type pos = 0; pos < s.size(); pos++) {
 char c = s[pos]; // do something with the char now
}

Alternatively, you can use std::string::iterator to iterate through the string.

Avoid `std::endl` in favor of `\n`

std::endl flushes the stream, which can cause a loss in performance.

Use more whitespace in your code

It's difficult to read code like for(int i=0;temp[i];++i). Instead, use more whitespace as in for (int i = 0; temp[i]; ++i).

I would take a different approach to solve the problem. With your approach, you have to iterate through the string at least once to sort it (step 1), then again to remove the whitespace (step 2), then a third time to look for duplicates. If you copy the string in step 3 that's yet another iteration.

You only need to iterate through the string once. Here's one way to do that:

Iterate through the string's characters (skipping the whitespace characters) and put each character in a std::set<char> if you haven't encountered it before. If you've already encountered it (it's in the std::set) put it in a separate std::set of duplicates. Once you've iterated through the string's character's once, you can iterate through the set of duplicates to print them.

Here's a demo:

#include <iostream> 
#include <string>
#include <cctype>
#include <set>
int main() {
 std::string s("Hello world");
 std::set<char> characters;
 std::set<char> duplicates;
 for (std::string::size_type pos = 0; pos < s.size(); pos++) {
 char c = s[pos];
 // std::isspace() accepts an int, so cast c to an int
 if (!std::isspace(static_cast<int>(c))) {
 if (characters.count(c) == 0) {
 characters.insert(c);
 } else {
 duplicates.insert(c);
 }
 }
 }
 std::cout << "Duplicates: ";
 for (std::set<char>::const_iterator it = duplicates.begin(); it != duplicates.end(); ++it) {
 std::cout << *it << " ";
 }
 return 0;
}

Note that it's better to use std::isspace() instead of ' ' to check if a character is whitespace.

If you need to count how many times each character occurs in the string, simply replace std::set<char> duplicates with a std::map<char, int> which maps each character to the number of times it occurs (as shown in @TobySpeight's answer).

Question 3

For the output I would use a for-range: for(const auto &duplicate : duplicates) { std::cout << duplicate << " "; }

Question 4

@DiDi Sure, but I don't know if the OP's compiler supports C++11.

Question 5

@Null thank you its is clean and easy to understand :-) but i guss you missed to get no of repeated character.As show in the question example.

Question 6

@Ysurf I went by the title of simply "finding" duplicates. It's not too difficult to modify my example to count them (@TobySpeight's answer shows you how) -- the important thing is to avoid iterating over the input string so many times.

Question 7

@Ysurf I added a note about counting the number of occurrences. Thanks for accepting!

Question 8

Include only what you need

This program uses only C++ Standard Library, so we can omit

#include <unistd.h>

We're also including but not using <set>.

Interface

dupcheck() accepts a string by value, but only reads it. That means we should consider passing a reference to constant string, which will avoid copying the string's contents:

void dupcheck(const std::string& usr)

A significant concern of the interface is that it requires the input string to already be sorted and stripped of spaces. We could write some comments explaining that, but to me it's a sign that we want to re-think the interface. It's much more difficult to understand code where the work is spread between the caller and the function.

Avoid fixed-size storage

 char temp[1024]={0};
 strncpy(temp,usr.c_str(),sizeof(temp));

This is another constraint that's not apparent to users - if a string of 1024 characters or longer is passed as input, then the contents of temp will not be null-terminated, and that results in undefined behaviour.

As an aside, initialising temp with zeros may be considered good defensive programming, but I think it's wasteful when we're going to immediately overwrite them - I prefer to use tools such as Valgrind to prevent use of uninitialised memory.

Be careful with braces

GCC warns that the indentation is misleading here:

 if(temp[i]==temp[j])
 count++;
 i=j;

I don't know whether you meant { count++; i=j; } instead.

Don't write to `cout` in functions

Instead of writing to std::cout, we could accept a std::ostream& parameter, or we could return a suitable data structure containing the results, and allow the caller to decide what and how to print it.

Alternative method

We can use a standard container as a "bag" (also known as a "multiset"), and populate it directly from the string. Then remove from the contents any character that appears only once.

#include <algorithm>
#include <map>
#include <string>
// Helper function cribbed from https://stackoverflow.com/a/29004221
template<typename Container, typename Predicate>
void erase_if(Container& items, const Predicate& predicate) {
 for (auto it = items.begin(); it != items.end(); ) {
 if (predicate(*it))
 it = items.erase(it);
 else
 ++it;
 }
}
template<typename T>
using bag = std::map<T,int>;
bag<char> get_dupes(const std::string& s)
{
 bag<char> values;
 for (auto c: s)
 ++values[c];
 // remove spaces and non-duplicate characters
 erase_if(values, [](const auto& e){ return e.first==' ' || e.second < 2;});
 return values;
}
// Test code
#include <ostream>
std::ostream& print_dupes(std::ostream& os, const bag<char>& dupes)
{
 const char *sep = "";
 for (const auto& e: dupes) {
 os << sep << e.first << ":" << e.second;
 sep = ", ";
 }
 return os << std::endl;
}
#include <iostream>
int main()
{
 for (auto s: { "Hello World!", "foobar" })
 print_dupes(std::cout << s << " has these duplicates: ",
 get_dupes(s));
}

Null Null 1,4633 gold badges19 silver badges27 bronze badges · Accepted Answer · 2017-12-07 20:39:37Z

Copying the string in step 3 is unnecessary

You can iterate through the characters of a std::string s using std::string::operator[] and std::string::size():

for (std::string::size_type pos = 0; pos < s.size(); pos++) {
 char c = s[pos]; // do something with the char now
}

Alternatively, you can use std::string::iterator to iterate through the string.

Avoid `std::endl` in favor of `\n`

std::endl flushes the stream, which can cause a loss in performance.

Use more whitespace in your code

It's difficult to read code like for(int i=0;temp[i];++i). Instead, use more whitespace as in for (int i = 0; temp[i]; ++i).

I would take a different approach to solve the problem. With your approach, you have to iterate through the string at least once to sort it (step 1), then again to remove the whitespace (step 2), then a third time to look for duplicates. If you copy the string in step 3 that's yet another iteration.

You only need to iterate through the string once. Here's one way to do that:

Iterate through the string's characters (skipping the whitespace characters) and put each character in a std::set<char> if you haven't encountered it before. If you've already encountered it (it's in the std::set) put it in a separate std::set of duplicates. Once you've iterated through the string's character's once, you can iterate through the set of duplicates to print them.

Here's a demo:

#include <iostream> 
#include <string>
#include <cctype>
#include <set>
int main() {
 std::string s("Hello world");
 std::set<char> characters;
 std::set<char> duplicates;
 for (std::string::size_type pos = 0; pos < s.size(); pos++) {
 char c = s[pos];
 // std::isspace() accepts an int, so cast c to an int
 if (!std::isspace(static_cast<int>(c))) {
 if (characters.count(c) == 0) {
 characters.insert(c);
 } else {
 duplicates.insert(c);
 }
 }
 }
 std::cout << "Duplicates: ";
 for (std::set<char>::const_iterator it = duplicates.begin(); it != duplicates.end(); ++it) {
 std::cout << *it << " ";
 }
 return 0;
}

Note that it's better to use std::isspace() instead of ' ' to check if a character is whitespace.

If you need to count how many times each character occurs in the string, simply replace std::set<char> duplicates with a std::map<char, int> which maps each character to the number of times it occurs (as shown in @TobySpeight's answer).

For the output I would use a for-range: for(const auto &duplicate : duplicates) { std::cout << duplicate << " "; }
@DiDi Sure, but I don't know if the OP's compiler supports C++11.
@Null thank you its is clean and easy to understand :-) but i guss you missed to get no of repeated character.As show in the question example.
@Ysurf I went by the title of simply "finding" duplicates. It's not too difficult to modify my example to count them (@TobySpeight's answer shows you how) -- the important thing is to avoid iterating over the input string so many times.
@Ysurf I added a note about counting the number of occurrences. Thanks for accepting!

Stack Exchange Network

Finding duplicate char from a string

2 Answers 2

Copying the string in step 3 is unnecessary

Avoid `std::endl` in favor of `\n`

Use more whitespace in your code

Include only what you need

Interface

Avoid fixed-size storage

Be careful with braces

Don't write to `cout` in functions

Alternative method

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding duplicate char from a string

2 Answers 2

Copying the string in step 3 is unnecessary

Avoid std::endl in favor of \n

Use more whitespace in your code

Include only what you need

Interface

Avoid fixed-size storage

Be careful with braces

Don't write to cout in functions

Alternative method

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Avoid `std::endl` in favor of `\n`

Don't write to `cout` in functions