I have written code to find a duplicate char from a string. Can the code be improved?
Input:
Hello World
output :
l = 3 its a dup
o = 2 its a dup
My approach is
step 1: sort the string
step 2: remove all the whitespace
step 3: copy std::string
to char []
and loop through the array until different char and count on every dup char if count is more than one its duplicate.
#include<iostream>
#include<algorithm>
#include<set>
#include<string>
#include<string.h>
#include<unistd.h>
void dupcheck(std::string usr)
{
int count =1;
char temp[1024]={0};
strncpy(temp,usr.c_str(),sizeof(temp));
for(int i=0;temp[i];++i)
{
for(int j=i+1;temp[j]==temp[i];++j)
{
if(temp[i]==temp[j])
count++;
i=j;
}
if(count>=2)
{
std::cout<<"dup char ="<<temp[i]<<" "<<count<<std::endl;
count =1;
}
}
}
int main()
{
std::string userInput;
std::getline(std::cin,userInput);
userInput.erase(std::remove(userInput.begin(), userInput.end(),' '), userInput.end());
std::sort(userInput.begin(),userInput.end());
std::cout<<userInput<<std::endl;
if(!userInput.empty())
dupcheck(userInput);
}
Please help me to improve the code if possible.
2 Answers 2
Copying the string in step 3 is unnecessary
You can iterate through the characters of a std::string s
using std::string::operator[]
and std::string::size()
:
for (std::string::size_type pos = 0; pos < s.size(); pos++) {
char c = s[pos]; // do something with the char now
}
Alternatively, you can use std::string::iterator
to iterate through the string.
Avoid std::endl
in favor of \n
std::endl
flushes the stream, which can cause a loss in performance.
Use more whitespace in your code
It's difficult to read code like for(int i=0;temp[i];++i)
. Instead, use more whitespace as in for (int i = 0; temp[i]; ++i)
.
I would take a different approach to solve the problem. With your approach, you have to iterate through the string at least once to sort it (step 1), then again to remove the whitespace (step 2), then a third time to look for duplicates. If you copy the string in step 3 that's yet another iteration.
You only need to iterate through the string once. Here's one way to do that:
Iterate through the string's characters (skipping the whitespace characters) and put each character in a std::set<char>
if you haven't encountered it before. If you've already encountered it (it's in the std::set
) put it in a separate std::set
of duplicates. Once you've iterated through the string's character's once, you can iterate through the set of duplicates to print them.
Here's a demo:
#include <iostream>
#include <string>
#include <cctype>
#include <set>
int main() {
std::string s("Hello world");
std::set<char> characters;
std::set<char> duplicates;
for (std::string::size_type pos = 0; pos < s.size(); pos++) {
char c = s[pos];
// std::isspace() accepts an int, so cast c to an int
if (!std::isspace(static_cast<int>(c))) {
if (characters.count(c) == 0) {
characters.insert(c);
} else {
duplicates.insert(c);
}
}
}
std::cout << "Duplicates: ";
for (std::set<char>::const_iterator it = duplicates.begin(); it != duplicates.end(); ++it) {
std::cout << *it << " ";
}
return 0;
}
Note that it's better to use std::isspace()
instead of ' '
to check if a character is whitespace.
If you need to count how many times each character occurs in the string, simply replace std::set<char> duplicates
with a std::map<char, int>
which maps each character to the number of times it occurs (as shown in @TobySpeight's answer).
-
1\$\begingroup\$ For the output I would use a for-range: for(const auto &duplicate : duplicates) { std::cout << duplicate << " "; } \$\endgroup\$DiDi– DiDi2017年12月07日 20:48:48 +00:00Commented Dec 7, 2017 at 20:48
-
1\$\begingroup\$ @DiDi Sure, but I don't know if the OP's compiler supports C++11. \$\endgroup\$Null– Null2017年12月07日 20:50:59 +00:00Commented Dec 7, 2017 at 20:50
-
\$\begingroup\$ @Null thank you its is clean and easy to understand :-) but i guss you missed to get no of repeated character.As show in the question example. \$\endgroup\$rim– rim2017年12月08日 15:17:23 +00:00Commented Dec 8, 2017 at 15:17
-
\$\begingroup\$ @Ysurf I went by the title of simply "finding" duplicates. It's not too difficult to modify my example to count them (@TobySpeight's answer shows you how) -- the important thing is to avoid iterating over the input string so many times. \$\endgroup\$Null– Null2017年12月08日 15:21:48 +00:00Commented Dec 8, 2017 at 15:21
-
1\$\begingroup\$ @Ysurf I added a note about counting the number of occurrences. Thanks for accepting! \$\endgroup\$Null– Null2017年12月08日 15:27:38 +00:00Commented Dec 8, 2017 at 15:27
Include only what you need
This program uses only C++ Standard Library, so we can omit
#include <unistd.h>
We're also including but not using <set>
.
Interface
dupcheck()
accepts a string by value, but only reads it. That means we should consider passing a reference to constant string, which will avoid copying the string's contents:
void dupcheck(const std::string& usr)
A significant concern of the interface is that it requires the input string to already be sorted and stripped of spaces. We could write some comments explaining that, but to me it's a sign that we want to re-think the interface. It's much more difficult to understand code where the work is spread between the caller and the function.
Avoid fixed-size storage
char temp[1024]={0}; strncpy(temp,usr.c_str(),sizeof(temp));
This is another constraint that's not apparent to users - if a string of 1024 characters or longer is passed as input, then the contents of temp
will not be null-terminated, and that results in undefined behaviour.
As an aside, initialising temp
with zeros may be considered good defensive programming, but I think it's wasteful when we're going to immediately overwrite them - I prefer to use tools such as Valgrind to prevent use of uninitialised memory.
Be careful with braces
GCC warns that the indentation is misleading here:
if(temp[i]==temp[j]) count++; i=j;
I don't know whether you meant { count++; i=j; }
instead.
Don't write to cout
in functions
Instead of writing to std::cout
, we could accept a std::ostream&
parameter, or we could return a suitable data structure containing the results, and allow the caller to decide what and how to print it.
Alternative method
We can use a standard container as a "bag" (also known as a "multiset"), and populate it directly from the string. Then remove from the contents any character that appears only once.
#include <algorithm>
#include <map>
#include <string>
// Helper function cribbed from https://stackoverflow.com/a/29004221
template<typename Container, typename Predicate>
void erase_if(Container& items, const Predicate& predicate) {
for (auto it = items.begin(); it != items.end(); ) {
if (predicate(*it))
it = items.erase(it);
else
++it;
}
}
template<typename T>
using bag = std::map<T,int>;
bag<char> get_dupes(const std::string& s)
{
bag<char> values;
for (auto c: s)
++values[c];
// remove spaces and non-duplicate characters
erase_if(values, [](const auto& e){ return e.first==' ' || e.second < 2;});
return values;
}
// Test code
#include <ostream>
std::ostream& print_dupes(std::ostream& os, const bag<char>& dupes)
{
const char *sep = "";
for (const auto& e: dupes) {
os << sep << e.first << ":" << e.second;
sep = ", ";
}
return os << std::endl;
}
#include <iostream>
int main()
{
for (auto s: { "Hello World!", "foobar" })
print_dupes(std::cout << s << " has these duplicates: ",
get_dupes(s));
}