Finding the first non-repeating character in a string

Question 1

Below is a problem I encountered during an online course. There are a lot of similar questions asked on this, but I couldn't find a solution there as well as I believe my specific issue is a bit different from the rest.

Given a string s, find the first non-repeating character in the string If there is no such character print ‘@’. You are required to complete the function

Input Format: String s

Output Format: First non-repeating character or @

You are supposed to implement char char FirstNonRepeatedchar(char* str, int len): which takes a pointer to input string str and length of the string(including ‘0円’). The function should return either the first non repeating character or ‘@’

Input Constraint: All input characters will be either lower or upper case english alphabets.
 str[i] [a,z][A,Z]
There will not be any space in the input string.

 /************ Prefix Code Begins*******************/
 #include <iostream>
 #include <string>
using namespace std;
//function finds the first non-repeating character
 char FirstNonRepeatedchar(const char* , int);
int main(){
int len;
string line;
getline (cin, line);
len = int(line.length());
cout << FirstNonRepeatedchar(line.c_str(), len);
return 0;
}
/******************* Prefix Code Ends******************/

Also, note I cannot change anything in the prefix code above. I wrote 2 solutions: first one below is giving correct output.

char FirstNonRepeatedchar(const char* str, int len){
 int count[26]={0};
 for(int i=0;i<len;i++)
 {
 count[*(str+i)]++;
 }
 for(int i=0;i<len;i++)
 {
 if(count[*(str+i)]==1)
 return *(str+i);
 }
return '@';
}

The above solution iterates the string twice. So, I tried to improve the complexity by implementing it using a hash.

char FirstNonRepeatedchar(const char* str, int len){
int count[26]={0};
int j;
char c;
if(isupper(*str)==1)
{ c='A';
}
else
c='a';
for(int i=0;i<len;i++)
{ j=*(str+i)-c;
 if(count[j]<0)
 ;
 else if(count[j]==0)
 count[j]=i+1;
 else if(count[j]>0)
 count[j]=-i;
}
int min=len+1;
for(int i=0;i<26;i++)
{ 
 if(count[i]>0 && count[i]<min)
 min=count[i];
}
if(min==len+1) 
return '@';
else
return *(str+min-1);
}

Is the second solution I gave the most optimized solution possible for this problem?

Question 2

If you iterate from the end of the string to the front, you should have your answer at the end of the loop without needing to iterate a second time.

Question 3

Your code has fatal bugs. In your first implementation, count is uninitialised. In your second implementation, upper case letters cause array access out of bounds. Please ensure your code is working before asking for review.

Question 4

The bugs mentioned above were fixed. The program is working fine now.

Question 5

Your code still doesn't work if the input is mixed-case. A simple use of toupper() or tolower() would fix it though.

Question 6

Your first code still does not allocate any memory for count. You need to use malloc or new, or just use an array.

Question 7

Let's try a solution that doesn't use an array with all the required bookkeeping. The following code works, and bypasses the problems others have pointed out.

First, a quick template function that searches for duplicates given a char:

template <typename T>
inline bool repeated(char c, T first, T last) {
 size_t count = 0;
 while (first != last) {
 if (*first == c) ++count;
 if (count > 1) return true;
 ++first;
 }
 return false;
}

Now we create a set of unique characters and then scan the string to see if there are any duplicates. If there are, we remove the char from the set and continue to the next one.

char FirstNonRepeatedchar(const char* str, int len) {
 auto uniq = std::set<char>(str, str + len);
 auto first = str;
 auto last = str + len;
 while (first != last) {
 if (auto it = uniq.find(*first) != uniq.end()) {
 if (!repeated(*first, str, str + len)) return *first;
 uniq.erase(it);
 }
 ++first;
 }
 return '@';
}

Question 8

The first solution

Your "fixed" solution still won't work, because the last valid index in the count array is 25, but *(str+i) will have as indexes as ASCII code of 'a'..'z' or 'A'..'Z', all of which are blatantly out of range. A program that tries to access invalid memory locations would usually crash, if it doesn't, that's just plain lucky, and depends on your environment and compiler.

There are several other problems:

No handling of uppercase / lowercase input
Too tight writing style (no spaces around operators)
Inconsistent and poor indenting
It's recommended to use braces even with single-line if statements

With the above fixed, and some other improvements:

char FirstNonRepeatedchar(const char* str, int len) {
 int count['z' - 'a' + 1] = {0};
 for (int i = 0; i < len; i++)
 {
 int index = tolower(*(str+i)) - 'a';
 ++count[index];
 }
 for (int i = 0; i < len; i++)
 {
 char c = *(str+i);
 int index = tolower(c) - 'a';
 if (count[index] == 1)
 {
 return c;
 }
 }
 return '@';
}

Another tiny improvement is that instead of 26, the size of the English alphabet, I used 'z' - 'a' + 1 to make the logic behind the counting array explicit.

The second solution

This solution is so poorly formatted, it's bordering on deliberate obfuscation.

Is the second solution I gave the most optimized solution possible for this problem?

Honestly, it's hard to care to review code that you didn't seem to care to write in a readable way.

But looking from afar, I can tell you this:

The first solution has \$O(N) + O(N) = O(N)\$ complexity, because in the worst case it iterates over all characters in the input
The second solution, if it actually works, would have \$O(N) + O(M) = O(N)\$ complexity, where \$M\$ is a fixed constant independ from the length of the input (= 26, the size of the English alphabet), so this would have the better performance

Question 9

\$O(N)+O(N)=O(N)\$ and also \$O(N)+O(M)=O(N)\$

Mark Beckwith Mark Beckwith 1515 bronze badges · Answer 1 · 2015-06-13 21:26:03Z

Let's try a solution that doesn't use an array with all the required bookkeeping. The following code works, and bypasses the problems others have pointed out.

First, a quick template function that searches for duplicates given a char:

template <typename T>
inline bool repeated(char c, T first, T last) {
 size_t count = 0;
 while (first != last) {
 if (*first == c) ++count;
 if (count > 1) return true;
 ++first;
 }
 return false;
}

Now we create a set of unique characters and then scan the string to see if there are any duplicates. If there are, we remove the char from the set and continue to the next one.

char FirstNonRepeatedchar(const char* str, int len) {
 auto uniq = std::set<char>(str, str + len);
 auto first = str;
 auto last = str + len;
 while (first != last) {
 if (auto it = uniq.find(*first) != uniq.end()) {
 if (!repeated(*first, str, str + len)) return *first;
 uniq.erase(it);
 }
 ++first;
 }
 return '@';
}

janos janos 113k15 gold badges154 silver badges396 bronze badges · Answer 2 · 2015-06-14 12:51:31Z

The first solution

Your "fixed" solution still won't work, because the last valid index in the count array is 25, but *(str+i) will have as indexes as ASCII code of 'a'..'z' or 'A'..'Z', all of which are blatantly out of range. A program that tries to access invalid memory locations would usually crash, if it doesn't, that's just plain lucky, and depends on your environment and compiler.

There are several other problems:

No handling of uppercase / lowercase input
Too tight writing style (no spaces around operators)
Inconsistent and poor indenting
It's recommended to use braces even with single-line if statements

With the above fixed, and some other improvements:

char FirstNonRepeatedchar(const char* str, int len) {
 int count['z' - 'a' + 1] = {0};
 for (int i = 0; i < len; i++)
 {
 int index = tolower(*(str+i)) - 'a';
 ++count[index];
 }
 for (int i = 0; i < len; i++)
 {
 char c = *(str+i);
 int index = tolower(c) - 'a';
 if (count[index] == 1)
 {
 return c;
 }
 }
 return '@';
}

Another tiny improvement is that instead of 26, the size of the English alphabet, I used 'z' - 'a' + 1 to make the logic behind the counting array explicit.

The second solution

This solution is so poorly formatted, it's bordering on deliberate obfuscation.

Is the second solution I gave the most optimized solution possible for this problem?

Honestly, it's hard to care to review code that you didn't seem to care to write in a readable way.

But looking from afar, I can tell you this:

The first solution has \$O(N) + O(N) = O(N)\$ complexity, because in the worst case it iterates over all characters in the input
The second solution, if it actually works, would have \$O(N) + O(M) = O(N)\$ complexity, where \$M\$ is a fixed constant independ from the length of the input (= 26, the size of the English alphabet), so this would have the better performance

\$\begingroup\$ \$O(N)+O(N)=O(N)\$ and also \$O(N)+O(M)=O(N)\$ \$\endgroup\$

mustafa
– mustafa

2015年10月23日 06:29:26 +00:00
Commented Oct 23, 2015 at 6:29

Stack Exchange Network

Finding the first non-repeating character in a string

2 Answers 2

The first solution

The second solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding the first non-repeating character in a string

2 Answers 2

The first solution

The second solution

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions