Below is a problem I encountered during an online course. There are a lot of similar questions asked on this, but I couldn't find a solution there as well as I believe my specific issue is a bit different from the rest.
Given a string
s
, find the first non-repeating character in the string If there is no such character print ‘@’. You are required to complete the functionInput Format: String s
Output Format: First non-repeating character or @
You are supposed to implement
char char FirstNonRepeatedchar(char* str, int len)
: which takes a pointer to input string str and length of the string(including ‘0円’). The function should return either the first non repeating character or ‘@’Input Constraint: All input characters will be either lower or upper case english alphabets.
str[i] [a,z][A,Z]
There will not be any space in the input string.
/************ Prefix Code Begins*******************/
#include <iostream>
#include <string>
using namespace std;
//function finds the first non-repeating character
char FirstNonRepeatedchar(const char* , int);
int main(){
int len;
string line;
getline (cin, line);
len = int(line.length());
cout << FirstNonRepeatedchar(line.c_str(), len);
return 0;
}
/******************* Prefix Code Ends******************/
Also, note I cannot change anything in the prefix code above. I wrote 2 solutions: first one below is giving correct output.
char FirstNonRepeatedchar(const char* str, int len){
int count[26]={0};
for(int i=0;i<len;i++)
{
count[*(str+i)]++;
}
for(int i=0;i<len;i++)
{
if(count[*(str+i)]==1)
return *(str+i);
}
return '@';
}
The above solution iterates the string twice. So, I tried to improve the complexity by implementing it using a hash.
char FirstNonRepeatedchar(const char* str, int len){
int count[26]={0};
int j;
char c;
if(isupper(*str)==1)
{ c='A';
}
else
c='a';
for(int i=0;i<len;i++)
{ j=*(str+i)-c;
if(count[j]<0)
;
else if(count[j]==0)
count[j]=i+1;
else if(count[j]>0)
count[j]=-i;
}
int min=len+1;
for(int i=0;i<26;i++)
{
if(count[i]>0 && count[i]<min)
min=count[i];
}
if(min==len+1)
return '@';
else
return *(str+min-1);
}
Is the second solution I gave the most optimized solution possible for this problem?
2 Answers 2
Let's try a solution that doesn't use an array with all the required bookkeeping. The following code works, and bypasses the problems others have pointed out.
First, a quick template function that searches for duplicates given a char:
template <typename T>
inline bool repeated(char c, T first, T last) {
size_t count = 0;
while (first != last) {
if (*first == c) ++count;
if (count > 1) return true;
++first;
}
return false;
}
Now we create a set of unique characters and then scan the string to see if there are any duplicates. If there are, we remove the char from the set and continue to the next one.
char FirstNonRepeatedchar(const char* str, int len) {
auto uniq = std::set<char>(str, str + len);
auto first = str;
auto last = str + len;
while (first != last) {
if (auto it = uniq.find(*first) != uniq.end()) {
if (!repeated(*first, str, str + len)) return *first;
uniq.erase(it);
}
++first;
}
return '@';
}
The first solution
Your "fixed" solution still won't work,
because the last valid index in the count
array is 25,
but *(str+i)
will have as indexes as ASCII code of 'a'..'z'
or 'A'..'Z'
, all of which are blatantly out of range.
A program that tries to access invalid memory locations would usually crash,
if it doesn't, that's just plain lucky,
and depends on your environment and compiler.
There are several other problems:
- No handling of uppercase / lowercase input
- Too tight writing style (no spaces around operators)
- Inconsistent and poor indenting
- It's recommended to use braces even with single-line
if
statements
With the above fixed, and some other improvements:
char FirstNonRepeatedchar(const char* str, int len) {
int count['z' - 'a' + 1] = {0};
for (int i = 0; i < len; i++)
{
int index = tolower(*(str+i)) - 'a';
++count[index];
}
for (int i = 0; i < len; i++)
{
char c = *(str+i);
int index = tolower(c) - 'a';
if (count[index] == 1)
{
return c;
}
}
return '@';
}
Another tiny improvement is that instead of 26,
the size of the English alphabet,
I used 'z' - 'a' + 1
to make the logic behind the counting array explicit.
The second solution
This solution is so poorly formatted, it's bordering on deliberate obfuscation.
Is the second solution I gave the most optimized solution possible for this problem?
Honestly, it's hard to care to review code that you didn't seem to care to write in a readable way.
But looking from afar, I can tell you this:
- The first solution has \$O(N) + O(N) = O(N)\$ complexity, because in the worst case it iterates over all characters in the input
- The second solution, if it actually works, would have \$O(N) + O(M) = O(N)\$ complexity, where \$M\$ is a fixed constant independ from the length of the input (= 26, the size of the English alphabet), so this would have the better performance
-
\$\begingroup\$ \$O(N)+O(N)=O(N)\$ and also \$O(N)+O(M)=O(N)\$ \$\endgroup\$mustafa– mustafa2015年10月23日 06:29:26 +00:00Commented Oct 23, 2015 at 6:29
count
is uninitialised. In your second implementation, upper case letters cause array access out of bounds. Please ensure your code is working before asking for review. \$\endgroup\$toupper()
ortolower()
would fix it though. \$\endgroup\$count
. You need to usemalloc
ornew
, or just use an array. \$\endgroup\$