I am a new programmer and I want to know if I'm inheriting the right styles. I don't want to pick up any bad/amateur habits. Any advice would be much appreciated.
This is an isUnique
function I wrote in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int x;
char s1[] = "dog hi cobe!";
x = isUnique(s1);
printf("%d", x);
return 0;
}
isUnique(char* s1)
{
char holder;
int i;
int j;
if(s1 == NULL)
return -1;
for(i = 0; i < (strlen(s1) -1); i++)
{
holder = s1[i];
for(j = i; j < (strlen(s1)-1); j++)
{
if(holder == s1[j+1])
{
if(holder == ' ')
continue;
else
return 0;
}
}
}
return 1;
}
And I'm going to say the run time complexity is \$O(n^2)\,ドル would I be correct?
3 Answers 3
Missing return type
This function declaration is missing the return type:
isUnique(char* s1)
Sure, the default return type is int
in C,
but it would be better to just write it anyway for clarity.
Use boolean
C99 has a bool
type with true
and false
values,
which you can get from this #include
:
#include <stdbool.h>
It would be better to rewrite isUnique
to return a bool
.
Even though, this will mean giving up the special value -1 for the NULL
input case. But that doesn't seem so bad. A name like "is*" is commonly expected to return a boolean. If you really need a function that can handle the NULL
case, you could create a wrapper function:
/* returns -1 on NULL input, 0 if not unique, 1 if unique */
int checkUnique(char* s1) {
if (s1 == NULL) return -1;
return isUnique(s1);
}
Note that the bool
return value of isUnique
will be converted to int
, so that false
is 0 and true
is 1.
Avoid repeated calculations
You call strlen(s1)
twice:
for(i = 0; i < (strlen(s1) -1); i++) { holder = s1[i]; for(j = i; j < (strlen(s1)-1); j++)
It would be better to call it once, store in a variable and reuse.
Time complexity
As you suspected, the time complexity of your algorithm is \$O(n^2)\$: all characters are compared with all other.
You could improve that to \$O(n)\$ by using extra storage, the size of the alphabet.
You could create an array of bool
, representing if a character was already seen or not:
- All values are initialized to
false
- For character in the input, check if it was seen
- If it was seen, return
false
- If it was not seen, mark it now (set value in the
bool
array totrue
)
- If it was seen, return
- If the end of the string is reached, return
true
Something like this:
bool isUnique(char * string) {
int len = strlen(string);
bool seen[256];
memset(seen, false, sizeof(seen));
int i;
for (i = 0; i < len; i++) {
char c = string[i];
if (seen[c]) {
return false;
}
seen[c] = true;
}
return true;
}
Printing the result
This will print a 0 or 1:
int x; char s1[] = "dog hi cobe!"; x = isUnique(s1); printf("%d", x);
It would be more informative to print the strings that was tested, and "false"/"true" instead of 0/1:
char s1[] = "dog hi cobe!";
printf("%s -> %s\n", s1, isUnique(s1) ? "true" : "false");
Usability
It would be more interesting if the program took the strings from the command line, instead of using hardcoded values, for example:
void printWithResult(char* string) {
printf("%s -> %s\n", string, isUnique(string) ? "true" : "false");
}
int main(int argc, char ** argv) {
int i;
for (i = 1; i < argc; i++) {
printWithResult(argv[i]);
}
}
-
\$\begingroup\$ Wow! I learned so much through that! Thank you so much!! \$\endgroup\$user85591– user855912015年12月28日 19:06:16 +00:00Commented Dec 28, 2015 at 19:06
-
\$\begingroup\$ Also, I'm don't quite understand the time complexity reduction method you described. Can you please give me further clarification/example? \$\endgroup\$user85591– user855912015年12月28日 19:27:37 +00:00Commented Dec 28, 2015 at 19:27
-
\$\begingroup\$ Lastly, to pass in the arguments from the command line, why are we also creating an int parameter and why is the second parameter a double pointer rather than a single pointer? \$\endgroup\$user85591– user855912015年12月28日 19:30:13 +00:00Commented Dec 28, 2015 at 19:30
-
1\$\begingroup\$ I added an example implementation of an \$O(n)\$ algorithm. As for the
int main(int argc, char ** argv)
signature, that's standard, you can learn about it in any C tutorial about command line argument parsing \$\endgroup\$janos– janos2015年12月28日 19:40:28 +00:00Commented Dec 28, 2015 at 19:40
The combination of conditions
if(holder == s1[j+1])
{
if(holder == ' ')
continue;
makes the whole inner loop void if holder == ' '
, so you may save scanning the string much earlier:
for(i = 0; i < (strlen(s1) -1); i++)
{
holder = s1[i];
if(holder == ' ')
continue;
for(j = i; j < (strlen(s1)-1); j++)
{
if(holder == s1[j+1])
{
return 0;
}
}
}
Additionally you can simplify the code a bit if you start the j
loop one char ahead:
int len = strlen(s1);
for(i = 0; i < len-1; i++)
{
holder = s1[i];
if(holder == ' ')
continue;
for(j = i+1; j < len; j++)
{
if(holder == s1[j])
{
return 0;
}
}
}
For such a relatvely small symbol set (256 possible values of char
type, max 255 of which can appear in the string) you can just arrange an array of counters:
int counter[256];
reset it to zeros:
for(i = 0; i < 256; i++)
counter[i] = 0;
or, based on the known array of int
memory layout:
memset(counter, 0, 256*sizeof(int));
and count occurences of each character (up to two)
for(i = 0; i < strlen(s1); i++)
{
if(s1[i] != ' ')
if(++counter[(unsigned char)s1[i]] > 1)
{
return 0;
}
}
return 1;
This will test your string in linear time (plus constant time for preparing counter[]
array).
-
\$\begingroup\$ by using the keyword "continue" does that make the program skip the rest of the block of code in that loop(i.e. the inner for loop)? \$\endgroup\$user85591– user855912015年12月28日 19:17:11 +00:00Commented Dec 28, 2015 at 19:17
-
\$\begingroup\$ Yes, it forces the re-iteration of a loop. More precisely, if the body of a loop is a compound statement, that is a block enclosed in
{ }
braces, then thecontinue
instruction behaves as agoto
to the closing brace. If the loop body is not a block,continue
passes the control to evaluation of the loop controlling expression in awhile
loop or to the third expression in afor
loop. \$\endgroup\$CiaPan– CiaPan2015年12月28日 23:13:40 +00:00Commented Dec 28, 2015 at 23:13
When compiling, always enable all the warnings.
(for gcc, at a minimum use: -Wall -Wextra -pedantic I also use -Wconversion -std=c99 )
the result will be:
- implicit declaration of function: isUnique
- return type defaults to 'int' for function: isUnique
- comparison between signed and unsigned integer expressions for line:
for(i = 0; i < (strlen(s1) -1); i++)
- comparison between signed and unsigned integer expressions for line:
for(j = i; j < (strlen(s1)-1); j++)
. - Note:
strlen()
returnssize_t
notint
- there is no need to call
strlen()
at all! instead use something like:for(j = i; s1[j]; j++)
which will exit the loop upon encountering the NUL string terminator byte. - when possible, data/variables should be localized to where they are used. so these lines can be eliminated:
int i; int j;
by writing the loop similar to:for(size_t j = i; s1[j]; j++)
- there is no need for the
char holder;
line. - the 'j' variable is being started with
j = i
is should bej = i+1
- Stopping the search because a space is encountered is not correct.
- do not #include headers those contents are not being used.
- use strategic comments so the reader, (and you in 6 months) will know what is being done in the code
- always use function prototypes so the compiler will not be using some 'default' values/types (and in latest C standard, the defaults are changed thereby causing the compiler to raise warnings)
Here is a suggested implementation
#include <stdio.h>
// prototypes
int isUnique(char* s1);
int main( void )
{
int x;
char *s1 = "dog hi cobe!";
x = isUnique(s1);
printf("%d", x);
return 0;
} // end function: main
int isUnique(char* s1)
{
if( !s1 )
{
// indicate no string passed
return -1;
}
for( size_t i = 0; s1[i]; i++ )
{
for( size_t j = i+1; s1[j]; j++ )
{
if( s1[i] == s1[j] )
{
// indicate characters are not unique
return 0;
}
}
}
// indicate characters are unique
return 1;
} // end function: isUnique
-
\$\begingroup\$ Thank you for all those helpful suggestions! I had one quick question. For your 6th suggestion, how exactly is the for loop behaving by writing s1[j] as the second parameter into the for loop? I am asking this for references (b/c I do not see a conditional statement). What exactly would is the implicit condition stated there? \$\endgroup\$user85591– user855912015年12月31日 01:28:32 +00:00Commented Dec 31, 2015 at 1:28
-
1\$\begingroup\$ the second parameter of a
for()
statement only looks at the result as being 'true' (non zero) or 'false' (zero) so the 's1[j]' (and 's1[j]) is looking at the byte selected by the parameter: 's1[i]'. If that byte contains '0円' then the condition is 'false' otherwise it is 'true'. when the 'condition' is 'false', then the loop exits. \$\endgroup\$user3629249– user36292492015年12月31日 08:03:50 +00:00Commented Dec 31, 2015 at 8:03 -
1\$\begingroup\$ here is a quick summary of the syntax for the
for(expresion1; expression2; expression3)
statement: expression1 - Initialisese variables. expression2 - Condtional expression, as long as this condition is true, loop will keep executing. expression3 - expression3 is the modifier which may be simple increment of a variable. \$\endgroup\$user3629249– user36292492015年12月31日 08:08:05 +00:00Commented Dec 31, 2015 at 8:08