I am currently reading The C Programming Language by Dennis Richie and Brian Kernighan, and I came to implement the function squeeze (s1, s2)
as an exercise. squeeze()
deletes each character in s1
that matches any character in the string s2
. While coding squeeze()
, I thought that I can extend its capability to act like trim()
method in Java, and then the my code was already done, and tested.
However, even though it is implemented in C, I believe that the code below can be improved and optimized. Please help me do so.
char *squeeze (char *str1, char *str2) {
int i, j;
int str1Len = strlen (str1);
int toBeSubtractLen = 0;
char chr1 = '0円';
char chr2 = '0円';
for (i = 0; i < str1Len; i++) {
chr1 = str1 [i];
for (j = 0; j < strlen (str2); j++) {
chr2 = str2 [j];
if (chr1 == chr2) {
toBeSubtractLen++;
}
}
}
char *finalStr;
finalStr = malloc ((str1Len - toBeSubtractLen) + 1);
if (finalStr == NULL) {
printf ("Unable to allocate memory.\n");
exit (EXIT_FAILURE);
}
int indx = 0;
for (i = 0; i < str1Len; i++) {
chr1 = str1 [i];
for (j = 0; j < strlen (str2); j++) {
chr2 = str2 [j];
if (chr1 == chr2) {
break;
}
}
if (chr1 != chr2) {
finalStr[indx] = chr1;
indx++;
}
}
return finalStr;
} /* end of squeeze() */
Samples:
- str1 = ",AaBbCcDdEeFfGgHhIiJjKkLlMmAaAaAaNnOoPpQqRrSsTtUuVvWwXxYyZz"
- str2 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
- str3 = "abcdefghijklmnopqrstuvwxyz"
- name = "ChristopherM.Sawi"
- birthday = "August14,1988"
Outputs:
1. squeeze-ing str1 by str2 yields: "<space>,<tab>abcdefghijklmaaanopqrstuvwxyz" 2. squeeze-ing str1 by str3 yields: "<space>m<tab>ABCDEFGHIJKLMAAANOPQRSTUVWXYZ" 3. squeeze-ing str2 by str3 yields: "ABCDEFGHIJKLMNOPQRSTUVWXYZ" 4. squeeze-ing name by str3 yields: "C<space>M.<space>S" 5. squeeze-ing birthday by str1 yields: "141988"
-
\$\begingroup\$ Output 2 starts with a ',' not an 'm' \$\endgroup\$William Morris– William Morris2012年10月12日 13:57:32 +00:00Commented Oct 12, 2012 at 13:57
4 Answers 4
You want to avoid repeat iteration over the 2nd string, which is O(N). Do this N times and you have an O(N2) algorithm.
One trick you could use is to build a lookup table of the chars in str2
. This would cost O(N) and some extra memory upfront, but then subsequent lookups in the table could be brought down to O(1).
Here's a simple example. The trade-off is this takes a bit more memory and makes some assumptions about the range of char
. It also clobbers the buffer of str1
:
#include <limits.h> // for UCHAR_MAX
char *squeeze (char *str1, char *str2)
{
char present[UCHAR_MAX + 1] = {0};
char *src, *dst;
// Build lookup table of chars in str2
//
while (*str2)
present[(unsigned char)*str2++] = 1;
// Iterate through str1, removing chars along the way.
//
src = dst = str1;
while (*src)
{
// Is *src in str2?
//
if (present[(unsigned char)*src])
{
// Yes, remove this char.. (Advance src but not dst.)
//
++src;
}
else
*dst++ = *src++;
}
// NUL terminator
//
*dst = 0;
return str1;
}
Minor improvements:
chr1
andchr2
are initialized to'0円'
before being assigned, but they will always be reassigned before being read.i
andj
are not being initialized, but, similarly, will be assigned before being read. If you're going for performance optimization, consider leavingchr1
andchr2
uninitialized. If you're going for good programming practice, consider initializingi
andj
.- The loops
for (j = 0; j < strlen (str2); j++)
callstrlen
each iteration even thoughstr2
doesn't change anywhere in the function. Consider creating astr2len
variable likestr1len
to remove the unnecessary calls.
Major improvement:
- The first loop isn't necessary. It's only purpose is to determine the length of
finalStr
, but we know that a)squeeze
never returns a string longer thanstr1
, and b)indx
contains the actual length offinalStr
. Knowing this, you canmalloc
finalStr
to the size ofstr1 + 1
, thenrealloc
it toindx + 1
before returning.
Edit: finalStr
needs to have a trailing 0円
assigned to it before returning, also. Consider setting it before returning or by calling memset
after malloc
.
The solution by @asveikau is more efficient, but here is a small version, for what it is worth. Note that I have used the standard library strchr
to look for chars within strings. This is better than rolling your own loop to do the same thing and also avoids nested loops (always a bad sign).
#include <string.h>
char *
squeeze(char *s, const char *t)
{
char *out = s;
for (int i=0; s[i]; ++i) {
if (!strchr(t, s[i]))
*out++ = s[i];
}
*out = '0円';
return s;
}
It would be better to iterate over both the strings only once. Create a third temporary variable, equal to size of first string, and copy characters one-by-one if no match is found in the second string.
char *squeeze (char *destination, char *source) {
char * result = (char*) malloc (destination_length + 1);
bool found = FALSE;
int destination_length = strlen (destination);
int source_length = strlen (source);
int k = 0;
for (int i = 0; i < destination_length; i++) {
found = FALSE;
for (int j = 0; j < source_lenth; j++) {
if ( source [j] == destination [i] ) {
found = TRUE;
break;
}
}
if ( FALSE == found ) {
result [k++] = destination[i];
}
}
destination [k] = '0円';
return destination;
}
-
\$\begingroup\$ I agree with your first sentence, however I think you've actually described more or less the same algorithm @chrismsawl is using. Specifically this "if no match is found in the second string" part - implemented naively that would be another traversal of
str2
at each step. Can you share more details of how you'd do it? \$\endgroup\$asveikau– asveikau2012年10月08日 05:28:55 +00:00Commented Oct 8, 2012 at 5:28