Improving and optimizing squeeze() function

Question 1

I am currently reading The C Programming Language by Dennis Richie and Brian Kernighan, and I came to implement the function squeeze (s1, s2) as an exercise. squeeze() deletes each character in s1 that matches any character in the string s2. While coding squeeze(), I thought that I can extend its capability to act like trim() method in Java, and then the my code was already done, and tested.

However, even though it is implemented in C, I believe that the code below can be improved and optimized. Please help me do so.

char *squeeze (char *str1, char *str2) {
 int i, j;
 int str1Len = strlen (str1);
 int toBeSubtractLen = 0; 
 char chr1 = '0円';
 char chr2 = '0円';
 for (i = 0; i < str1Len; i++) {
 chr1 = str1 [i];
 for (j = 0; j < strlen (str2); j++) {
 chr2 = str2 [j];
 if (chr1 == chr2) {
 toBeSubtractLen++; 
 } 
 }
 }
 char *finalStr;
 finalStr = malloc ((str1Len - toBeSubtractLen) + 1);
 if (finalStr == NULL) {
 printf ("Unable to allocate memory.\n");
 exit (EXIT_FAILURE);
 }
 int indx = 0;
 for (i = 0; i < str1Len; i++) {
 chr1 = str1 [i];
 for (j = 0; j < strlen (str2); j++) {
 chr2 = str2 [j];
 if (chr1 == chr2) {
 break;
 }
 }
 if (chr1 != chr2) {
 finalStr[indx] = chr1;
 indx++;
 }
 }
 return finalStr;
} /* end of squeeze() */

Samples:

str1 = ",AaBbCcDdEeFfGgHhIiJjKkLlMmAaAaAaNnOoPpQqRrSsTtUuVvWwXxYyZz"
str2 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
str3 = "abcdefghijklmnopqrstuvwxyz"
name = "ChristopherM.Sawi"
birthday = "August14,1988"

Outputs:

1. squeeze-ing str1 by str2 yields: "<space>,<tab>abcdefghijklmaaanopqrstuvwxyz"
2. squeeze-ing str1 by str3 yields: "<space>m<tab>ABCDEFGHIJKLMAAANOPQRSTUVWXYZ"
3. squeeze-ing str2 by str3 yields: "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
4. squeeze-ing name by str3 yields: "C<space>M.<space>S"
5. squeeze-ing birthday by str1 yields: "141988"

Question 2

Output 2 starts with a ',' not an 'm'

Question 3

You want to avoid repeat iteration over the 2nd string, which is O(N). Do this N times and you have an O(N²) algorithm.

One trick you could use is to build a lookup table of the chars in str2. This would cost O(N) and some extra memory upfront, but then subsequent lookups in the table could be brought down to O(1).

Here's a simple example. The trade-off is this takes a bit more memory and makes some assumptions about the range of char. It also clobbers the buffer of str1:

#include <limits.h> // for UCHAR_MAX
char *squeeze (char *str1, char *str2)
{
 char present[UCHAR_MAX + 1] = {0};
 char *src, *dst;
 // Build lookup table of chars in str2
 //
 while (*str2)
 present[(unsigned char)*str2++] = 1;
 // Iterate through str1, removing chars along the way.
 //
 src = dst = str1;
 while (*src)
 {
 // Is *src in str2?
 //
 if (present[(unsigned char)*src])
 {
 // Yes, remove this char.. (Advance src but not dst.)
 //
 ++src;
 }
 else
 *dst++ = *src++;
 }
 // NUL terminator
 //
 *dst = 0;
 return str1;
}

Question 4

Minor improvements:

chr1 and chr2 are initialized to '0円' before being assigned, but they will always be reassigned before being read. i and j are not being initialized, but, similarly, will be assigned before being read. If you're going for performance optimization, consider leaving chr1 and chr2 uninitialized. If you're going for good programming practice, consider initializing i and j.
The loops for (j = 0; j < strlen (str2); j++) call strlen each iteration even though str2 doesn't change anywhere in the function. Consider creating a str2len variable like str1len to remove the unnecessary calls.

Major improvement:

The first loop isn't necessary. It's only purpose is to determine the length of finalStr, but we know that a) squeeze never returns a string longer than str1, and b) indx contains the actual length of finalStr. Knowing this, you can malloc finalStr to the size of str1 + 1, then realloc it to indx + 1 before returning.

Edit: finalStr needs to have a trailing 0円 assigned to it before returning, also. Consider setting it before returning or by calling memset after malloc.

Question 5

The solution by @asveikau is more efficient, but here is a small version, for what it is worth. Note that I have used the standard library strchr to look for chars within strings. This is better than rolling your own loop to do the same thing and also avoids nested loops (always a bad sign).

#include <string.h>
char *
squeeze(char *s, const char *t)
{
 char *out = s;
 for (int i=0; s[i]; ++i) {
 if (!strchr(t, s[i]))
 *out++ = s[i];
 }
 *out = '0円';
 return s;
}

Question 6

It would be better to iterate over both the strings only once. Create a third temporary variable, equal to size of first string, and copy characters one-by-one if no match is found in the second string.

char *squeeze (char *destination, char *source) {
 char * result = (char*) malloc (destination_length + 1);
 bool found = FALSE;
 int destination_length = strlen (destination);
 int source_length = strlen (source);
 int k = 0;
 for (int i = 0; i < destination_length; i++) {
 found = FALSE;
 for (int j = 0; j < source_lenth; j++) {
 if ( source [j] == destination [i] ) {
 found = TRUE;
 break;
 }
 }
 if ( FALSE == found ) {
 result [k++] = destination[i];
 }
 }
 destination [k] = '0円';
 return destination;
}

Question 7

I agree with your first sentence, however I think you've actually described more or less the same algorithm @chrismsawl is using. Specifically this "if no match is found in the second string" part - implemented naively that would be another traversal of str2 at each step. Can you share more details of how you'd do it?

asveikau asveikau 1,8029 silver badges8 bronze badges · Answer 1 · 2012-10-08 04:41:32Z

You want to avoid repeat iteration over the 2nd string, which is O(N). Do this N times and you have an O(N²) algorithm.

One trick you could use is to build a lookup table of the chars in str2. This would cost O(N) and some extra memory upfront, but then subsequent lookups in the table could be brought down to O(1).

Here's a simple example. The trade-off is this takes a bit more memory and makes some assumptions about the range of char. It also clobbers the buffer of str1:

#include <limits.h> // for UCHAR_MAX
char *squeeze (char *str1, char *str2)
{
 char present[UCHAR_MAX + 1] = {0};
 char *src, *dst;
 // Build lookup table of chars in str2
 //
 while (*str2)
 present[(unsigned char)*str2++] = 1;
 // Iterate through str1, removing chars along the way.
 //
 src = dst = str1;
 while (*src)
 {
 // Is *src in str2?
 //
 if (present[(unsigned char)*src])
 {
 // Yes, remove this char.. (Advance src but not dst.)
 //
 ++src;
 }
 else
 *dst++ = *src++;
 }
 // NUL terminator
 //
 *dst = 0;
 return str1;
}

user1201210 user1201210 1863 bronze badges · Answer 2 · 2012-10-07 19:08:09Z

Minor improvements:

chr1 and chr2 are initialized to '0円' before being assigned, but they will always be reassigned before being read. i and j are not being initialized, but, similarly, will be assigned before being read. If you're going for performance optimization, consider leaving chr1 and chr2 uninitialized. If you're going for good programming practice, consider initializing i and j.
The loops for (j = 0; j < strlen (str2); j++) call strlen each iteration even though str2 doesn't change anywhere in the function. Consider creating a str2len variable like str1len to remove the unnecessary calls.

Major improvement:

The first loop isn't necessary. It's only purpose is to determine the length of finalStr, but we know that a) squeeze never returns a string longer than str1, and b) indx contains the actual length of finalStr. Knowing this, you can malloc finalStr to the size of str1 + 1, then realloc it to indx + 1 before returning.

Edit: finalStr needs to have a trailing 0円 assigned to it before returning, also. Consider setting it before returning or by calling memset after malloc.

William Morris William Morris 9,38419 silver badges42 bronze badges · Answer 3 · 2012-10-12 14:00:42Z

The solution by @asveikau is more efficient, but here is a small version, for what it is worth. Note that I have used the standard library strchr to look for chars within strings. This is better than rolling your own loop to do the same thing and also avoids nested loops (always a bad sign).

#include <string.h>
char *
squeeze(char *s, const char *t)
{
 char *out = s;
 for (int i=0; s[i]; ++i) {
 if (!strchr(t, s[i]))
 *out++ = s[i];
 }
 *out = '0円';
 return s;
}

Apeirogon Prime Apeirogon Prime 1116 bronze badges · Answer 4 · 2012-10-07 19:07:58Z

It would be better to iterate over both the strings only once. Create a third temporary variable, equal to size of first string, and copy characters one-by-one if no match is found in the second string.

char *squeeze (char *destination, char *source) {
 char * result = (char*) malloc (destination_length + 1);
 bool found = FALSE;
 int destination_length = strlen (destination);
 int source_length = strlen (source);
 int k = 0;
 for (int i = 0; i < destination_length; i++) {
 found = FALSE;
 for (int j = 0; j < source_lenth; j++) {
 if ( source [j] == destination [i] ) {
 found = TRUE;
 break;
 }
 }
 if ( FALSE == found ) {
 result [k++] = destination[i];
 }
 }
 destination [k] = '0円';
 return destination;
}

I agree with your first sentence, however I think you've actually described more or less the same algorithm @chrismsawl is using. Specifically this "if no match is found in the second string" part - implemented naively that would be another traversal of str2 at each step. Can you share more details of how you'd do it?

Stack Exchange Network

Improving and optimizing squeeze() function

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Improving and optimizing squeeze() function

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions