3
\$\begingroup\$

I implemented basic string compression algorithm that uses the counts of repeated characters. For example: the string aabcccccaaa would become a2b1c5a3. What do you think about this, is there a better way to do this?

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
char* compressString(char* arr, int size);
int main() {
 char arr[] = "aabcccccaa";
 char *str;
 printf("Before compression: %s\n", arr);
 str = compressString(arr, strlen(arr));
 printf("After compression: %s\n", str);
 // free allocated memory
 free(str);
 return 0;
}
char* compressString(char* str, int len) {
 char last = str[0];
 char *buf = (char *)malloc(len * sizeof(char));
 int count = 1;
 int j = 0;
 for (int i = 1; i < len; i++) {
 if (last == str[i]) {
 count++;
 } else {
 buf[j++] = last;
 buf[j++] += count + '0';
 last = str[i];
 count = 1;
 }
 }
 buf[j++] = last;
 buf[j] += count + '0';
 buf[j] = '0円';
 return buf;
}
Quill
12k5 gold badges41 silver badges93 bronze badges
asked Jan 8, 2016 at 4:40
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Thoughts: const correctness, comment on encoding of count (why single character/digit?), look into PackBits to see how to avoid encoding a length "for every source character change". \$\endgroup\$ Commented Jan 8, 2016 at 5:38
  • 2
    \$\begingroup\$ Please don't update your code with changes after you've received answers, see What to do after receiving answers for more information. \$\endgroup\$ Commented Jan 8, 2016 at 6:18
  • 1
    \$\begingroup\$ Seeing your edit to revision 5: it is detrimental to have encode run in more than one place. \$\endgroup\$ Commented Jan 8, 2016 at 6:21

2 Answers 2

5
\$\begingroup\$

A couple problems

  • You don't allocate enough space for the return buffer. You need to allocate 2*len + 1 bytes to handle the worst cast scenario. The +1 is for the null terminating byte.

  • If the count goes above 9, you will output a non-digit character instead of a digit. If the count goes above 256, the digit will wrap around back to '1' and your compression will have failed to encode the original string.

answered Jan 8, 2016 at 5:50
\$\endgroup\$
2
  • 4
    \$\begingroup\$ @CodeCrack: perhaps you should get a rough overview of existing compression algorithms before reinventing the wheel in a particularly non-circular shape. For one thing this would show you all the ways that have been invented for storing counts efficiently (variable-length encodings based on bytes, nibbles or bits, Huffman-encoding of counts, arithmetic coding of counts, and so on). Run-length encodings like yours are often found in bitmaps (unsurprisingly called RLE bitmaps); looking at those might give you interesting ideas. Most of your questions have already been answered ages ago... \$\endgroup\$ Commented Jan 8, 2016 at 7:01
  • \$\begingroup\$ I don't know what will be put in buf following a character from the input - buf[j++] += count + '0'; will add to whatever has been there before. I do have an inkling the last count will be overwritten, anyway. \$\endgroup\$ Commented Dec 22, 2020 at 15:19
1
\$\begingroup\$

Failing to document a non-NULL return value will need to be free()d is an even bigger foul than not checking the value from malloc() - actually freeing in spite of an immediately following return from main() is proper.
The string literal in main differs from your example.
The code presented does not work as specified: the string returned lacks the last count.
The handling of len 0 is insufficient.
As you are aware, "adding the digit to buf[j]" is wrong.

community wiki

\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.