6
\$\begingroup\$

Problem description from the Advent of Code website:

Part 1:

The task involves analyzing a calibration document containing lines of text. Each line represents a calibration value that needs to be recovered by extracting the first and last digits and combining them into a two-digit number. The goal is to find the sum of all these calibration values.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Part 2:

In problem two, the calibration document may include digits spelled out with letters (e.g., "one," "two," etc.). The task is to identify the actual first and last digits in each line, taking into account both numerical digits and spelled-out numbers. The overall objective remains the same: find the sum of the calibration values obtained from combining the first and last digits of each line.

For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.


Here's the code I wrote in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void getDigits(char* src, int* dest);
int main(int argc, char** argv) {
 FILE* input = fopen(argv[1], "r");
 
 if (input == NULL)
 return -1;
 char* line = NULL;
 size_t line_size = 0;
 int digits[2], line_len, sum = 0;
 while ((line_len = getline(&line, &line_size, input)) != -1) {
 getDigits(line, digits);
 sum += 10*digits[0] + digits[1];
 }
 printf("Sum: %d\n", sum);
 return 0;
}
void getDigits(char* src, int* dest) {
 int len = strlen(src);
 // Get first digit
 for (int i = 0; i < len; i++) {
 if (!strncmp("zero", src+i, 4)) {
 dest[0] = 0;
 break;
 }
 if (!strncmp("one", src+i, 3)) {
 dest[0] = 1;
 break;
 }
 if (!strncmp("two", src+i, 3)) {
 dest[0] = 2;
 break;
 }
 if (!strncmp("three", src+i, 5)) {
 dest[0] = 3;
 break;
 }
 if (!strncmp("four", src+i, 4)) {
 dest[0] = 4;
 break;
 }
 if (!strncmp("five", src+i, 4)) {
 dest[0] = 5;
 break;
 }
 if (!strncmp("six", src+i, 3)) {
 dest[0] = 6;
 break;
 }
 if (!strncmp("seven", src+i, 5)) {
 dest[0] = 7;
 break;
 }
 if (!strncmp("eight", src+i, 5)) {
 dest[0] = 8;
 break;
 }
 if (!strncmp("nine", src+i, 4)) {
 dest[0] = 9;
 break;
 }
 if (src[i] > 47 && src[i] < 58) { // 48 -> 57 ASCII => number
 dest[0] = src[i] - 48;
 break;
 }
 }
 // Get second digit
 for (int i = len-1; i >= 0; i--) {
 if (!strncmp("zero", src+i-3, 4)) {
 dest[1] = 0;
 break;
 }
 if (!strncmp("one", src+i-2, 3)) {
 dest[1] = 1;
 break;
 }
 if (!strncmp("two", src+i-2, 3)) {
 dest[1] = 2;
 break;
 }
 if (!strncmp("three", src+i-4, 5)) {
 dest[1] = 3;
 break;
 }
 if (!strncmp("four", src+i-3, 4)) {
 dest[1] = 4;
 break;
 }
 if (!strncmp("five", src+i-3, 4)) {
 dest[1] = 5;
 break;
 }
 if (!strncmp("six", src+i-2, 3)) {
 dest[1] = 6;
 break;
 }
 if (!strncmp("seven", src+i-4, 5)) {
 dest[1] = 7;
 break;
 }
 if (!strncmp("eight", src+i-4, 5)) {
 dest[1] = 8;
 break;
 }
 if (!strncmp("nine", src+i-3, 4)) {
 dest[1] = 9;
 break;
 }
 if (src[i] > 47 && src[i] < 58) {
 dest[1] = src[i] - 48;
 break;
 }
 }
}

Is there any way to optimize the "spelled out numbers" checking?

asked Feb 4, 2024 at 19:34
\$\endgroup\$

2 Answers 2

6
\$\begingroup\$

It's normal to write main() as the last function, so there's no need to forward-declare getDigts(). We would normally make that function static unless there's a good reason to share it with other translation units.


getline() isn't a portable (standard C) function. Assuming that you're using the function specified by POSIX, then the memory it allocates ought to be released using free(). Not doing that causes Valgrind warnings - it's easier to fix real problems if we clean up properly.


Risky conversion here:

int len = strlen(src);

strlen() returns a size_t, not int.


There's an assumption about character coding here:

 if (src[i] > 47 && src[i] < 58) { // 48 -> 57 ASCII => number
 dest[0] = src[i] - 48;
 break;
 }

We don't need to assume we're running in an ASCII environment, because we can use isdigit() (from <ctype.h>) and the abstract character '0' instead of a numeric code-point.


You correctly identified the repetition in getDigits as a problem. Other problems in this repetitious section include having to write out the length of each string in strncmp and matching each string with a digit value.

These problems can all be solved if we create an array of the digit names:

const char *const names[10] =
 { "zero", "one", "two", "three", "four",
 "five", "six", "seven", "eight", "nine" };

We can then use a loop over this array to replace the repeated code, using strlen() to decide how many characters to compare, and the loop index for the numeric value. Like this:

static int find_first_digit(const char *src)
{
 const size_t len = strlen(src);
 for (size_t i = 0; i < len; ++i) {
 if (isdigit((unsigned char)src[i])) {
 return src[i] - '0';
 }
 for (int d = 0; d < 10; ++d) {
 if (!strncmp(names[d], src+i, strlen(names[d]))) {
 return d;
 }
 }
 }
 return -1; // or perhaps 0?
}

I'll leave the corresponding find_last_digit() as an exercise.

answered Feb 4, 2024 at 21:41
\$\endgroup\$
6
  • \$\begingroup\$ "getline() isn't a portable (standard C) function." The POSIX man pages mostly corresponded to what the C standard specifies most times I referred to it, so I hadn't questioned it. Is there an equivalent C standard function? \$\endgroup\$ Commented Feb 4, 2024 at 22:23
  • \$\begingroup\$ Otherwise, thank you. They really don't teach coding practice at all in university. \$\endgroup\$ Commented Feb 4, 2024 at 22:25
  • 2
    \$\begingroup\$ @virtualcode There are too many existing implementations of getline() available online. If you so desire, you can write your own. Or stick with fgets(). \$\endgroup\$ Commented Feb 5, 2024 at 3:07
  • 1
    \$\begingroup\$ @virtualcode, if you're unsure which functions are specified by which standard, you may find your man pages have a "Conformance" section to help. Failing that, check in your C reference book or web site. BTW, platform-specific code isn't necessarily always a bad thing, but I believe it's important to be aware whether your code is portable. \$\endgroup\$ Commented Feb 5, 2024 at 7:36
  • \$\begingroup\$ String ze. Wouldn't strncmp try to compare with "zero", but src + i + strlen(names[d]) is out of bound? \$\endgroup\$ Commented Feb 5, 2024 at 10:12
5
\$\begingroup\$

Do not trust user input:

If there were no arguments supplied, argv[1] would be NULL, and the call to fopen() would invoke undefined behavior.

int main(int argc, char** argv) {
#if 0
 FILE* input = fopen(argv[1], "r");
#else
 if (argc != 2) {
 /* Handle error. */
 ...
 }
#endif

Use standard error codes:

stdlib.h defines EXIT_SUCCESS and EXIT_FAILURE. Consider using them to replace non-portable magic values like 1, -1, et cetera.

Do not mismatch integer types:

getline() returns a ssize_t, not int. strlen() returns a size_t, not int.

Algorithm:

After the duplication @Toby has pointed out has been eliminated, you could consider:

  1. Checking if the next character starts a digit name before making all the calls to strncmp()/memcmp(). This saves us 10 function calls for each non-numeric character.

    /* Something like this would suffice: */
    if (strchr("zotfsen", src[i]) {
     ...
    }
    
  2. Storing both the length and the name to avoid calling strlen(). This saves us all the calls to strlen().

    #define SIZEOF_STR(x) (sizeof(x) - 1)
    struct digit_map {
     const char *const dname;
     const unsigned int dlen;
    } digits[] = {
     { "zero", SIZEOF_STR("zero") },
     { "one", SIZEOF_STR("one") }, 
     { "two", SIZEOF_STR("two") },
     { "three", SIZEOF_STR("three") },
     { "four", SIZEOF_STR("four") },
     { "five", SIZEOF_STR("five") },
     { "six", SIZEOF_STR("six") },
     { "seven", SIZEOF_STR("seven") },
     { "eight", SIZEOF_STR("eight") },
     { "nine", SIZEOF_STR("nine") }
    };
    #undef SIZEOF_STR
    

Or you can use a separate parallel array and populate it at the start of main().

The call to strlen() for the user input can also be eliminated by using the return value of getline(). According to the man page:

On success, getline() and getdelim() return the number of characters read, including the delimiter character, but not including the terminating null byte ('0円').

See an alternative implementation: Advent of Code 2023 day 1: Trebuchet (Part 1 and 2) Follow-up.

answered Feb 5, 2024 at 2:57
\$\endgroup\$
2
  • 2
    \$\begingroup\$ Rather than hand-coding the string lengths, I'd probably compute them. Most likely into a separate parallel array that we can populate at the start of main(). Less potential for human error that way! \$\endgroup\$ Commented Feb 5, 2024 at 7:39
  • \$\begingroup\$ @TobySpeight That's correct. I have edited the answer. \$\endgroup\$ Commented Feb 7, 2024 at 7:25

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.