Problem description from the Advent of Code website:
Part 1:
The task involves analyzing a calibration document containing lines of text. Each line represents a calibration value that needs to be recovered by extracting the first and last digits and combining them into a two-digit number. The goal is to find the sum of all these calibration values.
For example:
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.
Part 2:
In problem two, the calibration document may include digits spelled out with letters (e.g., "one," "two," etc.). The task is to identify the actual first and last digits in each line, taking into account both numerical digits and spelled-out numbers. The overall objective remains the same: find the sum of the calibration values obtained from combining the first and last digits of each line.
For example:
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.
Here's the code I wrote in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void getDigits(char* src, int* dest);
int main(int argc, char** argv) {
FILE* input = fopen(argv[1], "r");
if (input == NULL)
return -1;
char* line = NULL;
size_t line_size = 0;
int digits[2], line_len, sum = 0;
while ((line_len = getline(&line, &line_size, input)) != -1) {
getDigits(line, digits);
sum += 10*digits[0] + digits[1];
}
printf("Sum: %d\n", sum);
return 0;
}
void getDigits(char* src, int* dest) {
int len = strlen(src);
// Get first digit
for (int i = 0; i < len; i++) {
if (!strncmp("zero", src+i, 4)) {
dest[0] = 0;
break;
}
if (!strncmp("one", src+i, 3)) {
dest[0] = 1;
break;
}
if (!strncmp("two", src+i, 3)) {
dest[0] = 2;
break;
}
if (!strncmp("three", src+i, 5)) {
dest[0] = 3;
break;
}
if (!strncmp("four", src+i, 4)) {
dest[0] = 4;
break;
}
if (!strncmp("five", src+i, 4)) {
dest[0] = 5;
break;
}
if (!strncmp("six", src+i, 3)) {
dest[0] = 6;
break;
}
if (!strncmp("seven", src+i, 5)) {
dest[0] = 7;
break;
}
if (!strncmp("eight", src+i, 5)) {
dest[0] = 8;
break;
}
if (!strncmp("nine", src+i, 4)) {
dest[0] = 9;
break;
}
if (src[i] > 47 && src[i] < 58) { // 48 -> 57 ASCII => number
dest[0] = src[i] - 48;
break;
}
}
// Get second digit
for (int i = len-1; i >= 0; i--) {
if (!strncmp("zero", src+i-3, 4)) {
dest[1] = 0;
break;
}
if (!strncmp("one", src+i-2, 3)) {
dest[1] = 1;
break;
}
if (!strncmp("two", src+i-2, 3)) {
dest[1] = 2;
break;
}
if (!strncmp("three", src+i-4, 5)) {
dest[1] = 3;
break;
}
if (!strncmp("four", src+i-3, 4)) {
dest[1] = 4;
break;
}
if (!strncmp("five", src+i-3, 4)) {
dest[1] = 5;
break;
}
if (!strncmp("six", src+i-2, 3)) {
dest[1] = 6;
break;
}
if (!strncmp("seven", src+i-4, 5)) {
dest[1] = 7;
break;
}
if (!strncmp("eight", src+i-4, 5)) {
dest[1] = 8;
break;
}
if (!strncmp("nine", src+i-3, 4)) {
dest[1] = 9;
break;
}
if (src[i] > 47 && src[i] < 58) {
dest[1] = src[i] - 48;
break;
}
}
}
Is there any way to optimize the "spelled out numbers" checking?
2 Answers 2
It's normal to write main()
as the last function, so there's no need to forward-declare getDigts()
. We would normally make that function static
unless there's a good reason to share it with other translation units.
getline()
isn't a portable (standard C) function. Assuming that you're using the function specified by POSIX, then the memory it allocates ought to be released using free()
. Not doing that causes Valgrind warnings - it's easier to fix real problems if we clean up properly.
Risky conversion here:
int len = strlen(src);
strlen()
returns a size_t
, not int
.
There's an assumption about character coding here:
if (src[i] > 47 && src[i] < 58) { // 48 -> 57 ASCII => number dest[0] = src[i] - 48; break; }
We don't need to assume we're running in an ASCII environment, because we can use isdigit()
(from <ctype.h>
) and the abstract character '0'
instead of a numeric code-point.
You correctly identified the repetition in getDigits
as a problem. Other problems in this repetitious section include having to write out the length of each string in strncmp
and matching each string with a digit value.
These problems can all be solved if we create an array of the digit names:
const char *const names[10] =
{ "zero", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine" };
We can then use a loop over this array to replace the repeated code, using strlen()
to decide how many characters to compare, and the loop index for the numeric value. Like this:
static int find_first_digit(const char *src)
{
const size_t len = strlen(src);
for (size_t i = 0; i < len; ++i) {
if (isdigit((unsigned char)src[i])) {
return src[i] - '0';
}
for (int d = 0; d < 10; ++d) {
if (!strncmp(names[d], src+i, strlen(names[d]))) {
return d;
}
}
}
return -1; // or perhaps 0?
}
I'll leave the corresponding find_last_digit()
as an exercise.
-
\$\begingroup\$ "getline() isn't a portable (standard C) function." The POSIX man pages mostly corresponded to what the C standard specifies most times I referred to it, so I hadn't questioned it. Is there an equivalent C standard function? \$\endgroup\$virtualcode– virtualcode2024年02月04日 22:23:54 +00:00Commented Feb 4, 2024 at 22:23
-
\$\begingroup\$ Otherwise, thank you. They really don't teach coding practice at all in university. \$\endgroup\$virtualcode– virtualcode2024年02月04日 22:25:20 +00:00Commented Feb 4, 2024 at 22:25
-
2\$\begingroup\$ @virtualcode There are too many existing implementations of
getline()
available online. If you so desire, you can write your own. Or stick withfgets()
. \$\endgroup\$Madagascar– Madagascar2024年02月05日 03:07:19 +00:00Commented Feb 5, 2024 at 3:07 -
1\$\begingroup\$ @virtualcode, if you're unsure which functions are specified by which standard, you may find your man pages have a "Conformance" section to help. Failing that, check in your C reference book or web site. BTW, platform-specific code isn't necessarily always a bad thing, but I believe it's important to be aware whether your code is portable. \$\endgroup\$Toby Speight– Toby Speight2024年02月05日 07:36:57 +00:00Commented Feb 5, 2024 at 7:36
-
\$\begingroup\$ String
ze
. Wouldn'tstrncmp
try to compare with "zero", butsrc + i + strlen(names[d])
is out of bound? \$\endgroup\$Gauthier– Gauthier2024年02月05日 10:12:09 +00:00Commented Feb 5, 2024 at 10:12
Do not trust user input:
If there were no arguments supplied, argv[1]
would be NULL
, and the call to fopen()
would invoke undefined behavior.
int main(int argc, char** argv) {
#if 0
FILE* input = fopen(argv[1], "r");
#else
if (argc != 2) {
/* Handle error. */
...
}
#endif
Use standard error codes:
stdlib.h
defines EXIT_SUCCESS
and EXIT_FAILURE
. Consider using them to replace non-portable magic values like 1, -1, et cetera.
Do not mismatch integer types:
getline()
returns a ssize_t
, not int
.
strlen()
returns a size_t
, not int
.
Algorithm:
After the duplication @Toby has pointed out has been eliminated, you could consider:
Checking if the next character starts a digit name before making all the calls to
strncmp()
/memcmp()
. This saves us 10 function calls for each non-numeric character./* Something like this would suffice: */ if (strchr("zotfsen", src[i]) { ... }
Storing both the length and the name to avoid calling
strlen()
. This saves us all the calls tostrlen()
.#define SIZEOF_STR(x) (sizeof(x) - 1) struct digit_map { const char *const dname; const unsigned int dlen; } digits[] = { { "zero", SIZEOF_STR("zero") }, { "one", SIZEOF_STR("one") }, { "two", SIZEOF_STR("two") }, { "three", SIZEOF_STR("three") }, { "four", SIZEOF_STR("four") }, { "five", SIZEOF_STR("five") }, { "six", SIZEOF_STR("six") }, { "seven", SIZEOF_STR("seven") }, { "eight", SIZEOF_STR("eight") }, { "nine", SIZEOF_STR("nine") } }; #undef SIZEOF_STR
Or you can use a separate parallel array and populate it at the start of main()
.
The call to strlen()
for the user input can also be eliminated by using the return value of getline()
. According to the man page:
On success, getline() and getdelim() return the number of characters read, including the delimiter character, but not including the terminating null byte ('0円').
See an alternative implementation: Advent of Code 2023 day 1: Trebuchet (Part 1 and 2) Follow-up.
-
2\$\begingroup\$ Rather than hand-coding the string lengths, I'd probably compute them. Most likely into a separate parallel array that we can populate at the start of
main()
. Less potential for human error that way! \$\endgroup\$Toby Speight– Toby Speight2024年02月05日 07:39:35 +00:00Commented Feb 5, 2024 at 7:39 -
\$\begingroup\$ @TobySpeight That's correct. I have edited the answer. \$\endgroup\$Madagascar– Madagascar2024年02月07日 07:25:13 +00:00Commented Feb 7, 2024 at 7:25