My solution for Advent of Code 2020, Day 4, in C

Question 1

Here is my solution for Advent of Code 2020, Day 4, Part 1 in C. I tried using fscanf or fgets, but couldn't find a nice of doing it, so I ended up using getc. Is there a way of using fscanf or fgets that makes this code better? Also, how does the code look in general?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct keyValuePair
{
 char key[4];
 char value[20];
} KeyValuePair;
typedef struct passport
{
 KeyValuePair fields[8];
 int numberOfFields;
} Passport;
int main()
{
 FILE* f = fopen("../input.txt", "r");
 if (f == NULL)
 {
 printf("Fail opening file\n");
 return -1;
 }
 Passport passports[1000];
 int currentPassport = 0;
 int currentField = 0;
 char key[4];
 char value[20];
 int c;
 while (!feof(f))
 {
 c = getc(f);
 if (c == '\n')
 {
 ++currentPassport;
 currentField = 0;
 continue;
 }
 else
 {
 // read key
 key[0] = (char)c;
 key[1] = (char)getc(f);
 key[2] = (char)getc(f);
 key[3] = '0円'; // place null terminator at the end
 // discard ':'
 c = getc(f);
 // read value
 int index = 0;
 while (c != ' ' && c != '\n')
 {
 c = getc(f);
 if (c == EOF) { break; } // at the last line, instead of a new line we have EOF
 else if (c != ' ' && c != '\n') { value[index++] = (char)c; }
 }
 value[index] = '0円'; // place null terminator at the end
 strcpy(passports[currentPassport].fields[currentField].key, key);
 strcpy(passports[currentPassport].fields[currentField].value, value);
 ++currentField;
 ++passports[currentPassport].numberOfFields;
 }
 }
 int numberOfValidPassports = 0;
 for (int i = 0; i <= currentPassport; ++i)
 {
 if (passports[i].numberOfFields == 8) { ++numberOfValidPassports; }
 else if (passports[i].numberOfFields == 7)
 {
 int cidFound = 0;
 for (int j = 0; j < passports[i].numberOfFields; ++j)
 {
 if (strcmp(passports[i].fields[j].key, "cid") == 0)
 {
 // passport contains cid and has 7 fields, therefore invalid
 cidFound = 1;
 break;
 }
 }
 if (!cidFound) { ++numberOfValidPassports; }
 }
 }
 printf("Number of valid passports: %d", numberOfValidPassports);
 fclose(f);
 return 0;
}

Question 2

Please include the specification directly into the question. It must not be necessary to follow the link in order to understand your question.

Question 3

Split your program into multiple functions

You can improve the readability and maintainability of your program greatly by splitting it into multiple functions that each do a simple, well-defined task. For example, you can reduce main() to:

int main()
{
 FILE* f = fopen("../input.txt", "r");
 int numberOfValidPassports = countValidPassports(f);
 fclose(f);
 printf("Number of valid passports: %d", numberOfValidPassports);
};

So the only responsibility of main() is now the I/O: reading the file, and writing the result to stdout. The function countValidPassports() now has to do the same as the original main(), but it doesn't have to worry about I/O anymore. Of course, this new countValidPassport() function can itself be split into multiple functions.

Check for errors while reading from an open file

You only added a check to ensure the file was opened correctly, however there can also be errors while reading the file. It is good practice to check for those as well, and to report them properly by printing an error message (to stderr) and by exitting the program with a non-zero exit code.

If a read error would occur, your code would enter an infinite loop, since you used feof() to check for the end of the file. If there is a read error before the end of the file, feof() will not return true. In general, avoid a while(!feof(...)) loop. Just check the result from an actual read operation to check whether it succeeded:

while ((c = fgetc(f)) != EOF) {
 ...
}

The fgetc() function will return EOF when it cannot read a character, either because there was a read error or because the end of file was reached.

Avoid making too many assumptions about the input

Your code makes a lot of assumptions of what the input looks like. For example, you assume that keys are always 3 characters long, and that the values are at most 20 characters long. You also assume that the only possible keys are valid passport keys. But what if you get this line in the input?

this:isaverylongkeyandvaluepair

This would cause you to write pas the end of the value arrays. Even if there is no buffer overflow issue, consider the following input:

eyr:1 iyr:2 byr:3 byr:4 byr:5 byr:6 byr:7 byr:8

You would count this as a valid passport, but it has several issues:

Not all seven required fields are present
Some fields are repeated (is that allowed?)
The expiration year is earlier than the issue year
The issue year is earlier than the birth year

Perhaps not all things need to be validated by your program, after all the task is only to check if all the required fields are present. It also doesn't mention whether any other key besides the eight mentioned in the task are allowed. But for sure you should check whether all seven required fields are present.

Don't store data unnecessarily

While you are parsing, you are storing copies of the keys and values you found in the array passports. But nothing in that array is used in the end, you only need to print the number of valid passports. So it is just a waste of memory. The only thing you have to remember is, for the current passport you are parsing, which keys you have seen so far. Once you have finished parsing a single passport, you can immediately decide whether it is valid or not, and increment numberOfValidPassports if valid.

This also solves the potential issue you have if the input contains more than 1000 passports.

Using `fscanf()` or `fgets()`

Indeed, it is good to consider using fscanf() or fgets() when parsing a file. Your current method of parsing it character by character is somewhat inefficient. Ideally, you would write something like:

while (...) {
 char key[...];
 char value[...];
 if (fscanf(f, " %[^: \n]:%s", key, value) == 2) {
 // we read a valid key:value pair
 ...
 }
}

In the format string " %[^: \n]:%s", there first is a space which will "eat" all the leading whitespace, then we match a string that can contain any character except a colon, space or newline. Then it expects a colon, and after that anything until the next whitespace.

However, it is impossible to distinguish whitespace separating key:value pairs from empty lines separating passports this way. So instead you can use fgets() to read in whole lines. It is easy to check if a line is empty, and if so you know it separated two passports. If there is anything in it, you can use sscanf() to parse it, like so:

char line[...];
while (fgets(line, sizeof line, f)) {
 if (/* check for empty line */) {
 // Passport separator
 ...
 } else {
 // Scan this line for key:value pairs
 char key[...];
 char value[...];
 char *start = line;
 int n;
 while (sscanf(start, " %[^: \n]:%s%n", key, value, &n) == 2) {
 // we read a valid key:value pair
 ...
 start += n;
 }
 }
}

The trick here is using %n to check how many characters the key:value pair was, so we can skip it for the next call to sscanf().

However: be aware of possible buffer overflows. There are several ways to deal with them, either use field width specifiers in the format string to tell sscanf() to not read more characters than your buffer is long, or if you can use POSIX extensions, consider using %ms to have sscanf() allocate a large enough buffer for the string for you. Since you don't care about the values, you can avoid storing them by using assignment-suppression: %*s. Read the documentation for sscanf() to see all its possibilities.

Question 4

Thanks for your answer. I made some considerations here (it's too many characters to post here directly) in case you want to check out.

Question 5

I tried the sscanf approach but it's not working. Is there something wrong with this code?

Question 6

Yes sorry, %s matches a : as well, so the format string should have been %[^:]:%s. I updated the answer, and also added a space in front of it to ensure leading whitespace is removed.

Question 7

I managed to get it working like this, but I still need to remove the new line after the last value.

Question 8

Oops, should have reloaded the page before adding a comment. I will redo the code using this approach.

G. Sliepen G. Sliepen 68.7k3 gold badges74 silver badges179 bronze badges · Answer 1 · 2021-04-11 07:19:03Z

Split your program into multiple functions

You can improve the readability and maintainability of your program greatly by splitting it into multiple functions that each do a simple, well-defined task. For example, you can reduce main() to:

int main()
{
 FILE* f = fopen("../input.txt", "r");
 int numberOfValidPassports = countValidPassports(f);
 fclose(f);
 printf("Number of valid passports: %d", numberOfValidPassports);
};

So the only responsibility of main() is now the I/O: reading the file, and writing the result to stdout. The function countValidPassports() now has to do the same as the original main(), but it doesn't have to worry about I/O anymore. Of course, this new countValidPassport() function can itself be split into multiple functions.

Check for errors while reading from an open file

You only added a check to ensure the file was opened correctly, however there can also be errors while reading the file. It is good practice to check for those as well, and to report them properly by printing an error message (to stderr) and by exitting the program with a non-zero exit code.

If a read error would occur, your code would enter an infinite loop, since you used feof() to check for the end of the file. If there is a read error before the end of the file, feof() will not return true. In general, avoid a while(!feof(...)) loop. Just check the result from an actual read operation to check whether it succeeded:

while ((c = fgetc(f)) != EOF) {
 ...
}

The fgetc() function will return EOF when it cannot read a character, either because there was a read error or because the end of file was reached.

Avoid making too many assumptions about the input

Your code makes a lot of assumptions of what the input looks like. For example, you assume that keys are always 3 characters long, and that the values are at most 20 characters long. You also assume that the only possible keys are valid passport keys. But what if you get this line in the input?

this:isaverylongkeyandvaluepair

This would cause you to write pas the end of the value arrays. Even if there is no buffer overflow issue, consider the following input:

eyr:1 iyr:2 byr:3 byr:4 byr:5 byr:6 byr:7 byr:8

You would count this as a valid passport, but it has several issues:

Not all seven required fields are present
Some fields are repeated (is that allowed?)
The expiration year is earlier than the issue year
The issue year is earlier than the birth year

Perhaps not all things need to be validated by your program, after all the task is only to check if all the required fields are present. It also doesn't mention whether any other key besides the eight mentioned in the task are allowed. But for sure you should check whether all seven required fields are present.

Don't store data unnecessarily

While you are parsing, you are storing copies of the keys and values you found in the array passports. But nothing in that array is used in the end, you only need to print the number of valid passports. So it is just a waste of memory. The only thing you have to remember is, for the current passport you are parsing, which keys you have seen so far. Once you have finished parsing a single passport, you can immediately decide whether it is valid or not, and increment numberOfValidPassports if valid.

This also solves the potential issue you have if the input contains more than 1000 passports.

Using `fscanf()` or `fgets()`

Indeed, it is good to consider using fscanf() or fgets() when parsing a file. Your current method of parsing it character by character is somewhat inefficient. Ideally, you would write something like:

while (...) {
 char key[...];
 char value[...];
 if (fscanf(f, " %[^: \n]:%s", key, value) == 2) {
 // we read a valid key:value pair
 ...
 }
}

In the format string " %[^: \n]:%s", there first is a space which will "eat" all the leading whitespace, then we match a string that can contain any character except a colon, space or newline. Then it expects a colon, and after that anything until the next whitespace.

However, it is impossible to distinguish whitespace separating key:value pairs from empty lines separating passports this way. So instead you can use fgets() to read in whole lines. It is easy to check if a line is empty, and if so you know it separated two passports. If there is anything in it, you can use sscanf() to parse it, like so:

char line[...];
while (fgets(line, sizeof line, f)) {
 if (/* check for empty line */) {
 // Passport separator
 ...
 } else {
 // Scan this line for key:value pairs
 char key[...];
 char value[...];
 char *start = line;
 int n;
 while (sscanf(start, " %[^: \n]:%s%n", key, value, &n) == 2) {
 // we read a valid key:value pair
 ...
 start += n;
 }
 }
}

The trick here is using %n to check how many characters the key:value pair was, so we can skip it for the next call to sscanf().

However: be aware of possible buffer overflows. There are several ways to deal with them, either use field width specifiers in the format string to tell sscanf() to not read more characters than your buffer is long, or if you can use POSIX extensions, consider using %ms to have sscanf() allocate a large enough buffer for the string for you. Since you don't care about the values, you can avoid storing them by using assignment-suppression: %*s. Read the documentation for sscanf() to see all its possibilities.

Thanks for your answer. I made some considerations here (it's too many characters to post here directly) in case you want to check out.
I tried the sscanf approach but it's not working. Is there something wrong with this code?
Yes sorry, %s matches a : as well, so the format string should have been %[^:]:%s. I updated the answer, and also added a space in front of it to ensure leading whitespace is removed.
I managed to get it working like this, but I still need to remove the new line after the last value.
Oops, should have reloaded the page before adding a comment. I will redo the code using this approach.

Stack Exchange Network

My solution for Advent of Code 2020, Day 4, in C

1 Answer 1

Split your program into multiple functions

Check for errors while reading from an open file

Avoid making too many assumptions about the input

Don't store data unnecessarily

Using `fscanf()` or `fgets()`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

My solution for Advent of Code 2020, Day 4, in C

1 Answer 1

Split your program into multiple functions

Check for errors while reading from an open file

Avoid making too many assumptions about the input

Don't store data unnecessarily

Using fscanf() or fgets()

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Using `fscanf()` or `fgets()`