Skip to main content
Code Review

Return to Answer

deleted 3 characters in body
Source Link
200_success
  • 145.5k
  • 22
  • 190
  • 478
/**
 * Returns the contents of the specified file as a NUL-terminated string.
 * If any error occurs (e.g. file not found, permission denied, no free memory),
 * returns NULL, and errno will be tell the reason.
 */
char *readSourceCode(const char *filename) {
 /* Hints: Use stat() or fstat() to find the file size.
 Allocate that many bytes, plus 1 for the '0円'.
 Remember to fclose() when done!
 */
 ...
}
/**
 * Returns the contents of the specified file as a NUL-terminated string.
 * If any error occurs (e.g. file not found, permission denied, no free memory),
 * returns NULL, and errno will be tell the reason.
 */
char *readSourceCode(const char *filename) {
 /* Hints: Use stat() or fstat() to find the file size.
 Allocate that many bytes, plus 1 for the '0円'.
 Remember to fclose() when done!
 */
 ...
}
/**
 * Returns the contents of the specified file as a NUL-terminated string.
 * If any error occurs (e.g. file not found, permission denied, no free memory),
 * returns NULL, and errno will tell the reason.
 */
char *readSourceCode(const char *filename) {
 /* Hints: Use stat() or fstat() to find the file size.
 Allocate that many bytes, plus 1 for the '0円'.
 Remember to fclose() when done!
 */
 ...
}
Suggested solution outline
Source Link
200_success
  • 145.5k
  • 22
  • 190
  • 478

Suggestion

I'll try to provide some guidance without giving away a complete solution to your .

The key insight is that it is possible to construct an automaton that works at a higher level than character-by-character.

I'll start with main().

/* Advice: The C standard says that main() must return an int, not void. */
int main(int argc, char *argv[]) {
 /* Advice: Don't hard-code input filename. Accept a command-line parameter instead */
 if (argc < 2) {
 fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
 return 1;
 }
 const char *filename = argv[1];
 /* Advice: Avoid global variables. Return a pointer instead. */
 char *program = readSourceCode(filename);
 if (!program) {
 perror(filename);
 return 1;
 }
 const char *cursor = program;
 while (*cursor) {
 /* Advice: What does your automaton do? Pick an informative name. */
 switch (consumeKeyword(&cursor)) {
 /* Advice: Avoid magic numbers. #define constants or use an enum. */
 case CASE: printf("CASE\n"); break;
 case DO: printf("DO\n"); break;
 case ELSE: printf("ELSE\n"); break;
 case ELSEIF: printf("ELSE IF\n"); break;
 default:
 if (isIdentifier(&cursor)) {
 printf("IDENT\n"); break;
 } else {
 printf("Invalid! %s", cursor);
 }
 free(program);
 return 1;
 }
 }
 free(program);
 return 0;
}

The first thing to get out of the way is readSourceCode(). See if you can implement this:

/**
 * Returns the contents of the specified file as a NUL-terminated string.
 * If any error occurs (e.g. file not found, permission denied, no free memory),
 * returns NULL, and errno will be tell the reason.
 */
char *readSourceCode(const char *filename) {
 /* Hints: Use stat() or fstat() to find the file size.
 Allocate that many bytes, plus 1 for the '0円'.
 Remember to fclose() when done!
 */
 ...
}

The main challenge is consumeKeyword(). As you can see, main() passes a "cursor" to consumeKeyword() — it does so by reference, so that consumeKeyword() can increment main's cursor.

enum keyword {
 NOT_A_KEYWORD,
 ...
};
/**
 * Skips over any whitespace characters at the cursor, then looks for any
 * recognizable keywords. If a keyword is found, advances the cursor to
 * just beyond the end of the keyword. If no keyword is found, returns
 * NOT_A_KEYWORD and leaves the cursor in place.
 */
enum keyword consumeKeyword(const char **cursor) {
 while (isspace(**cursor)) {
 (*cursor)++;
 }
 size_t advance;
 if ((advance = isKeyword(cursor, "case"))) {
 *cursor += advance;
 return CASE;
 } else if ... {
 ...
 } else {
 return NOT_A_KEYWORD;
 }
}

You should be able to fill in the rest:

/**
 * If the specified keyword is found at the specified cursor position,
 * returns the length of the keyword. Otherwise, returns 0.
 */
size_t isKeyword(const char *const *cursor, const char *keyword) {
 /* Hint: With a for-loop, the entire function should take no more
 than four lines. */
}

Suggestion

I'll try to provide some guidance without giving away a complete solution to your .

The key insight is that it is possible to construct an automaton that works at a higher level than character-by-character.

I'll start with main().

/* Advice: The C standard says that main() must return an int, not void. */
int main(int argc, char *argv[]) {
 /* Advice: Don't hard-code input filename. Accept a command-line parameter instead */
 if (argc < 2) {
 fprintf(stderr, "Usage: %s FILENAME\n", argv[0]);
 return 1;
 }
 const char *filename = argv[1];
 /* Advice: Avoid global variables. Return a pointer instead. */
 char *program = readSourceCode(filename);
 if (!program) {
 perror(filename);
 return 1;
 }
 const char *cursor = program;
 while (*cursor) {
 /* Advice: What does your automaton do? Pick an informative name. */
 switch (consumeKeyword(&cursor)) {
 /* Advice: Avoid magic numbers. #define constants or use an enum. */
 case CASE: printf("CASE\n"); break;
 case DO: printf("DO\n"); break;
 case ELSE: printf("ELSE\n"); break;
 case ELSEIF: printf("ELSE IF\n"); break;
 default:
 if (isIdentifier(&cursor)) {
 printf("IDENT\n"); break;
 } else {
 printf("Invalid! %s", cursor);
 }
 free(program);
 return 1;
 }
 }
 free(program);
 return 0;
}

The first thing to get out of the way is readSourceCode(). See if you can implement this:

/**
 * Returns the contents of the specified file as a NUL-terminated string.
 * If any error occurs (e.g. file not found, permission denied, no free memory),
 * returns NULL, and errno will be tell the reason.
 */
char *readSourceCode(const char *filename) {
 /* Hints: Use stat() or fstat() to find the file size.
 Allocate that many bytes, plus 1 for the '0円'.
 Remember to fclose() when done!
 */
 ...
}

The main challenge is consumeKeyword(). As you can see, main() passes a "cursor" to consumeKeyword() — it does so by reference, so that consumeKeyword() can increment main's cursor.

enum keyword {
 NOT_A_KEYWORD,
 ...
};
/**
 * Skips over any whitespace characters at the cursor, then looks for any
 * recognizable keywords. If a keyword is found, advances the cursor to
 * just beyond the end of the keyword. If no keyword is found, returns
 * NOT_A_KEYWORD and leaves the cursor in place.
 */
enum keyword consumeKeyword(const char **cursor) {
 while (isspace(**cursor)) {
 (*cursor)++;
 }
 size_t advance;
 if ((advance = isKeyword(cursor, "case"))) {
 *cursor += advance;
 return CASE;
 } else if ... {
 ...
 } else {
 return NOT_A_KEYWORD;
 }
}

You should be able to fill in the rest:

/**
 * If the specified keyword is found at the specified cursor position,
 * returns the length of the keyword. Otherwise, returns 0.
 */
size_t isKeyword(const char *const *cursor, const char *keyword) {
 /* Hint: With a for-loop, the entire function should take no more
 than four lines. */
}
Source Link
200_success
  • 145.5k
  • 22
  • 190
  • 478

I think you're overcomplicating the problem by working a character at a time.

  • Reading one character at a time using getc() is inefficient. You could fetch a bigger chunk at a time using fread().
  • Wrapping each character into a linked list node is wasteful. While each character normally takes one byte when stored in a large array or a string, your linked list node is likely to take more than eight bytes. On a 32-bit system, a struct node will probably use one byte for the character itself, four bytes for a pointer, and three wasted bytes in between to allow the pointer to be aligned at an address that is a multiple of four. On a 64-bit system, it would be even worse. Also, malloc() incurs some memory overhead for its bookkeeping (so that it remembers the size of the chunk of memory that will be liberated when free() is eventually called).
  • Analyzing the text one character at a time is tedious. That's some of the deepest indentation I've seen in any code. If you keep the data as a string, you could use some string handling routines from the standard C library, such as strsep().
lang-c

AltStyle によって変換されたページ (->オリジナル) /