In creating this answer to a recent question, I thought I might write a fully worked example that implements a finite state machine in C.
The purpose is to implement the following function:
Given a
const char *
, search for a string conforming to the pattern WWWDD where WWW is one of { "Mon", "Tue", "Wed, "Thu", "Fri", "Sat", "Sun" } and DD is an integer in the range \$[1, 31]\$ (inclusive). If the pattern exists within the string, return the offset; otherwise return -1.
I implemented this state machine:finite state machine using simple table structure. This is the code, which includes a simple driver with test vectors.
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
static const struct fsmNode {
const char target;
const struct fsmNode *next;
} fsm[] = {
{ 'T' , &fsm[6] }, // 0
{ 'M' , &fsm[9] }, // 1
{ 'S' , &fsm[11] }, // 2
{ 'W' , &fsm[12] }, // 3
{ 'F' , &fsm[14] }, // 4
{ '0円' , NULL }, // 5
// second letter
{ 'h' , &fsm[18] }, // 6
{ 'u' , &fsm[20] }, // 7
{ '0円' , NULL }, // 8
{ 'o' , &fsm[22] }, // 9
{ '0円' , NULL }, // 10
{ 'a' , &fsm[24] }, // 11
{ 'u' , &fsm[22] }, // 12
{ '0円' , NULL }, // 13
{ 'e' , &fsm[26] }, // 14
{ '0円' , NULL }, // 15
{ 'r' , &fsm[27] }, // 16
{ '0円' , NULL }, // 17
// third letter
{ 'u' , &fsm[30] }, // 18
{ '0円' , NULL }, // 19
{ 'e' , &fsm[30] }, // 20
{ '0円' , NULL }, // 21
{ 'n' , &fsm[30] }, // 22
{ '0円' , NULL }, // 23
{ 't' , &fsm[30] }, // 24
{ '0円' , NULL }, // 25
{ 'd' , &fsm[30] }, // 26
{ '0円' , NULL }, // 27
{ 'i' , &fsm[30] }, // 28
{ '0円' , NULL }, // 29
// first digit
{ '0' , &fsm[36] }, // 30
{ '1' , &fsm[35] }, // 31
{ '2' , &fsm[35] }, // 32
{ '3' , &fsm[46] }, // 33
{ '0円' , NULL }, // 34
// second digit
{ '0' , NULL }, // 35
{ '1' , NULL }, // 36
{ '2' , NULL }, // 37
{ '3' , NULL }, // 38
{ '4' , NULL }, // 39
{ '5' , NULL }, // 40
{ '6' , NULL }, // 41
{ '7' , NULL }, // 42
{ '8' , NULL }, // 43
{ '9' , NULL }, // 44
{ '0円' , NULL }, // 45
{ '0' , NULL }, // 46
{ '1' , NULL }, // 47
{ '0円' , NULL }, // 48
};
int findWeekdayDate(const char *str) {
const struct fsmNode *state = fsm;
const char *curr;
for (curr = str ; *curr; ++curr) {
for ( ; state->target && *curr != state->target; ++state)
{ /* do nothing */ }
if (state->next == NULL) {
if (state->target) {
// found a valid string
return curr-str-4;
} else {
state = fsm;
}
} else {
state = state->next;
}
}
return -1;
}
// Test code follows...
static const struct test_s {
const char *string;
int expected;
} test[] = {
{ "abcSun24Def", 3},
{ "abcMun24Def", -1},
{ "abcSun091", 3},
{ "MON31", -1},
{ "Mon31", 0},
{ NULL, -1 } // last item
};
#include <stdio.h>
#define RED "033円[31m"
#define GREEN "033円[32m"
#define WHITE "033円[39m"
int main() {
for (size_t i=0; test[i].string != NULL; ++i) {
int val = findWeekdayDate(test[i].string);
printf("%s: \"%s\", expected %d, got %d\n",
(val == test[i].expected ? GREEN " OK" WHITE: RED "BAD" WHITE),
test[i].string, test[i].expected, val
);
}
}
Comments on efficiency, style, performance, data structure choice and implementation are all of interest.
3 Answers 3
Nice work.
On
findWeekdayDate()
return, I'd expect a value that is enumerated likefWD_SUCCESS, fWD_FAIL, fWD_SUCCESS_for_foo_reason, fWD_FAIL_for_bar_reason
, etc. Rather than magic numbers-1,0,3,...
. It could end up being a simplebool
.As
fsm[]
isconst struct fsmNode
, I see no value in making a fieldconst
likeconst char target
.char target
is simpler and sufficient. OTOH, if there is some value, thennext
should also beconst
as inconst struct fsmNode * const next
.Delay/remove
#include
s.#include <string.h> // Needed? #include <stdlib.h> // OK, for NULL #include <ctype.h> // Needed?
Style inconsistency. Recommend empty
for
body to match other indentations.// for ( ; state->target && *curr != state->target; ++state) //{ /* do nothing */ } for ( ; state->target && *curr != state->target; ++state) { /* do nothing */ }
Use an auto formatter. The extra space betrays manual formatting as it is inconsistent with the rest of code. Life is too short for manual formatting.
// v for (curr = str ; *curr; ++curr) {
Suggest adding link to software used to create FSM.
-
\$\begingroup\$ Thanks for the review. On the first point, the return value can't be just a bool because it represents the offset within the string that the pattern was found. That's why
-1
is useful as a "not found" indicator. \$\endgroup\$Edward– Edward2016年12月22日 23:28:07 +00:00Commented Dec 22, 2016 at 23:28 -
\$\begingroup\$ I'm confused by your second point. The definition of the
struct
has both elements declared asconst
, so I'm not sure what you mean when you write "next
should also beconst
" -- it seems to me that it already is. Can you clarify? \$\endgroup\$Edward– Edward2016年12月22日 23:31:15 +00:00Commented Dec 22, 2016 at 23:31 -
\$\begingroup\$ @Edward The field
next
asconst struct fsmNode *next
is a pointer toconst struct fsmNode
, but is not itselfconst
. Comparestruct fsmNode * const next
andconst struct fsmNode * const next
. \$\endgroup\$chux– chux2016年12月22日 23:38:46 +00:00Commented Dec 22, 2016 at 23:38 -
\$\begingroup\$ Let me be more clear: what would adding an additional
const
keyword prevent that is currently possible but undesirable? \$\endgroup\$Edward– Edward2016年12月23日 01:55:18 +00:00Commented Dec 23, 2016 at 1:55 -
1\$\begingroup\$ Oh, I see now. You're advocating removing
const
(in front oftarget
) rather than adding one. \$\endgroup\$Edward– Edward2016年12月23日 02:11:21 +00:00Commented Dec 23, 2016 at 2:11
This was quite fun to read and to figure out how it works.
It took me a while to understand the control flow, which is my main point of critisism. I rearranged it a bit so that, at least in my opinion, the logic is easier to follow.
for ( ; state->target && *curr != state->target; ++state) { /* do nothing */ }
Here, instead of having to use a comment to "excuse" the empty body of the for
loop, I would use a while
loop and add a real comment explaining its purpose.
if (state->next == NULL) { if (state->target) { // found a valid string return curr-str-4; } else { state = fsm; } } else { state = state->next; }
For this part, I would apply the following changes:
- Consistently use implicit boolean evaluation (i.e. use
!p
instead ofp == NULL
). - Use
continue
, in order to- reduce nesting,
- group each condition with its corresponding action,
- sort the cases by increasing success (no match → partial match → full match).
- Add comments explaining each case.
This is what comes out:
int findWeekdayDate(const char *str) {
const struct fsmNode *state = fsm;
const char *curr;
for (curr = str; *curr; ++curr) {
// try to find a match
while (state->target && *curr != state->target) {
++state;
}
if (!state->target) {
// no match, retry with next character
state = fsm;
continue;
}
if (state->next) {
// partial match, proceed with next character
state = state->next;
continue;
}
// full match
return curr - str - 4;
}
return -1;
}
I hope this is still correct, at least all the provided test cases are passed.
Other remarks are:
I'm afraid this is quite an inflexible solution. Adding, removing or changing the patterns would probably require updating most of the
next
indexes (and the "line number" comments).Assuming it was somewhat convenient to generate the
fsm
structure, it would make sense to make it a parameter of the matching function so that the function can be reused for different FSMs.The magic number 4 (in
curr-str-4
) should be a named constant.- Its value should actually by 5 (because it is the length of the WWWDD pattern),
- the result of the function should then be calculated as
(curr - PATTERN_LENGTH + 1) - str
in order to (1) make the awareness of the "fence post problem" apparent and (2) highlight the fact that the offset intostr
is calculated. - Hard-coding it into the function makes it also impossible to support patterns where not all matches have the same length, e.g. the full weekday names (which would already be supported by the given FSM data structure).
-
\$\begingroup\$ Perhaps
if (state->next) { state = state->next; continue; } return curr - str - 4;
-->if (!state->next) return curr - str - 4; state = state->next;
? \$\endgroup\$chux– chux2016年12月23日 01:34:54 +00:00Commented Dec 23, 2016 at 1:34 -
\$\begingroup\$ Then the cases are not sorted by increasing success anymore, which was one of my arguments. \$\endgroup\$mkrieger1– mkrieger12016年12月23日 01:36:04 +00:00Commented Dec 23, 2016 at 1:36
-
\$\begingroup\$ Yes, the sort idea has merit - UV.
// no match, go back to start
threw me off a bit as the idea of going back to the beginning of the loop makes sense, but acontinue
is slightly incongruent with "go back to start". As it does more than go back to the loop beginning, it also does a++curr
, the iterative step between loops. More like a "next" than "go back". \$\endgroup\$chux– chux2016年12月23日 01:45:47 +00:00Commented Dec 23, 2016 at 1:45 -
\$\begingroup\$ It means "continue with the next character, at the start of the pattern". I named it after the "start" node in the FSM drawing. \$\endgroup\$mkrieger1– mkrieger12016年12月23日 02:19:40 +00:00Commented Dec 23, 2016 at 2:19
This is a great example of a state machine, thanks for going to the extra effort of coding it up. I'm going to quote your own review of the previous implementation:
Check for a null pointer
Things do not go well if the routine is passed a NULL pointer. On my machine, I get a segmentation fault and a crash. You can eliminate this hole by adding these lines near the top of the routine:
if (str == NULL) {
return -1;
}
You may have taken an intentional decision not to support this case, however it could be that you didn't notice because your test cases stop when they encounter a NULL
as the test case:
for (i=0; test[i].string != NULL; ++i) {
If you want to support it, then a better running condition for your tests might be:
for (i=0; i < sizeof(test)/sizeof(test[0]); ++i) {
Maintainability
I had a similar concern to @mkrieger about the maintainability of the approach if you wanted to expand the state machine. On reflection however, you could simply move any nodes that you wanted to expand to the end of the list without having to update everything. This would create holes, but the memory impact would be small and your algorithm would cope with it. Where you're linking to a state multiple times however, I'd consider introducing a constant to represent the state. This would make the table easier to read + adapt. So, fsm[30]
could be fsm[DAY_VALID]
etc.
-
\$\begingroup\$ Concerning the
NULL
check: In general, checking forNULL
does have merit, yet this code's goal was "Given aconst char *
, search for a string conforming ...". As a string is an array andNULL
never points to a valid string, checking forNULL
is going beyond coding goals. Beware of creeping features. \$\endgroup\$chux– chux2016年12月24日 03:09:25 +00:00Commented Dec 24, 2016 at 3:09 -
\$\begingroup\$ @chux I don't think that is clear from the spec at all. It says 'search for a string' not 'search a string'. The reference at the end is better 'If the pattern exists within the string' and implies that a string is passed in, but it is not a clear statement of expectation. Given the inconsistencies and Edward's previous answer having referred to a null check for essentially the same function, it seemed like a valid concern, particularly since there is also a test case that implies null should return -1, although the case isn't executed. \$\endgroup\$forsvarir– forsvarir2016年12月24日 09:15:53 +00:00Commented Dec 24, 2016 at 9:15
graphviz
. \$\endgroup\$