Objective: A lightweight shell that runs on a microcontroller (MSP430) and parses incoming data into a command and additional parameters.
Requirements:
- Support of quoting in parameters using
"
- Escaping special characters using
\
in parameters- Ending commands with
\r
or\n
- Separating parameters using or
\t
- Sending NACK followed by a line break in the event of an error
- Locking (not accepting new commands) the shell during processing
NOT required:
- Support for entering special characters with leading backslash (
\n
,015円
, ...)- Support for backspace
- Support for ending commands using
\r\n
Circumstances:
Though the microcontroller in use is a pretty powerful one, the shell shall run in a low power mode and only activate the main program once the command is fully read in. When an character is received an interrupt is triggered, parsing should be done in that ISR routine to leave the program in low power mode – therefore the implementation used needs to be fast and small.
I thought a finite state machine is good enough for my parsing requirements and this is what I ended up with:
/* [...] */
// Control characters.
#define ACK 0x06
#define NACK 0x15
// Limitations for the shell.
#define SHELL_MAX_PARAMETER_LENGTH 32
#define SHELL_MAX_PARAMETERS 6
#define SHELL_MAX_CMD_LENGTH 32
// Shell signals.
#define SHELL_SIGNAL_OK {USART_Send_Data(ACK); USART_Send_String("\r\n");}
#define SHELL_SIGNAL_ERROR {USART_Send_Data(NACK); USART_Send_String("\r\n");}
/* [...] */
// Global variables.
/**
* @var
* The command that shall be executed by the shell.
*/
char Cmd[SHELL_MAX_CMD_LENGTH] = "";
/**
* @var
* An array of arguments for Cmd. Not including the command itself.
*/
char Parameters[SHELL_MAX_PARAMETERS][SHELL_MAX_PARAMETER_LENGTH];
/**
* @var
* The current index in Parameters.
*/
int Parameters_i = 0;
/**
* @var
* Flag that signals that the program is busy and will not process new commands.
*/
int Busy = 1;
/**
* @var
* Flag that signals that a new command has been received by the shell.
*/
int NewCommand = 0;
void main(void)
{
/* [...] */
// Start listening for commands.
Busy = 0;
while(1) {
// Sleep until the shell wakes up the main program.
LPM3; // Enter low power mode.
// Check if there is a new command waiting for execution.
if(NewCommand) {
// Process the new command.
exec_command();
// Listen for new commands.
Busy = 0;
NewCommand = 0;
}
/* [...] */
}
}
/**
* Executes the current command.
*/
void exec_command()
{
#define CMD_IS(X) (strcasecmp(Cmd, X) == 0)
if (CMD_IS("help")|| CMD_IS("?"))
shell_help(); // Prints a helpage.
else if (CMD_IS("Acknowledge"))
SHELL_SIGNAL_OK
/* [...] (More commands) */
else
SHELL_SIGNAL_ERROR
#undef CMD_IS
}
/**
* Interrupt that gets triggered when a character is received on the UART.
*
* A finate state machine interprets the input on the serial interface and
* extracts command and parameters. Supports quoting using '"' and
* escaping special characters using '\' in parameters.
*/
#pragma vector=USCIAB0RX_VECTOR
__interrupt void USCIAB0RX_ISR(void)
{
enum TStates {
CMD,
PARAMETER,
PARAMETER_BACKSLASH,
PARAMETER_QUOTE,
PARAMETER_QUOTE_BACKSLASH
};
static enum TStates state = CMD;
static int cmd_i = 0, parameter_i = 0, parameters_i = 0;
// Save the current received character in current_char.
char current_char = USART_RxData_Reg;
fUSART_Receive_Set;
// Check that the program currently accepts new commands.
if (Busy) {
goto ERROR;
}
// Transitions.
//
// Decide into which state to change.
// Then jump to one of the actions listed below.
if (state == CMD) {
switch (current_char) {
case ' ':
case '\t':
state = PARAMETER;
goto FINISH;
case '\n':
case '\r':
state = CMD;
goto CALL;
default:
state = CMD;
goto NEXT_CMD_CHAR;
}
} else if (state == PARAMETER) {
switch (current_char) {
case '\\':
state = PARAMETER_BACKSLASH;
goto FINISH;
case '"':
state = PARAMETER_QUOTE;
goto FINISH;
case ' ':
case '\t':
state = PARAMETER;
goto NEXT_PARAMETER;
case '\n':
case '\r':
state = CMD;
goto CALL;
default:
state = PARAMETER;
goto NEXT_PARAMETER_CHAR;
}
} else if (state == PARAMETER_QUOTE) {
switch (current_char) {
case '"':
state = PARAMETER;
goto FINISH;
case '\\':
state = PARAMETER_QUOTE_BACKSLASH;
goto FINISH;
default:
state = PARAMETER_QUOTE;
goto NEXT_PARAMETER_CHAR;
}
}
// Do not interpret characters after a backslash,
// treat them as a normal character.
// Go back into the previous state.
else if (state == PARAMETER_QUOTE_BACKSLASH) {
state = PARAMETER_QUOTE;
goto NEXT_PARAMETER_CHAR;
} else if (state == PARAMETER_BACKSLASH) {
state = PARAMETER;
goto NEXT_PARAMETER_CHAR;
}
// Actions.
NEXT_CMD_CHAR:
if (cmd_i < SHELL_MAX_CMD_LENGTH) {
Cmd[cmd_i++] = current_char;
} else {
goto ERROR;
}
goto FINISH;
NEXT_PARAMETER:
if (parameters_i < SHELL_MAX_PARAMETERS) {
Parameters[parameters_i++][parameter_i] = '0円';
parameter_i = 0;
} else {
goto ERROR;
}
goto FINISH;
NEXT_PARAMETER_CHAR:
if (parameter_i < SHELL_MAX_PARAMETER_LENGTH) {
Parameters[parameters_i][parameter_i++] = current_char;
} else {
goto ERROR;
}
goto FINISH;
CALL:
// Copy current index of parameters.
Parameters_i = parameters_i;
// End strings.
Cmd[cmd_i] = '0円';
Parameters[parameters_i][parameter_i] = '0円';
// Reset counters.
parameter_i = 0;
cmd_i = 0;
parameters_i = 0;
// Wake up the main program and signal that their is a new command.
// Mark the program as busy.
LPM3_EXIT; // Exit low power mode.
NewCommand = 1;
Busy = 1;
goto FINISH;
ERROR:
// Reset counters and read in new commands.
state = CMD;
cmd_i = 0;
parameters_i = 0;
parameter_i = 0;
// Signal the error.
SHELL_SIGNAL_ERROR
goto FINISH;
FINISH:
return;
}
Which works but also looks pretty scary.
Do you have any suggestions about improving this code, or can you think of a smarter way? Since I'm not a native – remarks concerning grammar, spelling and the usage of English are welcome too.
Follow-Up
Based on the posted answers, I've restructured my code to use a table based mealy machine and asked for another review here.
2 Answers 2
Here are some things that may help you improve your code.
Simplify the buffer
The code currently has a 32-byte shell length, a maximum of 6 parameters, and a maximum parameter length of 32. This yields a maximum overall command length of 224 bytes (7 * 32). It seems to me that it would be much simpler to simply declare a contiguous buffer of 224 bytes rather than two separate buffers for command and parameters.
Avoid overly similar variable names
The interrupt code has both parameter_i
and parameters_i
. These only differ by one letter and it's not even the beginning or ending character, making it relatively difficult to spot the difference. The code has Parameters_i
and parameters_i
which differ only in capitalization. Better would be to ditch the _i
suffix (which just makes the variable name harder to read) and rename to parameter_count
and parameter_char
or something similar. There is also no need to use both local and global variables since main
is not running until the interpreter is complete anyway.
Keep tidy as your state machine progresses
The code currently waits to the end to add terminating NUL
characters to strings but that should be part of the state machine processing instead. For example, in the CMD
state, when any whitespace character is read, the terminating NUL
character should immediately be added before moving to the next state.
Use a switch
for the state machine
Rather than a long if ... else
statement, it would make more sense to use a switch
statement for the state machine. This also allows the code to use the default
case to handle the problem that the state machine variable does not refer to a valid state. Even if your code is perfect, hardware isn't and an energetic gamma particle could flip a bit in the RAM that holds the state variable, so it's good to gracefully handle such invalid states.
Don't separate actions from states using goto
Having a proliferation of goto
statements is usually a sign of bad design. Better would be to eliminate them entirely -- it makes the code easier to follow and less error-prone. Better would be to shorten the actions and put them inline rather than jumping to some other location to perform the action. Alternatively, have two separate state machines -- one for input state and one for actions.
Consolidate redundant variables
The variables NewCommand
and Busy
appear to be redundant. NewCommand
could be eliminated and Busy
used instead.
Factor out common code into subroutines
Several places use largely identical code which could be factored out into a routine instead. For example, you could use a subroutine (or at least a macro) to reset the state machine.
Consider implementing the state machine as a data structure
A somewhat more advanced technique is to implement the state machine as a structure. The input character advances the state, and depending on the state, some action ensues. This suggests a possible data structure consisting of state, input_char and action members. By expressing this all as a data structure, the code to implement the state machine can generally be made quite small and fast (desirable for an interrupt) and easy to understand and maintain on the part of the human programmer (desirable for any code).
There are several problems with this code:
- the many
goto
statements - did not clear the interrupt pending flag
- tries to do everything inside the interrupt
- did not check to assure a char was available before trying to input the char
Note: there are other details to handle, like the output of the command line prompt.
The setup()
function should:
- execute from the reset vector
- setup all I/O registers, memory segments, etc
- setup the
argc
andargv[]
parameters - invoke the
main()
function
The I/O interrupt should only capture the string of input characters into a 'raw' character buffer (I.E. no processing). When a newline (on a mac that is a linefeed) is received, it is time to trigger the next upper level of processing.
The main program should set everything up and enter an outer loop.
The outer loop should call a read()
function. When read()
returns, call the appropriate function, depending on 'cmd'.
The read()
function should:
- output a prompt
- call the
cook()
function - return the 'cooked' array of cmd and parameters
The cook function should:
- clear the cooked array of cmd and parameters
- set the low power mode
- unset indication of complete raw input line available
- loop
- enable the 'input' I/O interrupt (should not hurt to do this repeatedly)
- call 'wait' (generally this will be exited on each interrupt event)
- exit loop if complete 'raw' input line is available
- end loop
- disable the 'input' I/O interrupt
- cook the raw input line into an array of cmd and parameters
The interrupt()
function should, on each interrupt event append new char to end of 'raw' data. If the new char is '\r' then set indication of complete raw input line available, then clear interrupt pending flag.
-
\$\begingroup\$ Why should I check if there is a char available if the interrupt gets only triggered when there is a new char? \$\endgroup\$Gellweiler– Gellweiler2015年01月14日 22:32:52 +00:00Commented Jan 14, 2015 at 22:32