I've been playing around with C# 7 and created this if-less, switch-only command-line parser. What do you think of it? Can it be made more C# 7 or otherwise improved? Is the new switch a good alternative to the old if/else?
It splits a string on space, equal sign, or colon into tokens (unless the separator is escaped or quoted) and removes the quotes from quoted parts.
public static void Parse7(string text)
{
    var escapableChars = new HashSet<char> { '\\', '"', '=', ':' };
    var separators = new HashSet<char> { ' ', '=', ':' };
    var tokens = new List<string>();
    var token = new StringBuilder();
    var escapeMode = false;
    var quoted = false;

    bool IsUnquotedSeparator(char c) => separators.Contains(c) && !quoted;
    bool IsTokenEmpty() => token.Length == 0;

    foreach (var c in text ?? throw new ArgumentNullException(nameof(text)))
    {
        switch (c)
        {
            case '\\' when !escapeMode:
                escapeMode = true;
                // Don't eat escape-char yet.
                break;
            case '"':
                quoted = !quoted;
                // Don't eat quotes.
                break;
            default:
                switch (escapeMode)
                {
                    case true:
                        switch (escapableChars.Contains(c))
                        {
                            case false:
                                // Eat escape-char because it doesn't escape anything.
                                token.Append('\\');
                                break;
                        }
                        // Eat escaped-char.
                        token.Append(c);
                        escapeMode = false;
                        break;
                    default:
                        switch (IsUnquotedSeparator(c))
                        {
                            case true when !IsTokenEmpty():
                                // Eat token.
                                tokens.Add(token.ToString());
                                token.Clear();
                                break;
                            case true:
                                // Don't eat separators.
                                break;
                            default:
                                // Eat any other char.
                                token.Append(c);
                                break;
                        }
                        break;
                }
                break;
        }
    }

    switch (IsTokenEmpty())
    {
        case false:
            // Eat the last token.
            tokens.Add(token.ToString());
            break;
    }

    tokens.Dump(); // LINQPad
}
Example:
var cmd = @"foo.exe --bar=""baz"" --qux --baar\=baaz --quux";
Parse7(cmd);
Result:
foo.exe --bar baz --qux --baar=baaz --quux
For comparison purposes this is the classic version:
public static void Parse(string text)
{
    var escapableChars = new HashSet<char> { '\\', '"', '=', ':' };
    var separators = new HashSet<char> { ' ', '=', ':' };
    var tokens = new List<string>();
    var token = new StringBuilder();
    var escapeMode = false;
    var quoted = false;

    foreach (var c in text)
    {
        if (c == '\\' && !escapeMode)
        {
            escapeMode = true;
            // Don't eat escape-char yet.
            continue;
        }

        if (escapeMode)
        {
            if (escapableChars.Contains(c)) token.Append(c);
            else token.Append('\\').Append(c);
            escapeMode = false;
            // Escape-char already eaten.
            continue;
        }

        if (c == '"')
        {
            quoted = !quoted;
            // Don't eat quotes.
            continue;
        }

        if (separators.Contains(c) && !quoted)
        {
            tokens.Add(token.ToString());
            token.Clear();
        }
        else
        {
            token.Append(c);
        }
    }

    if (token.Length > 0) tokens.Add(token.ToString());

    tokens.Dump(); // LINQPad
}
2 Answers
If vs switch
If statements and switch statements are both "selections" (a control structure). Selections introduce an abstraction into the control flow, because afterwards you lose the information about which branch was selected.
Using one or the other does not change the control flow, so in the end both are equally good, equally evil, or anything in between. Depending on the programming language, they may differ in the guarantees they give when mapping a data structure to control flow. But once you know that a switch statement serves the same purpose as an if statement, readability becomes a matter of habit and subjectivity. That is why, whenever I say "if", I also mean "switch".
You may have new syntactic sugar that expresses selections more compactly, but it remains a selection.
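As a small illustration of this point, here is a sketch of one selection written twice: once as an if chain and once as a C# 7 switch with case guards. The class and method names are hypothetical and only loosely modeled on the parser in the question; both methods take the same branches for every input.

```csharp
using System;

public static class SelectionDemo
{
    // Hypothetical helper: classifies a character with an if chain.
    public static string ClassifyWithIf(char c, bool quoted)
    {
        if (c == '"') return "quote";
        if ((c == ' ' || c == '=' || c == ':') && !quoted) return "separator";
        return "text";
    }

    // The same selection expressed with a C# 7 switch and 'when' guards.
    public static string ClassifyWithSwitch(char c, bool quoted)
    {
        switch (c)
        {
            case '"':
                return "quote";
            case ' ' when !quoted:
            case '=' when !quoted:
            case ':' when !quoted:
                return "separator";
            default:
                return "text";
        }
    }
}
```

Which of the two reads better is the subjective part; the decisions made are the same.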
Abstraction
The only thing we can really change to gain a more flexible structure is the abstraction, expressed as abstract classes or interfaces with their concrete implementations. If you choose your abstractions well, they pay off (e.g. in readability).
Parsing
A combination of the interpreter pattern and the state pattern provides a solid structure for parsing input from various sources. The state pattern guides your control flow through different compilation units that can be arranged independently. The interpreter gathers the instructions from the "state machine" and executes whatever needs to be done.
To be fair: in the end you will not have fewer if statements, and you will have a lot more compilation units. But you will have a beneficial structure that elegantly decouples nested if statements so that the control flow can be recomposed. This gives you the flexibility to meet future parsing requirements when the interpreted language is extended.
This structure leads objectively to better readability and understandability, because a concrete state focuses on only one level of if statements, is isolated from other states (other isolated if statements), and is encapsulated. Your mental stack carries less load when you look at the moving parts within one state than with an equivalent nested if-then-else cascade.
What you are doing here is applying the single responsibility principle. A state is responsible only for deciding which state to go to next, and it stays on one level of abstraction, which definitely improves readability.
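To make the idea concrete, here is a minimal sketch of how the tokenizer from the question might look with one class per state. It is not the schema referred to in this answer; every type name (TokenCollector, IState, NormalState, QuotedState, EscapedState, StateTokenizer) is an assumption made for illustration. An escaped quote is treated as a literal quote here, as in the classic version from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical "interpreter" side: collects characters and finished tokens.
public sealed class TokenCollector
{
    private readonly StringBuilder _token = new StringBuilder();
    public List<string> Tokens { get; } = new List<string>();

    public void Append(char c) => _token.Append(c);

    // Stores the current token, if there is one, and starts a new one.
    public void Flush()
    {
        if (_token.Length == 0) return;
        Tokens.Add(_token.ToString());
        _token.Clear();
    }
}

// Each state handles one character and decides which state comes next.
public interface IState
{
    IState Next(char c, TokenCollector collector);
}

public sealed class NormalState : IState
{
    public IState Next(char c, TokenCollector collector)
    {
        switch (c)
        {
            case '\\': return new EscapedState(this);
            case '"': return new QuotedState();
            case ' ':
            case '=':
            case ':':
                collector.Flush(); // a separator ends the current token
                return this;
            default:
                collector.Append(c);
                return this;
        }
    }
}

public sealed class QuotedState : IState
{
    public IState Next(char c, TokenCollector collector)
    {
        switch (c)
        {
            case '\\': return new EscapedState(this);
            case '"': return new NormalState();
            default:
                collector.Append(c); // separators are ordinary text inside quotes
                return this;
        }
    }
}

public sealed class EscapedState : IState
{
    private static readonly HashSet<char> Escapable = new HashSet<char> { '\\', '"', '=', ':' };
    private readonly IState _returnTo;

    public EscapedState(IState returnTo) => _returnTo = returnTo;

    public IState Next(char c, TokenCollector collector)
    {
        // Keep the backslash when it did not escape anything.
        if (!Escapable.Contains(c)) collector.Append('\\');
        collector.Append(c);
        return _returnTo;
    }
}

public static class StateTokenizer
{
    public static List<string> Tokenize(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        var collector = new TokenCollector();
        IState state = new NormalState();
        foreach (var c in text) state = state.Next(c, collector);
        collector.Flush(); // store the last token
        return collector.Tokens;
    }
}
```

Calling StateTokenizer.Tokenize on the example command line from the question yields the tokens foo.exe, --bar, baz, --qux, --baar=baaz and --quux. Each state contains a single level of selection, and a new state (for example, for a different quote character) could be added without touching the others.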
References
Oracle, for example, provides state-machine-like diagrams to define the syntax of its SQL variant:
The PostgreSQL documentation uses a different representation for effectively the same purpose in its synopsis sections:
Another way to formalize the syntax, in a developer's style, is to use the UML notation for a state chart:
The underlying theory for all of this is:
I provided a very simple schema that gives an idea of how to implement it here:
These are very interesting points, especially the interpreter and state patterns. I'll need some time to study them and I'll try to implement them in the command-line parser. – t3chb0t, Feb 18, 2017 at 8:43
A small suggestion: your switch statement for booleans seems a bit overkill (and in most cases quite hard to read).
Consider how this code block uses the switch statement for a true/false value:
switch (escapableChars.Contains(c))
{
    case false:
        // Eat escape-char because it doesn't escape anything.
        token.Append('\\');
        break;
}
// Eat escaped-char.
token.Append(c);
escapeMode = false;
This would be easier to read if you used an if statement:
if(!escapableChars.Contains(c)) token.Append('\\');
// Eat escaped-char.
token.Append(c);
escapeMode = false;