I needed to parse some command line switches from a string and I tried to do it the easy way so I just wrote this:
static class CommandLineParser
{
public static IEnumerable<string> ParseCommandLine(this string text)
{
if (Regex.Matches(text, "\"").Count % 2 != 0)
{
throw new ArgumentException("Invalid number of qotes.");
}
return text.Aggregate(new
{
lastChar = '0円',
isEscaped = false,
result = new List<StringBuilder> { new StringBuilder() }
}, (state, next) =>
{
var isQuote = next == '"';
var isEscapedQuote = state.lastChar == '\\' && isQuote;
if (next == ' ' && !state.isEscaped)
{
// Ignore multiple spaces between switches.
if (state.lastChar != ' ')
{
state.result.Add(new StringBuilder());
}
}
else
{
state.result.Last().Append(next);
}
return new
{
lastChar = next,
isEscaped = isQuote && !isEscapedQuote ? !state.isEscaped : state.isEscaped,
result = state.result
};
}).result.Select(x => x.ToString());
}
}
Example:
var text = "foo -bar -baz=\"abc \\\"def\\\" ghi\" -qux";
text.ParseCommandLine().Dump();
and the result:
foo -bar -baz="abc \"def\" ghi" -qux
What do you think? Is this enough? Can this be even shorter?
2 Answers 2
Validate your text
argument. This causes an ArgumentNullException
in Regex.Matches
:
((string)null).ParseCommandLine();
It would be better to check the argument before passing it on:
if (text == null)
{
throw new ArgumentNullException(nameof(text), "Command line arguments cannot be null");
}
Why is this better? The stack trace is more meaningful because the error is thrown closer to where the problem call is and you can supply your own context specific error message. Either that, or you can return an empty array for null
.
Properties should be PascalCase - including anonymous types. lastChar
should be LastChar
and the same for the rest.
result
should be plural as it is a collection: Results
.
Use default()
to get the default value, it signals intent better than a literal ('0円' == default(char)
).
Name your constants! E.g. the double quote and the space. A single space is the most difficult thing to read in source code (in my experience).
I think this is good code (all the above is pretty minor). I'd never thought of using Aggregate
over a string for simple parsing before :)
-
\$\begingroup\$ I was really surprised when I found that this actully works because the anonymous object is readonly - I first wanted to modify a property but I wasn't allowed to do this - so I blindly just tried to return an exact instance and voila it worked ;-) I was afraid it wouldn't accept a new instance. \$\endgroup\$t3chb0t– t3chb0t2016年09月06日 08:37:59 +00:00Commented Sep 6, 2016 at 8:37
I think this is a misuse of Enumerable.Aggregate
just so you can avoid explicitly writing a loop. Aggregate
is eager and it's just a thin wrapper around a foreach
.
Because Aggregate
is eager you may as well use it for it's actual purpose and count the occurrences of quotes in the input. And you'll avoid a regex.
I would also make it explicit that isQuote && !isEscapedQuote
is an escape boundary.
There's a typo in your exception message, qotes -> quotes.
With changes applied:
static class CommandLineParser
{
const char quote = '"';
const char space = ' ';
const char backslash = '\\';
public static IEnumerable<string> ParseCommandLine(this string text)
{
var result = text.Aggregate(new
{
QuoteCount = 0,
LastChar = default(char),
IsEscaped = false,
Arguments = new List<StringBuilder> { new StringBuilder() },
}, (state, next) =>
{
var isQuote = next == quote;
var isEscapedQuote = state.LastChar == backslash && isQuote;
var isEscapeBoundary = isQuote && !isEscapedQuote;
if (next == space && !state.IsEscaped)
{
// Ignore multiple spaces between switches.
if (state.LastChar != space)
{
state.Arguments.Add(new StringBuilder());
}
}
else
{
state.Arguments.Last().Append(next);
}
return new
{
QuoteCount = next == quote ? state.QuoteCount + 1 : state.QuoteCount,
LastChar = next,
IsEscaped = isEscapeBoundary ? !state.IsEscaped : state.IsEscaped,
Arguments = state.Arguments,
};
});
if(result.QuoteCount % 2 != 0)
{
throw new ArgumentException("Invalid number of quotes.");
}
return result.Arguments.Select(a => a.ToString());
}
}
If you implemented the parsing with a loop instead you could make it lazy (or not) and easier to read:
public static class CommandLineParser
{
private const char space = ' ';
private const char quote = '"';
private const char backslash = '\\';
public static IEnumerable<string> ParseCommandLine(this string commandLine)
{
if(commandLine == null) throw new ArgumentNullException(nameof(commandLine));
if(commandLine.Count(c => c == quote) % 2 != 0) throw new ArgumentException("Invalid number of quotes.");
return ParseCommandLineImpl(commandLine);
}
private static IEnumerable<string> ParseCommandLineImpl(string commandLine)
{
var argumentBuilder = new StringBuilder();
var previous = default(char);
var isEscaped = false;
foreach(var current in commandLine)
{
if(current == space && !isEscaped)
{
if(previous != space)
{
yield return argumentBuilder.ToString();
argumentBuilder.Clear();
}
}
else
{
argumentBuilder.Append(current);
}
var isQuote = current == quote;
var isEscapedQuote = isQuote && previous == backslash;
var isEscapeBoundary = isQuote && !isEscapedQuote;
if(isEscapeBoundary)
{
isEscaped = !isEscaped;
}
previous = current;
}
if(argumentBuilder.Length > 0)
{
yield return argumentBuilder.ToString();
}
}
}
The method is split in two to force the argument validation to be eager and the parsing to be lazy.
To make it eager you just have to replace the yield return
with adding to a collection.
-
\$\begingroup\$ Those are resonable arguments. I must admit, your loop theory is correct ;-) I didn't want to use a loop and I wanted to see if I can misuse
Aggregate
for this purpose - I actually do it quite often. \$\endgroup\$t3chb0t– t3chb0t2016年09月06日 12:00:36 +00:00Commented Sep 6, 2016 at 12:00
te\xt
,tex"t
ortex\"t
? \$\endgroup\$te\xt
andtex\"t
, ortext
andtex"t
? (Or some other combination)? \$\endgroup\$