From what I've seen in other posts, if I actually know the unwanted characters, then I can do string.replace()
. But in my case, any characters can appear in my input string, and I only want to keep the characters that I want (without messing up the order of course).
private string RemoveUnwantedChar(string input)
{
string correctString = "";
for (int i = 0; i < input.Length; i++)
{
if (char.IsDigit(input[i]) || input[i] == '.' || input[i] == '-' || input[i] == 'n'
|| input[i] == 'u' || input[i] == 'm' || input[i] == 'k' || input[i] == 'M'
|| input[i] == 'G' || input[i] == 'H' || input[i] == 'z' || input[i] == 'V'
|| input[i] == 's' || input[i] == '%')
correctString += input[i];
}
return correctString;
}
Characters that I want:
- 0123456789
- numkMGHzVs%-
How can I tidy this code to be neater and more readable?
3 Answers 3
By having a
const string
which contains all of your wanted chars, you could do either a simple call toContains()
or check ifIndexOf()
will return a value> -1
.using string concatenation in a loop is mostly a bad idea. Use a
StringBuilder
instead.omitting braces
{}
although they are optional for single linedif
statements is a bad idea because it makes your code error prone.
Implementing the mentioned points will lead to
private const string allowedCharacters = "numkMGHzVs%-.";
private string RemoveUnwantedChar(string input)
{
StringBuilder builder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
if (char.IsDigit(input[i]) || allowedCharacters.Contains(input[i]))
{
builder.Append(input[i]);
}
}
return builder.ToString();
}
@Caricorc made a good suggestion in the comments
In my opinion
allowedCharacters
should be an argument to the function to allow reusability.
So by passing the allowedCharacters
as an optional parameter with an additional check with IsNullOrEmpty()
.
If performance is an issue, you could also pass a HashSet<char>
to the method or have an overloaded method like so
private string RemoveUnwantedChar(string input, string allowedCharacters = "0123456789numkMGHzVs%-.")
{
if (string.IsNullOrEmpty(allowedCharacters)) { return input; }
return RemoveUnwantedChar(input, new HashSet<char>(allowedCharacters));
}
private string RemoveUnwantedChar(string input, HashSet<char> allowedCharacters)
{
if (allowedCharacters.Count == 0) { return input; }
StringBuilder builder = new StringBuilder(input.Length);
for (int i = 0; i < input.Length; i++)
{
if (allowedCharacters.Contains(input[i]))
{
builder.Append(input[i]);
}
}
return builder.ToString();
}
you can reuse it somewhere else.
-
2\$\begingroup\$ In my opinion
allowedCharacters
should be an argument to the function to allow reusability. \$\endgroup\$Caridorc– Caridorc2015年11月02日 10:55:03 +00:00Commented Nov 2, 2015 at 10:55 -
\$\begingroup\$ I think
char.IsDigit(input[i]) ||
is a problem in the second function, as the user may or may not allow digits \$\endgroup\$Caridorc– Caridorc2015年11月02日 11:02:13 +00:00Commented Nov 2, 2015 at 11:02 -
\$\begingroup\$ Perfect :) , my last suggestion is converting the allowedchars to a set to reduce time complexity from n^2 to n. (Actually I should have posted all of this as an aswer but now it is too late to convert it) \$\endgroup\$Caridorc– Caridorc2015年11月02日 11:06:46 +00:00Commented Nov 2, 2015 at 11:06
-
2\$\begingroup\$ I feel that
RemoveUnwantedCharacters
should have ablacklist
parameter, not awhitelist
parameter. This really should beConformToWhitelist
or something - we're not removing unwanted characters so much as we're keeping wanted characters. A subtle difference, but a difference nevertheless. I would assumeRemoveUnwantedCharacters
would ask me for characters I don't want! \$\endgroup\$corsiKa– corsiKa2015年11月02日 18:35:22 +00:00Commented Nov 2, 2015 at 18:35 -
\$\begingroup\$ @corsiKa very good point actually! I did feel the same way after that, I now changed my method name to
FilterString
. \$\endgroup\$Liren Yeo– Liren Yeo2015年11月03日 05:14:52 +00:00Commented Nov 3, 2015 at 5:14
Whenever I see filtering, I tend to think LINQ.
private string RemoveUnwantedCharacters(string input, IEnumerable<char> allowedCharacters)
{
var filtered = input.ToCharArray()
.Where(c => allowedCharacters.Contains(c))
.ToArray();
return new String(filtered);
}
You can call it like this:
string filteredString = RemoveUnwantedCharacters(inputString, "0123456789numkMGHzVs%-.");
- Code is shorter
- The intent is clear - it basically reads as "filtered is input where allowed characters contains this character", which is pretty self-explanatory
- Allowed characters is a parameter, so you can reuse the method in various places. If you're using the same set of allowed characters in a lot of places, stick them in some sort of settings store.
-
1\$\begingroup\$ Using
AsEnumerable
on theinput
would be even moreLINQ
-ish ;-) \$\endgroup\$t3chb0t– t3chb0t2015年11月02日 13:07:41 +00:00Commented Nov 2, 2015 at 13:07 -
5\$\begingroup\$ 1) A string is already an
IEnumerable<char>
. BothAsEnumerable
andToCharArray
are useless here. 2) No need for a lambda. Just use.Where(allowedCharacters.Contains)
\$\endgroup\$CodesInChaos– CodesInChaos2015年11月02日 13:14:47 +00:00Commented Nov 2, 2015 at 13:14 -
2\$\begingroup\$ @CodesInChaos while
.Where(allowedCharacters.Contains)
works, it takes me much more mental effort to understand than.Where(c => allowedCharacters.Contains(c))
. I think the increased clarity is well worth the 8 extra keypresses. \$\endgroup\$JounceCracklePop– JounceCracklePop2015年11月02日 19:19:49 +00:00Commented Nov 2, 2015 at 19:19 -
1\$\begingroup\$ @CarlLeth - I think that's a personal preference thing; I much prefer the more concise version. \$\endgroup\$RobH– RobH2015年11月02日 20:44:22 +00:00Commented Nov 2, 2015 at 20:44
-
1\$\begingroup\$ @Liren same effect from opposite routes. The
.Contains()
is checking "does that collection contain this character" while the.Any()
is checking "are any of the characters in this collection the same as this one". They end up doing the same thing, but you're doing with.Where()
and.Any()
is to manually do what.Contains()
is actually intended for. \$\endgroup\$anaximander– anaximander2015年11月03日 10:05:42 +00:00Commented Nov 3, 2015 at 10:05
Same answer as on SO
Whenever you have to search for literals. Regex is the way to go.
public string RemoveUnwantedChar(string input) {
StringBuilder stringBuilder = new StringBuilder();
foreach (var match in Regex.Matches(input, "[0-9numkMGHzVs%\\-.]")) {
stringBuilder.Append(match.ToString());
}
return stringBuilder.ToString();
}
- Code is shorter
- Very easy to expand
- Easy to read
- Easy to follow the Code
Second Solution. A nice OneLiner as Taemyr suggested:
public string RemoveUnwantedChar(string input) {
return Regex.Replace(input, "[^0-9numkMGHzVs%\\-.]", "");
}
// Edit from String concatenation to StringBuilder implementation for better Performace especially for large inputs
// Edit2 Escaped the Dash for more Info: https://stackoverflow.com/questions/9589074/regex-should-hyphens-be-escaped
-
2\$\begingroup\$ You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and how it improves upon the original) so that the author can learn from your thought process. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年11月02日 13:03:16 +00:00Commented Nov 2, 2015 at 13:03
-
\$\begingroup\$ String concatenation in a loop, while present in the original code, is an anti-pattern (negative impact on performance, possibly a problem for large inputs) \$\endgroup\$Konrad Morawski– Konrad Morawski2015年11月02日 13:59:52 +00:00Commented Nov 2, 2015 at 13:59
-
1\$\begingroup\$ First; you need to escape the dash. Second; your function body can be replaced with
return Regex.Replace(input,"[^0-9numkMGHzVs%\\-.]","")
\$\endgroup\$Taemyr– Taemyr2015年11月02日 15:04:50 +00:00Commented Nov 2, 2015 at 15:04 -
\$\begingroup\$ You can alternatively put the hyphen at the beginning or end of the character class. \$\endgroup\$hjpotter92– hjpotter922015年11月03日 05:52:00 +00:00Commented Nov 3, 2015 at 5:52
IsDigit
accepts any unicode digit, not just the ASCII0
to9
. \$\endgroup\$IsDigit
checks for 0-9, whileIsNumber
checks for even more, like subscripts and fractions? \$\endgroup\$