I have a string which I want to check for characters other than those in this list {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"} however this check gets run several thousand times a second.
I am therefore looking for the most efficient way of performing this check, what I have tried so far:
If Not System.Text.RegularExpressions.Regex.IsMatch(Input, "[^0-9\.i]") Then
This is equivalent to the code below which I have also tried
Imports System
Imports System.Linq
Public Module Module1
Public Sub Main()
Console.WriteLine(IsValidString())
End Sub
Private Function IsValidString() As Integer
'Dim Input As String = "Hello World" 'This is Invalid
'Dim Input As String = "13.4i+4" 'This is Invalid
Dim Input As String = "13.4i" 'This is valid
If Function()
Dim IsValid As Boolean = True
For Each Character In Input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End Function() Then
Return -1
Else
'Do some other stuff
Return 1 '
End If
End Function
End Module
I am looking for optimal performance possibly at the cost of readability as I am willing to write pages on how the code works and what it does if only I can speed up the execution of my app!
2 Answers 2
In this case (and many others), when it comes to performance, there's really nothing that beats the old trusty For Next and Select Case statements.
Private Function IsValid(input As String) As Boolean
Dim index As Integer
Dim length As Integer = (input.Length - 1)
For index = 0 To length
Select Case input.Chars(index)
Case "0"c, "1"c, "2"c, "3"c, "4"c, "5"c, "6"c, "7"c, "8"c, "9"c, "i"c, "."c
Continue For
Case Else
Return False
End Select
Next
Return True
End Function
I've run a test, as seen in this fiddle (Release build - Any CPU) with 10000 iterations and this is the result:
{ Name = IsValid1, Repetitions = 10000, Result = False, Time = 0,7359 }
{ Name = IsValid2, Repetitions = 10000, Result = False, Time = 58,787 }
{ Name = IsValid3, Repetitions = 10000, Result = False, Time = 106,5004 }{ Name = IsValid1, Repetitions = 10000, Result = True, Time = 0,4382 }
{ Name = IsValid2, Repetitions = 10000, Result = True, Time = 42,1742 }
{ Name = IsValid3, Repetitions = 10000, Result = True, Time = 65,9497 }
Private Function IsValid2(input As String) As Boolean
Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
Return input.All(Function(c) validChars.Contains(c))
End Function
Private Function IsValid3(input As String) As Boolean
Dim IsValid As Boolean = True
For Each Character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End Function
-
1\$\begingroup\$ That's an impressive improvement! \$\endgroup\$RubberDuck– RubberDuck2015年09月26日 16:13:50 +00:00Commented Sep 26, 2015 at 16:13
-
1\$\begingroup\$ If anyone is interested in why this is an order of magnitude faster, it's because the switch statement eliminates an entire loop. (The one that's hidden by the call to
Contains
.) \$\endgroup\$RubberDuck– RubberDuck2015年09月26日 16:16:59 +00:00Commented Sep 26, 2015 at 16:16 -
\$\begingroup\$ @RubberDuck Yup, the IL jump table is way more faster :) \$\endgroup\$Bjørn-Roger Kringsjå– Bjørn-Roger Kringsjå2015年09月26日 16:38:19 +00:00Commented Sep 26, 2015 at 16:38
First we'll clean this up a bit, then we'll look at performance.
There is absolutely zero reason to be using an anonymous function. This is useful functionality that you will eventually want to call from somewhere else. As such, it deserves to be extracted into a proper method of it's own.
Private Function IsValidString() As Integer
'Dim Input As String = "Hello World" 'This is Invalid
'Dim Input As String = "13.4i+4" 'This is Invalid
Dim Input As String = "13.4i" 'This is valid
If IsValidString(Input) Then
Return -1
Else
'Do some other stuff
Return 1 '
End If
End Function
Function IsValidString(Input As String) As Boolean
Dim IsValid As Boolean = True
For Each Character In Input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
IsValid = False
End If
Next
Return IsValid
End Function
Next, let's forget entirely about the other IsValidString()
method for a second and focus on the code we want to focus on. Watch your casing. It's terribly confusing to make local variables PascalCased
just like your classes & methods. Local variables should be camelCased
.
Public Sub Main()
Console.WriteLine(IsValidString("Hello World"))
Console.WriteLine(IsValidString("13.4i+4"))
Console.WriteLine(IsValidString("13.4i"))
End Sub
Function IsValidString(input As String) As Boolean
Dim isValid As Boolean = True
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
isValid = False
End If
Next
Return isValid
End Function
We could go one step farther and Linq-ifiy it a bit into some very readable code.
Function IsValidString(input As String) As Boolean
Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
Return input.All(Function(c) validChars.Contains(c))
End Function
But does it perform? Well, actually, yes it does. I wrote some benchmarking code and it turns out that the Linq version out performs yours by quite a bit. I was kind of surprised by this, but here are the results.
Sample Size: 100 Original; Mostly Invalid: 10ms Linq; MostlyInvalid: 0ms Original; Mostly Valid: 0ms Linq; Mostly Valid: 0ms Sample Size: 1000 Original; Mostly Invalid: 2ms Linq; MostlyInvalid: 1ms Original; Mostly Valid: 2ms Linq; Mostly Valid: 1ms Sample Size: 10000 Original; Mostly Invalid: 25ms Linq; MostlyInvalid: 4ms Original; Mostly Valid: 25ms Linq; Mostly Valid: 23ms Sample Size: 100000 Original; Mostly Invalid: 235ms Linq; MostlyInvalid: 41ms Original; Mostly Valid: 231ms Linq; Mostly Valid: 255ms Sample Size: 1000000 Original; Mostly Invalid: 2408ms Linq; MostlyInvalid: 587ms Original; Mostly Valid: 2347ms Linq; Mostly Valid: 2083ms Press any key to continue . . .
As you can see, the Linq version nearly always outperforms your method on mostly invalid input. However, on mostly valid input, it performs roughly the same; it's sometimes better, sometimes worse. I suspect this is because the Linq version bails out of the check earlier on invalid input.
Here's the final version of the benchmarking code. It exceed's .NetFiddle's memory limits, so you'll need to paste it into a C# console program to see the results of the higher sample sizes.
If you're either uncomfortable with Linq, or really looking to just push the performance to the absolute best it can be, then we can just modify your original algorithm to return early when we know that the answer is false.
Function IsValidString(input As String) As Boolean
For Each character In input
If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
Return False
End If
Next
Return True
End Function
This outperforms the both version above in every case, mostly valid or mostly invalid and it does it consistently each time I run the updated benchmarks.
Sample Size: 1000 Original; Mostly Invalid: 2ms Linq; MostlyInvalid: 0ms Improved; Mostly Invalid: 0ms Original; Mostly Valid: 2ms Linq; Mostly Valid: 2ms Improved; Mostly Valid: 1ms Sample Size: 10000 Original; Mostly Invalid: 25ms Linq; MostlyInvalid: 4ms Improved; Mostly Invalid: 3ms Original; Mostly Valid: 22ms Linq; Mostly Valid: 20ms Improved; Mostly Valid: 17ms Sample Size: 100000 Original; Mostly Invalid: 242ms Linq; MostlyInvalid: 43ms Improved; Mostly Invalid: 33ms Original; Mostly Valid: 235ms Linq; Mostly Valid: 238ms Improved; Mostly Valid: 186ms Sample Size: 1000000 Original; Mostly Invalid: 2479ms Linq; MostlyInvalid: 609ms Improved; Mostly Invalid: 336ms Original; Mostly Valid: 2358ms Linq; Mostly Valid: 2059ms Improved; Mostly Valid: 1704ms
BUT........
Correctness
Is 0323.233.12.i
a valid string? Both your and my method says it is. Your regex also suffers from the same problem.
-
\$\begingroup\$ You should terminate the loop over the string immediately after finding an invalid char: isValid = False : Exit For. Or even: Return False. \$\endgroup\$user1016274– user10162742015年09月26日 13:49:40 +00:00Commented Sep 26, 2015 at 13:49
-
1\$\begingroup\$ Excellent answer, greatly appreciated! Also really good point about the multiple '.'s you are quite correct in that this is something I had overlooked. I have only been programming for two years, I still have a lot to learn! Thanks for your help! \$\endgroup\$o.comp– o.comp2015年09月26日 14:51:38 +00:00Commented Sep 26, 2015 at 14:51
-
\$\begingroup\$ You're very welcome @user4110980. I just updated the question one last time. You may be interested in it, even though the function needs to be corrected. \$\endgroup\$RubberDuck– RubberDuck2015年09月26日 15:17:29 +00:00Commented Sep 26, 2015 at 15:17
-
\$\begingroup\$ Because the strings that get passed to this method are generated by my program and not by user input I am relying on the exception throwing downstream as invalid 'input' values are very rare indeed. \$\endgroup\$o.comp– o.comp2015年09月26日 15:40:22 +00:00Commented Sep 26, 2015 at 15:40