8
\$\begingroup\$

I have a string which I want to check for characters other than those in this list {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"} however this check gets run several thousand times a second.

I am therefore looking for the most efficient way of performing this check, what I have tried so far:

If Not System.Text.RegularExpressions.Regex.IsMatch(Input, "[^0-9\.i]") Then

This is equivalent to the code below which I have also tried

Imports System
Imports System.Linq
Public Module Module1
 Public Sub Main()
 Console.WriteLine(IsValidString())
 End Sub
 Private Function IsValidString() As Integer
 'Dim Input As String = "Hello World" 'This is Invalid
 'Dim Input As String = "13.4i+4" 'This is Invalid
 Dim Input As String = "13.4i" 'This is valid
 If Function()
 Dim IsValid As Boolean = True
 For Each Character In Input
 If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
 IsValid = False
 End If
 Next
 Return IsValid
 End Function() Then
 Return -1
 Else
 'Do some other stuff
 Return 1 '
 End If
 End Function
End Module

I am looking for optimal performance possibly at the cost of readability as I am willing to write pages on how the code works and what it does if only I can speed up the execution of my app!

asked Sep 26, 2015 at 11:01
\$\endgroup\$
0

2 Answers 2

5
\$\begingroup\$

In this case (and many others), when it comes to performance, there's really nothing that beats the old trusty For Next and Select Case statements.

Private Function IsValid(input As String) As Boolean
 Dim index As Integer
 Dim length As Integer = (input.Length - 1)
 For index = 0 To length
 Select Case input.Chars(index)
 Case "0"c, "1"c, "2"c, "3"c, "4"c, "5"c, "6"c, "7"c, "8"c, "9"c, "i"c, "."c
 Continue For
 Case Else
 Return False
 End Select
 Next
 Return True
End Function

I've run a test, as seen in this fiddle (Release build - Any CPU) with 10000 iterations and this is the result:

{ Name = IsValid1, Repetitions = 10000, Result = False, Time = 0,7359 }
{ Name = IsValid2, Repetitions = 10000, Result = False, Time = 58,787 }
{ Name = IsValid3, Repetitions = 10000, Result = False, Time = 106,5004 }

{ Name = IsValid1, Repetitions = 10000, Result = True, Time = 0,4382 }
{ Name = IsValid2, Repetitions = 10000, Result = True, Time = 42,1742 }
{ Name = IsValid3, Repetitions = 10000, Result = True, Time = 65,9497 }

Private Function IsValid2(input As String) As Boolean
 Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
 Return input.All(Function(c) validChars.Contains(c))
End Function

Private Function IsValid3(input As String) As Boolean
 Dim IsValid As Boolean = True
 For Each Character In input
 If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
 IsValid = False
 End If
 Next
 Return IsValid
End Function
answered Sep 26, 2015 at 15:50
\$\endgroup\$
3
  • 1
    \$\begingroup\$ That's an impressive improvement! \$\endgroup\$ Commented Sep 26, 2015 at 16:13
  • 1
    \$\begingroup\$ If anyone is interested in why this is an order of magnitude faster, it's because the switch statement eliminates an entire loop. (The one that's hidden by the call to Contains.) \$\endgroup\$ Commented Sep 26, 2015 at 16:16
  • \$\begingroup\$ @RubberDuck Yup, the IL jump table is way more faster :) \$\endgroup\$ Commented Sep 26, 2015 at 16:38
6
\$\begingroup\$

First we'll clean this up a bit, then we'll look at performance.

There is absolutely zero reason to be using an anonymous function. This is useful functionality that you will eventually want to call from somewhere else. As such, it deserves to be extracted into a proper method of it's own.

Private Function IsValidString() As Integer
 'Dim Input As String = "Hello World" 'This is Invalid
 'Dim Input As String = "13.4i+4" 'This is Invalid
 Dim Input As String = "13.4i" 'This is valid
 If IsValidString(Input) Then
 Return -1
 Else
 'Do some other stuff
 Return 1 '
 End If
End Function
Function IsValidString(Input As String) As Boolean
 Dim IsValid As Boolean = True
 For Each Character In Input
 If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(Character) Then
 IsValid = False
 End If
 Next
 Return IsValid
End Function

Next, let's forget entirely about the other IsValidString() method for a second and focus on the code we want to focus on. Watch your casing. It's terribly confusing to make local variables PascalCased just like your classes & methods. Local variables should be camelCased.

Public Sub Main()
 Console.WriteLine(IsValidString("Hello World"))
 Console.WriteLine(IsValidString("13.4i+4"))
 Console.WriteLine(IsValidString("13.4i"))
End Sub
Function IsValidString(input As String) As Boolean
 Dim isValid As Boolean = True
 For Each character In input
 If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
 isValid = False
 End If
 Next
 Return isValid
End Function

We could go one step farther and Linq-ifiy it a bit into some very readable code.

Function IsValidString(input As String) As Boolean
 
 Dim validChars = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}
 Return input.All(Function(c) validChars.Contains(c))
 
End Function

But does it perform? Well, actually, yes it does. I wrote some benchmarking code and it turns out that the Linq version out performs yours by quite a bit. I was kind of surprised by this, but here are the results.

Sample Size: 100
Original; Mostly Invalid: 10ms
Linq; MostlyInvalid: 0ms
Original; Mostly Valid: 0ms
Linq; Mostly Valid: 0ms
Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 1ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 1ms
Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Original; Mostly Valid: 25ms
Linq; Mostly Valid: 23ms
Sample Size: 100000
Original; Mostly Invalid: 235ms
Linq; MostlyInvalid: 41ms
Original; Mostly Valid: 231ms
Linq; Mostly Valid: 255ms
Sample Size: 1000000
Original; Mostly Invalid: 2408ms
Linq; MostlyInvalid: 587ms
Original; Mostly Valid: 2347ms
Linq; Mostly Valid: 2083ms
Press any key to continue . . .

As you can see, the Linq version nearly always outperforms your method on mostly invalid input. However, on mostly valid input, it performs roughly the same; it's sometimes better, sometimes worse. I suspect this is because the Linq version bails out of the check earlier on invalid input.

Here's the final version of the benchmarking code. It exceed's .NetFiddle's memory limits, so you'll need to paste it into a C# console program to see the results of the higher sample sizes.

If you're either uncomfortable with Linq, or really looking to just push the performance to the absolute best it can be, then we can just modify your original algorithm to return early when we know that the answer is false.

Function IsValidString(input As String) As Boolean
 For Each character In input
 If Not {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0", ".", "i"}.Contains(character) Then
 Return False
 End If
 Next
 Return True
End Function

This outperforms the both version above in every case, mostly valid or mostly invalid and it does it consistently each time I run the updated benchmarks.

Sample Size: 1000
Original; Mostly Invalid: 2ms
Linq; MostlyInvalid: 0ms
Improved; Mostly Invalid: 0ms
Original; Mostly Valid: 2ms
Linq; Mostly Valid: 2ms
Improved; Mostly Valid: 1ms
Sample Size: 10000
Original; Mostly Invalid: 25ms
Linq; MostlyInvalid: 4ms
Improved; Mostly Invalid: 3ms
Original; Mostly Valid: 22ms
Linq; Mostly Valid: 20ms
Improved; Mostly Valid: 17ms
Sample Size: 100000
Original; Mostly Invalid: 242ms
Linq; MostlyInvalid: 43ms
Improved; Mostly Invalid: 33ms
Original; Mostly Valid: 235ms
Linq; Mostly Valid: 238ms
Improved; Mostly Valid: 186ms
Sample Size: 1000000
Original; Mostly Invalid: 2479ms
Linq; MostlyInvalid: 609ms
Improved; Mostly Invalid: 336ms
Original; Mostly Valid: 2358ms
Linq; Mostly Valid: 2059ms
Improved; Mostly Valid: 1704ms

BUT........

Correctness

Is 0323.233.12.i a valid string? Both your and my method says it is. Your regex also suffers from the same problem.

answered Sep 26, 2015 at 12:23
\$\endgroup\$
4
  • \$\begingroup\$ You should terminate the loop over the string immediately after finding an invalid char: isValid = False : Exit For. Or even: Return False. \$\endgroup\$ Commented Sep 26, 2015 at 13:49
  • 1
    \$\begingroup\$ Excellent answer, greatly appreciated! Also really good point about the multiple '.'s you are quite correct in that this is something I had overlooked. I have only been programming for two years, I still have a lot to learn! Thanks for your help! \$\endgroup\$ Commented Sep 26, 2015 at 14:51
  • \$\begingroup\$ You're very welcome @user4110980. I just updated the question one last time. You may be interested in it, even though the function needs to be corrected. \$\endgroup\$ Commented Sep 26, 2015 at 15:17
  • \$\begingroup\$ Because the strings that get passed to this method are generated by my program and not by user input I am relying on the exception throwing downstream as invalid 'input' values are very rare indeed. \$\endgroup\$ Commented Sep 26, 2015 at 15:40

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.