Program Purpose
So, I have a naming convention for certain folders.
I want to take in a folder name, and determine if it conforms to the convention.
Naming Convention
The convention (case insensitive) can be as simple as
"Surname, Firstname"
It could be as complicated as
"Surname (meta), Firstname (meta) & Firstname (meta) ; Surname (meta), Firstname (meta) & Firstname (meta)"
It is broken down like so:
- A name is made up of a [Surname] and 1 or 2 [Firstnames].
- Each [Surname] and [Firstname] can have an optional [ (metadata)] after it.
- If there are 2 [Firstnames], they are separated by [ & ].
- A name can, optionally, have a second set of [Surname] & [Firstnames], separated from the first set by [ ; ].
As part of a larger program, I have a class object which handles information relating to a folder.
When a folder name is passed to the class, it validates the naming convention. It currently does this via regex, but I find regex to be an incredible source of bugs and unmaintainable code.
So, is there a better way?
Program Flow
1. Receive folder name
2. Copy folder name
3. Regex Match/Replace the copy with vbNullString
4. If the copy is now vbNullString, the whole string matched and is valid
Validation Code
Private Sub AddNamesFromClientFolder(ByVal ClientFolderName As String)
    '/ Copy folder name
    '/ Replace copy's regex matching with null string
    '/ If the copy is now a null string, the whole name matched and is valid
    '/ Client Folder names should be of the form:
    '/ "[Surname] ( [misc] ), [Firstname] ( [Misc] ) & [Firstname] ( [Misc] ) ; [Other Surname] ( [Misc] ), [Other Firstname] ( [Misc] ) & [Other Firstname] ( [Misc] )"
    '/
    '/ With minimum form:
    '/ "[Surname], [Firstname]"
    Dim IsValid As Boolean
    If Len(ClientFolderName) > 0 Then
        Dim validationRegex As RegExp
        Set validationRegex = New RegExp
        With validationRegex
            .Global = True
            .IgnoreCase = False
            .MultiLine = True
            .Pattern = ClientFolderValidationRegex
        End With
        Dim testString As String
        testString = ClientFolderName
        testString = validationRegex.Replace(testString, vbNullString)
        IsValid = (testString = vbNullString)
        this.IsValid = IsValid
    Else
        this.IsValid = False
    End If
End Sub
Building the regex pattern
Public Function ClientFolderValidationRegex() As String
    '/ CG = "Capture Group"
    Const L_CASE_LETTERS As String = "a-z"
    Const U_CASE_LETTERS As String = "A-Z"
    Const ALL_NUMBERS As String = "0-9"
    Const NAME_PUNCTUATION As String = "`'!@\-_"
    Const ALL_ALLOWED_PUNCTUATION As String = "`!""£$%^&*\-_+=\[\]{}:;@'~#<,>.?\/\\ "
    Dim delim As String
    '/ captures a single, contiguous group of letters/numbers/limited name punctuation e.g. "O'Malley"
    Dim nameCG As String
    nameCG = "([" & L_CASE_LETTERS & U_CASE_LETTERS & ALL_NUMBERS & NAME_PUNCTUATION & "]+)"
    '/ captures the following: " (anything you want in here)"
    Dim bracketedCG As String
    bracketedCG = "( \(" & "([" & L_CASE_LETTERS & U_CASE_LETTERS & ALL_NUMBERS & ALL_ALLOWED_PUNCTUATION & "]+)" & "\))"
    '/ Captures the following: "name (anything you want)" where " (anything you want)" may or may not be present
    Dim nameSectionCG As String
    nameSectionCG = "(" & nameCG & bracketedCG & "?" & ")"
    '/ Surname portion of a filename should be the same as standard name section
    Dim surnameCG As String
    surnameCG = nameSectionCG
    '/ Firstname portion might have an optional " & [name section]"
    delim = " & "
    Dim firstnameCG As String
    firstnameCG = "(" & nameSectionCG & "(" & delim & nameSectionCG & ")?" & ")"
    '/ Full name section of a filename is "[surname section], [firstname section]"
    delim = ", "
    Dim fullNamesCG As String
    fullNamesCG = "(" & surnameCG & delim & firstnameCG & ")"
    '/ Full filename might optionally have another " ; [full name section]"
    delim = " ; "
    Dim fullFilenameCG As String
    fullFilenameCG = "(" & fullNamesCG & "(" & delim & fullNamesCG & ")?" & ")"
    ClientFolderValidationRegex = fullFilenameCG
End Function
Regex Matching Examples:
Match:
Lannister, Tyrion
Lannister, Cersei (& Joffrey, Myrcella, Tommen {All Deceased})
Stark, Eddard (Ned, Deceased) ; Tully, Catelyn (Also Deceased)
No Match:
Tyrion Lannister
Lannister, Queen Cersei
Stark, Ned RED WEDDING
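One thing the replace-to-empty check does not catch: the pattern is not anchored, and with .Global = True the Replace call deletes every match it finds anywhere in the string. Two valid names glued together can therefore pass. Tracing the pattern by hand, "Lannister, Tyrion (x)Stark, Eddard" should come out empty after the replace, because "Lannister, Tyrion (x)" and "Stark, Eddard" are removed as two separate matches. Below is a minimal sketch of an alternative. It assumes late-bound VBScript.RegExp, so no project reference is needed, and it introduces a hypothetical helper, IsFullMatch, that is not part of the original code. The helper wraps the pattern in ^(?:...)$ and uses Test instead of Replace:

```vba
' Hypothetical helper: returns True only if the WHOLE string matches pattern.
Public Function IsFullMatch(ByVal text As String, ByVal pattern As String) As Boolean
    Dim re As Object
    ' Late binding: no reference to "Microsoft VBScript Regular Expressions 5.5" required
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = False       ' one anchored match is all we need
        .IgnoreCase = False
        .MultiLine = False    ' ^ and $ anchor to the start/end of the whole string
        .Pattern = "^(?:" & pattern & ")$"
    End With
    IsFullMatch = re.Test(text)
End Function
```

Inside AddNamesFromClientFolder this would replace the copy/replace/compare steps with a single line: this.IsValid = IsFullMatch(ClientFolderName, ClientFolderValidationRegex). An empty folder name fails on its own, because nameCG requires at least one character.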
Answers
Why are you using regex for this? It makes no sense to match complicated naming conventions with something as low-level as Regular Expressions.
You either have to go higher in your abstraction by using a proper Grammar (which is basically what your separated groups do, but better) or go lower to forego the semantics you impose here.
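Consider the following code:
Dim isValid As Boolean
Dim names() As String
Dim i As Long
names = Split(ClientFolderName, " ; ")
' At most two full-name sets are allowed (Split returns UBound -1 for an empty string)
isValid = (UBound(names) >= 0 And UBound(names) <= 1)
For i = 0 To UBound(names)
    isValid = isValid And IsValidName(names(i))
Next i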
This removes the first thing that overcomplicates your regex: the fact that the folder name may contain two full-name sets.
Since you're only interested in validity, you can use the following function to check the validity of its sub-parts:
Function IsValidName(ByVal Name As String) As Boolean
    Dim nameParts() As String
    ' Split returns a zero-based array: (0) = surname, (1) = first name(s)
    nameParts = Split(Name, ", ")
    ' Anything other than exactly one ", " separator is invalid
    If UBound(nameParts) <> 1 Then Exit Function
    IsValidName = IsValidSurname(nameParts(0)) And IsValidFirstname(nameParts(1))
End Function
IsValidSurname and IsValidFirstname are significantly easier to implement and understand with regex than a single regex that validates the whole thing at once. In addition to being much more maintainable, this gives you separate responsibilities as a bonus.
Follow great advice from someone who came ages before us:
Divide and Conquer
This technique doesn't only apply in the military; it's also a hugely useful and important skill in software development.
Have you ever heard about tussenvoegsels? They're parts of people's names. Well, in the Netherlands anyway. When used for authors, it's usually done as "van Surname, FirstName". Your regex doesn't support this, instead only accepting the last word of the surname. You should allow surnames to consist of multiple words.
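A minimal sketch of a surname check that accepts multiple words is shown below. It assumes a surname is one or more name parts separated by single spaces, so "van Houten" passes. The constant SURNAME_WORD and the function IsMultiWordSurname are hypothetical names. The character set is the question's nameCG.

```vba
' Hypothetical: one "word" of a surname, same characters as the question's nameCG.
Private Const SURNAME_WORD As String = "[A-Za-z0-9`'!@\-_]+"

' True for "Lannister", "van Houten", "de la Cruz"; False for stray or double spaces.
Public Function IsMultiWordSurname(ByVal surname As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    ' Anchored, so the whole string must be the surname, not just its last word
    re.Pattern = "^" & SURNAME_WORD & "(?: " & SURNAME_WORD & ")*$"
    IsMultiWordSurname = re.Test(surname)
End Function
```

Because "(" is not in SURNAME_WORD, an optional " (metadata)" suffix could still be appended after the last word without ambiguity.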
Dim IsValid As Boolean
If Len(ClientFolderName) > 0 Then
    Dim validationRegex As RegExp
    Set validationRegex = New RegExp
    With validationRegex
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = ClientFolderValidationRegex
    End With
    Dim testString As String
    testString = ClientFolderName
    testString = validationRegex.Replace(testString, vbNullString)
    IsValid = (testString = vbNullString)
    this.IsValid = IsValid
Else
    this.IsValid = False
End If
What's the purpose of IsValid here, if you're just going to write it almost directly to this.IsValid anyway? Why not do it like this?
If Len(ClientFolderName) > 0 Then
    Dim validationRegex As RegExp
    Set validationRegex = New RegExp
    With validationRegex
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = ClientFolderValidationRegex
    End With
    Dim testString As String
    testString = ClientFolderName
    testString = validationRegex.Replace(testString, vbNullString)
    this.IsValid = (testString = vbNullString)
Else
    this.IsValid = False
End If
Comment (Laurel, Jul 29, 2016): I have one of those last names (but not with "van") and this is a huge problem for so many systems. That includes some really important systems, like the one at the DMV (and now the last name on my driver's licence does not match the one on my birth certificate).
Comment: "van Houten, Milhouse" and "Gödel, Kurt" should not match?