\$\begingroup\$

Program Purpose

So, I have a naming convention for certain folders.

I want to take in a folder name, and determine if it conforms to the convention.


Naming Convention

The convention (case insensitive) can be as simple as

"Surname, Firstname"

It could be as complicated as

"Surname (meta), Firstname (meta) & Firstname (meta) ; Surname (meta), Firstname (meta) & Firstname (meta)"


It is broken down like so:

  • A name is made up of a [Surname] and 1 or 2 [Firstnames].

  • Each [Surname] and [Firstname] can have an optional [ (metadata)] after it.

  • If there are 2 [Firstnames], they are separated by [ & ].

  • A name can, optionally, have a second set of [Surname] & [Firstnames]. Separated from the first set by [ ; ].


As part of a larger program, I have a class object which handles information relating to a folder.

When a folder name is passed to the class, it validates the naming convention. It currently does this via regex but I find regex to be an incredible source of bugs and un-maintainable code.

So, is there a better way?


Program Flow

  1. Receive folder name

  2. Copy folder name

  3. Regex Match/Replace the copy with vbNullString

  4. If the copy is now vbNullString, the whole string matched and is valid


Validation Code

Private Sub AddNamesFromClientFolder(ByVal ClientFolderName As String)
    '/ Copy folder name
    '/ Replace copy's regex matching with null string
    '/ If the copy is now a null string, the whole name matched and is valid
    '/ Client Folder names should be of the form:
    '/ "[Surname] ( [misc] ), [Firstname] ( [Misc] ) & [Firstname] ( [Misc] ) ; [Other Surname] ( [Misc] ), [Other Firstname] ( [Misc] ) & [Other Firstname] ( [Misc] )"
    '/
    '/ With minimum form:
    '/ "[Surname], [Firstname]"
    Dim IsValid As Boolean
    If Len(ClientFolderName) > 0 Then
        Dim validationRegex As RegExp
        Set validationRegex = New RegExp
        With validationRegex
            .Global = True
            .IgnoreCase = False
            .MultiLine = True
            .Pattern = ClientFolderValidationRegex
        End With
        Dim testString As String
        testString = ClientFolderName
        testString = validationRegex.Replace(testString, vbNullString)
        IsValid = (testString = vbNullString)
        this.IsValid = IsValid
    Else
        this.IsValid = False
    End If
End Sub

Building the regex pattern

Public Function ClientFolderValidationRegex() As String
 '/ CG = "Capture Group"
 Const L_CASE_LETTERS As String = "a-z"
 Const U_CASE_LETTERS As String = "A-Z"
 Const ALL_NUMBERS As String = "0-9"
 Const NAME_PUNCTUATION As String = "`'!@\-_"
 Const ALL_ALLOWED_PUNCTUATION As String = "`!""£$%^&*\-_+=\[\]{}:;@'~#<,>.?\/\\ "
 Dim delim As String
 '/ captures a single, contiguous group of letters/numbers/limited name punctuation e.g. "O'Malley"
 Dim nameCG As String
 nameCG = "([" & L_CASE_LETTERS & U_CASE_LETTERS & ALL_NUMBERS & NAME_PUNCTUATION & "]+)"
 '/ captures the following: " (anything you want in here)"
 Dim bracketedCG As String
 bracketedCG = "( \(" & "([" & L_CASE_LETTERS & U_CASE_LETTERS & ALL_NUMBERS & ALL_ALLOWED_PUNCTUATION & "]+)" & "\))"
 '/ Captures the following: "name (anything you want)" where " (anything you want)" may or may not be present
 Dim nameSectionCG As String
 nameSectionCG = "(" & nameCG & bracketedCG & "?" & ")"
 '/ Surname portion of a filename should be the same as standard name section
 Dim surnameCG As String
 surnameCG = nameSectionCG
 '/ Firstname portion might have an optional " & [name section]"
 delim = " & "
 Dim firstnameCG As String
 firstnameCG = "(" & nameSectionCG & "(" & delim & nameSectionCG & ")?" & ")"
 '/ Full name section of a filename is "[surname section], [firstname section]"
 delim = ", "
 Dim fullNamesCG As String
 fullNamesCG = "(" & surnameCG & delim & firstnameCG & ")"
 '/ Full filename might optionally have another " ; [full name section]"
 delim = " ; "
 Dim fullFilenameCG As String
 fullFilenameCG = "(" & fullNamesCG & "(" & delim & fullNamesCG & ")?" & ")"
 ClientFolderValidationRegex = fullFilenameCG
End Function

Link to regex101

Regex Matching Examples:

Match:

Lannister, Tyrion

Lannister, Cersei (& Joffrey, Myrcella, Tommen {All Deceased})

Stark, Eddard (Ned, Deceased) ; Tully, Catelyn (Also Deceased)

No Match:

Tyrion Lannister

Lannister, Queen Cersei

Stark, Ned RED WEDDING

asked Jul 29, 2016 at 14:32
\$\endgroup\$
  • \$\begingroup\$ Can you post a set of example filenames that illustrate your pattern, examples both positive (matching) and negative (close but no cigar)? \$\endgroup\$ Commented Jul 29, 2016 at 15:11
  • \$\begingroup\$ Related: kalzumeus.com/2010/06/17/… \$\endgroup\$ Commented Jul 29, 2016 at 15:25
  • \$\begingroup\$ Don't have time to turn it into an answer, but if you're looking for an alternative to Regex, you could consider writing a state machine yourself. It's a common way of doing custom processing of strings (in fact regex expressions compile down to state machines anyway). \$\endgroup\$ Commented Jul 29, 2016 at 19:53
  • \$\begingroup\$ So van Houten, Milhouse and Gödel, Kurt should not match? \$\endgroup\$ Commented Jul 29, 2016 at 21:59
  • \$\begingroup\$ @HagenvonEitzen Please, don't blame me, blame the business logic. For better or worse, my entire industry operates on the assumption that people have one surname and one firstname and that both can be written using the standard english alphabet. \$\endgroup\$ Commented Jul 30, 2016 at 0:21

2 Answers

\$\begingroup\$

Why are you using regex for this? It makes no sense to match complicated naming conventions with something as low-level as Regular Expressions.

You either have to go higher in your abstraction by using a proper Grammar (which is basically what your separated groups do, but better) or go lower to forego the semantics you impose here.

Consider the following pseudo-ish code:

Dim isValid As Boolean
Dim nameSets() As String
Dim i As Long
isValid = True
'/ VBA's Split returns a zero-based String array
nameSets = Split(ClientFolderName, " ; ")
For i = LBound(nameSets) To UBound(nameSets)
    isValid = isValid And IsValidName(nameSets(i))
Next i

This drops the first barrier that's overcomplicating your regex: the fact that it may contain two things.

Since you're only interested in validity, you can use the following function to check the validity of its sub-parts:

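Function IsValidName(ByVal FullName As String) As Boolean
    Dim NameParts() As String
    '/ Split at the first ", " only (limit of 2), since firstname metadata may itself contain commas
    NameParts = Split(FullName, ", ", 2)
    '/ Without a ", " there is no firstname part; the function then returns False
    If UBound(NameParts) <> 1 Then Exit Function
    '/ Split returns a zero-based array
    IsValidName = IsValidSurname(NameParts(0)) And IsValidFirstname(NameParts(1))
End Function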

IsValidSurname and IsValidFirstname are significantly easier to implement and understand with regex than a single pattern that tries to validate the whole thing at once. Besides being much more maintainable, this also gives you separate responsibilities as a bonus.
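
As a rough illustration of how small those two checks can become, here is a minimal VBA sketch. The constant NAME_SECTION, the helper MatchesWholeString, and the simplified metadata rule (anything except parentheses) are assumptions made for this example and not the question's exact character sets. Late binding via CreateObject is used only so the snippet needs no library reference.

```vba
Option Explicit

'/ Assumption: one name section is a run of name characters,
'/ optionally followed by " (metadata)" where the metadata contains no parentheses.
Private Const NAME_SECTION As String = "[A-Za-z0-9`'!@\-_]+( \([^()]+\))?"

'/ Hypothetical helper: True only if bodyPattern matches the entire candidate string.
Private Function MatchesWholeString(ByVal candidate As String, ByVal bodyPattern As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = True
    re.MultiLine = False
    '/ Anchor the pattern so partial matches do not count
    re.Pattern = "^(?:" & bodyPattern & ")$"
    MatchesWholeString = re.Test(candidate)
End Function

Public Function IsValidSurname(ByVal surname As String) As Boolean
    IsValidSurname = MatchesWholeString(surname, NAME_SECTION)
End Function

Public Function IsValidFirstname(ByVal firstnames As String) As Boolean
    '/ One name section, optionally followed by " & " and a second one
    IsValidFirstname = MatchesWholeString(firstnames, NAME_SECTION & "( & " & NAME_SECTION & ")?")
End Function
```

Each pattern now covers a single responsibility, so a change to the firstname rule does not touch the surname rule.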

Follow great advice from someone who's come ages before us:

Divide and Conquer

This technique applies not only to the military; it's also a hugely useful and important skill in software development.

answered Jul 29, 2016 at 15:19
\$\endgroup\$
\$\begingroup\$

Have you ever heard about tussenvoegsels? They're parts of people's names. Well, in the Netherlands anyway. When used for authors, it's usually done as "van Surname, FirstName". Your regex doesn't support this, instead only accepting the last word of the surname. You should allow surnames to consist of multiple words.
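
One way to allow that, sketched here under assumptions: the function name IsValidMultiWordSurname and the constant NAME_WORD are hypothetical, the word characters mirror the question's name character set, and the metadata rule is simplified to "anything except parentheses".

```vba
Option Explicit

'/ Hypothetical check: a surname is one or more words separated by single spaces,
'/ optionally followed by " (metadata)".
Public Function IsValidMultiWordSurname(ByVal surname As String) As Boolean
    '/ Assumption: same characters the question allows inside a name
    Const NAME_WORD As String = "[A-Za-z0-9`'!@\-_]+"
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = True
    re.MultiLine = False
    '/ word( word)* then the optional bracketed metadata, anchored to the whole string
    re.Pattern = "^" & NAME_WORD & "( " & NAME_WORD & ")*( \([^()]+\))?$"
    IsValidMultiWordSurname = re.Test(surname)
End Function
```

With this, "van Houten" passes as a surname while a doubled space between the words is still rejected.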


Dim IsValid As Boolean
If Len(ClientFolderName) > 0 Then
    Dim validationRegex As RegExp
    Set validationRegex = New RegExp
    With validationRegex
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = ClientFolderValidationRegex
    End With
    Dim testString As String
    testString = ClientFolderName
    testString = validationRegex.Replace(testString, vbNullString)
    IsValid = (testString = vbNullString)
    this.IsValid = IsValid
Else
    this.IsValid = False
End If

What's the purpose of IsValid here, if you're just going to nearly-directly write to this.IsValid anyway? Why not do it like this?

If Len(ClientFolderName) > 0 Then
    Dim validationRegex As RegExp
    Set validationRegex = New RegExp
    With validationRegex
        .Global = True
        .IgnoreCase = False
        .MultiLine = True
        .Pattern = ClientFolderValidationRegex
    End With
    Dim testString As String
    testString = ClientFolderName
    testString = validationRegex.Replace(testString, vbNullString)
    this.IsValid = (testString = vbNullString)
Else
    this.IsValid = False
End If
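
A possible further simplification, offered as a sketch and not a drop-in replacement: anchor the pattern and call Test instead of replacing every match and comparing against vbNullString. Because the pattern is unanchored and Global is True, the replace approach appears to accept two valid matches glued together, for example "Stark, Ned (x)Tully, Cat". The helper name MatchesEntireName is hypothetical, and late binding is used only to keep the snippet free of library references.

```vba
Option Explicit

'/ Hypothetical helper: True only if bodyPattern matches folderName from start to end.
Public Function MatchesEntireName(ByVal folderName As String, ByVal bodyPattern As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.IgnoreCase = False
    re.MultiLine = False
    '/ The non-capturing group makes the anchors apply to the whole pattern
    re.Pattern = "^(?:" & bodyPattern & ")$"
    MatchesEntireName = re.Test(folderName)
End Function
```

Assuming the question's pattern builder, the whole If block could then shrink to this.IsValid = MatchesEntireName(ClientFolderName, ClientFolderValidationRegex), since an empty string cannot match a pattern that requires at least one name.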
answered Jul 29, 2016 at 14:35
\$\endgroup\$
  • \$\begingroup\$ I have one of those last names (but not with "van") and this is a huge problem for so many systems. That includes some really important systems, like the one at the DMV (and now the last name on my driver's licence does not match the one on my birth certificate). \$\endgroup\$ Commented Jul 29, 2016 at 17:37
