I have some code which is designed to scan columns F
and G
for occurrences of words found in an array, the array containing text found in column J
.
Column J
contains free text from a field in SAP. Being free text, it could be "Kerry John Pub Expenses" or "CATS O/H Kerry John", or even "CATS John Kerry O/H". There is no data entry standard for this field... and this is what makes this task difficult.
Example
Here we have 4 rows of data, "John Citizen" is located in row 3
, therefore the blank cells in columns F
and G
, row 2 can be populated with his first and last name.
enter image description here
enter image description here
I would like some suggestions as to how this code could be re-written to achieve this result more efficiently
Sub arraycolumnmatch()
Dim txtArray As Variant, T As Variant
Dim I As Long, J As Long
For I = 2 To Range("E50000").End(xlUp).row
typ = Range("F" & I).Value
If typ = "" Then
txt = Range("J" & I).Value
txtArray = Split(txt, " ")
For Each T In txtArray
For J = 2 To Range("G50000").End(xlUp).row
If Range("G" & J).Value = T Then
match_txt = T
Range("G" & I).Value = match_txt
Exit For
End If
Next J
Next T
For Each T In txtArray
For J = 2 To Range("F50000").End(xlUp).row
If Range("F" & J).Value = T Then
match_txt = T
'Check for surnames which could match first names such as Mary Michael
If Not Range("G" & I).Value = T Then
Range("F" & I).Value = match_txt
Exit For
End If
End If
Next J
Next T
End If
Next I
End Sub
-
\$\begingroup\$ Just to be clear, the code does what you want it to be doing, only too slow... right? \$\endgroup\$Mathieu Guindon– Mathieu Guindon2015年04月21日 02:53:34 +00:00Commented Apr 21, 2015 at 2:53
-
\$\begingroup\$ It is slow, but there is also another problem. Column J contains free text from a field in SAP, Column F and Column G contains first names and last names. The code makes an assumption, if it finds an entry in column F or G that matches an entry in the txt array, it will copy and paste that entry. My issue is that if it does it one at a time, it makes mistakes with names like Tony Andrew or Phillip Mary due to the fact that they are common first names \$\endgroup\$PootyToot– PootyToot2015年04月21日 03:59:45 +00:00Commented Apr 21, 2015 at 3:59
-
3\$\begingroup\$ You'll probably want to ask on Stack Overflow and get the code to work as intended first, then post the working code here to get it peer reviewed - until your code works as it should, it's not ready for Code Review :( \$\endgroup\$Mathieu Guindon– Mathieu Guindon2015年04月21日 04:14:04 +00:00Commented Apr 21, 2015 at 4:14
-
2\$\begingroup\$ I have updated it, it now functions correctly and solves my problem, but would like a review and see if it can be written to operate more efficiently. \$\endgroup\$PootyToot– PootyToot2015年04月22日 07:37:36 +00:00Commented Apr 22, 2015 at 7:37
2 Answers 2
Consistent Indentation.
The first thing I would address, is the indentation. Improving code that isn't properly formatted is just... not right.
Here are the signs:
Sub arraycolumnmatch()
Dim txtArray As Variant, T As Variant
Dim I As Long, J As Long
And:
Next I
End Sub
But also:
'Check for surnames which could match first names such as Mary Michael
If Not Range("G" & I).Value = T Then
The key is consistency - here's your code, with consistent indentation (comments omitted). Notice each code block (If...End If
, For...Next
, etc.) consistently adds an indentation level, and the end of the block lines up with its beginning - also notice that indentation levels are consistently 4 spaces wide; that's the default VBE setting for the Tab key:
Sub arraycolumnmatch()
|
| Dim txtArray As Variant, T As Variant
| Dim I As Long, J As Long
|
| For I = 2 To Range("E50000").End(xlUp).row
| | typ = Range("F" & I).Value
| | If typ = "" Then
| | | txt = Range("J" & I).Value
| | | txtArray = Split(txt, " ")
| | |
| | | For Each T In txtArray
| | | | For J = 2 To Range("G50000").End(xlUp).row
| | | | | If Range("G" & J).Value = T Then
| | | | | | match_txt = T
| | | | | | Range("G" & I).Value = match_txt
| | | | | | Exit For
| | | | | End If
| | | | Next J
| | | Next T
| | |
| | | For Each T In txtArray
| | | | For J = 2 To Range("F50000").End(xlUp).row
| | | | | If Range("F" & J).Value = T Then
| | | | | | match_txt = T
| | | | | | If Not Range("G" & I).Value = T Then
| | | | | | | Range("F" & I).Value = match_txt
| | | | | | | Exit For
| | | | | | End If
| | | | | End If
| | | | Next J
| | | Next T
| | End If
| Next I
|
End Sub
Much easier to follow, isn't it?
Meaningful Names.
Next thing I would address is the naming. Improving code where you have to quadruple-check every change you make because you're not sure you're changing the right thing is just not efficient. Using meaningful names for all identifiers fixes that.
Starting with the method's name itself:
Sub arraycolumnmatch()
Typically, VBA (VB in general) method names are PascalCase
, for better readability. That would make it ArrayColumnMatch
- but good method/procedure names should also start with a verb. What's this method doing exactly? should be answerable by simply looking at it's name.
The problem is, the method in question is doing too many things, so giving it a meaningful name is hard. But I'll get back to that.
Avoid chopping off identifers, like typ
when you meant type
- but then Type
is a reserved keyword.. the solution isn't to make it less readable! I'm not sure which worksheet Range
is referring to because Range
is an implicit reference to Application.ActiveSheet
(and that's very bug-prone!), so I'll just assume you meant to call it documentType
.
txt
is another meaningless name (heck, people use that as a prefix for TextBox
controls!): it stands for that manual entry field you're trying to parse, right? How about targetFieldValue
, or manualTextField
?
Also avoid single-letter identifiers, like T
- assuming txt
is manualTextField
, I'd go with manualTextFieldValue
.
Local variables are typically named in camelCase
, so I
and J
would be i
and j
- those are typically used for iterating loops as you're doing, so I'm not going to complain about those, except both loops seem to be iterating the very same rows on the active sheet (whatever that is), and they're nested, too... and since they're row numbers I'd typically name them xlRow[WhatItsFor]
.
Undeclared Identifiers
Always, systematically, consistently, automatically, thoughtlessly stick Option Explicit
at the top of every VBA module. Don't question it, just do it: without it, you can have a bug by simply having a little typo in an identifier name. With it, your VBA code will refuse to compile and run if an identifier isn't declared anywhere. Use it. Always.
If match_txt
is declared outside the scope of the procedure, then move the declaration inside that scope - it doesn't need to be at module scope. Always declare variables at the smallest possible scope; globals are evil.
vbNullString
A very minor point, but I have to mention it.
If typ = "" Then
Avoid using an empty string ""
to mean no value. VBA defines the constant vbNullString
specifically for that. And why should you bother typing vbNullString
when ""
does the same thing?
That's the thing. It doesn't do the same thing. ""
is a full-fledged String
value that needs its own allocated memory space. vbNullString
is a null string pointer that's not allocated anywhere. Of course it's a very minor performance hit to allocate that string, but semantically, vbNullString
is the right thing to do.
Don't believe me? Try this in the immediate pane (Ctrl+G):
?"Address for empty string: " & StrPtr("") & vbNewLine & "Address for vbNullString: " & StrPtr(vbNullString)
I get this output:
Address for empty string: 241665720
Address for vbNullString: 0
There's some kind of optimization going on under the hood though, so the pointer for ""
will be reused every time it's encountered in your application's lifetime - as I said it's a very minor point. But just as you wouldn't declare an unused variable, you shouldn't allocate a memory address for a built-in constant value.
Twin Loops
You have this:
For Each T In txtArray
And then a few lines further, you have that:
For Each T In txtArray
You're iterating all elements of txtArray
twice, when once would suffice. Also, wouldn't the last row of column "G" be the same last row as in column "F"? Wouldn't that last row be the same for every iteration of T
? I'm asking, because your data seems to be laid out in a table with records, so you'd only need to find the row for the last record, whichever column you're using for that is irrelevant, no?
For J = 2 To Range("G50000").End(xlUp).row
...
For J = 2 To Range("F50000").End(xlUp).row
I'd extract a variable here, and assign it to Range("G50000").End(xlUp).row
before the For Each T In txtArray
loop starts, and then do
For J = 2 To xlLastRow
Then you're writing to G[i]
if the value of G[j]
matches the current txtArray
element, and to F[i]
if G[i]
wasn't written to and the value of F[j]
matches the current txtArray
element.
Wouldn't that do the same thing then?
If Range("G" & j).Value = T Then
Range("G" & i).Value = T
Else If Range("F" & j).Value = T Then
Range("F" & i).Value = T
End If
Lastly, the J
loop looks like it could be implemented as a Find
instead of a loop.
Extract Methods.
The problem is, the method in question is doing too many things, so giving it a meaningful name is hard. But I'll get back to that.
So here we are. What's that method doing actually?
It...
- finds the row number for the last record on the active sheet
- iterates all records on the active sheet
- splits a value in the current record into an array of values
- iterates all values in that array
- looks for a match of that value somewhere in columns G and F
- writes that value to column G or F, whichever comes first
Extracting methods is a refactoring operation, and doing that manually in VBA is, honestly, a pain in the neck.
I've authored a refactoring tool that does exactly that (and more):
Rubberduck's Extract Method refactoring
Of course I have a little bit of a vested interest (very little - it's free and open-source actually), but I'd recommend you take a look at what the Rubberduck VBE add-in can do for you.
Speed it up #1: Minimize sheet access
Besides what @Mat'sMug said, there are a few other ways to speed things up.
In my experience, reading Excel cells from VBA is slow. You're iterating over a range several times. It probably would be faster to read those ranges into a VBA array and then loop over that instead:
Dim ColumnF As Variant, ColumnG As Variant
ColumnF = Range("F2:F" & xlLastRow).Value2
ColumnG = Range("G2:G" & xlLastRow).Value2
' or even
Dim ColumnF() As String, ColumnG() As String
ReDim ColumnF(xlLastRow)
Redim ColumnG(xlLastRow)
' and then fill each in once with a For/Next
' then refer to ColumnF() or ColumnG() instead of Range("F"&...)
Speed it up #2: Use the right data structure for the job
This is a key principle of software development. When you're repeatedly searching a set of items for a string, you don't want to store that list in an array (or an array-like structure like Excel cells) -- you want to store it in a hash table or dictionary. VBA didn't originally come with a Dictionary
, but Microsoft added one after the fact as part of the Microsoft Scripting Library. https://support.microsoft.com/en-us/kb/187234
You'll need to add a reference to "Microsoft Scripting Runtime".
So, instead of loading into an array at startup, load column G and column F into Dictionary
s. I also suggest using meaningful names.
Dim firstNames As Dictionary
Dim lastNames As Dictionary
For J = 2 To xlLastRow
firstNames.Add Key := Range("F" & j), 1
lastNames.Add Key := Range("G" & j), 1
Next J
Now you can check whether an entry is in each Dictionary
in a flash:
Dim lnf as Boolean, fnf as Boolean
lnf = False
fnf = False
For Each T In txtArray
If Not lnf Then
If lastNames.Exists(T) Then
match_txt = T
Range("G" & I).Value = match_txt
If fnf Then Exit For
lnf = True
End If
End If
If Not fnf Then
If firstNames.Exists(T) Then
match_txt = T
Range("F" & I).Value = match_txt
if lnf Then Exit For
fnf = True
End If
End If
Next T
Also for reference:
- https://fastexcel.wordpress.com/2012/07/10/comparing-two-lists-vba-udf-shootout-between-linear-search-binary-search-collection-and-dictionary/
- http://www.snb-vba.eu/VBA_Dictionary_en.html
- https://newtonexcelbach.wordpress.com/2013/08/29/arrays-vs-collections-vs-dictionary-objects-and-dictionary-help/