7
\$\begingroup\$

I have some code which is designed to scan columns F and G for occurrences of words found in an array, the array containing text found in column J.

Column J contains free text from a field in SAP. Being free text, it could be "Kerry John Pub Expenses" or "CATS O/H Kerry John", or even "CATS John Kerry O/H". There is no data entry standard for this field... and this is what makes this task difficult.

Example

Here we have 4 rows of data, "John Citizen" is located in row 3, therefore the blank cells in columns F and G, row 2 can be populated with his first and last name.

enter image description here

enter image description here

I would like some suggestions as to how this code could be re-written to achieve this result more efficiently

 Sub arraycolumnmatch()
 Dim txtArray As Variant, T As Variant
 Dim I As Long, J As Long
 For I = 2 To Range("E50000").End(xlUp).row
 typ = Range("F" & I).Value
 If typ = "" Then
 txt = Range("J" & I).Value
 txtArray = Split(txt, " ")
 For Each T In txtArray
 For J = 2 To Range("G50000").End(xlUp).row
 If Range("G" & J).Value = T Then
 match_txt = T
 Range("G" & I).Value = match_txt
 Exit For
 End If
 Next J
 Next T
 For Each T In txtArray
 For J = 2 To Range("F50000").End(xlUp).row
 If Range("F" & J).Value = T Then
 match_txt = T
 'Check for surnames which could match first names such as Mary Michael
 If Not Range("G" & I).Value = T Then
 Range("F" & I).Value = match_txt
 Exit For
 End If
 End If
 Next J
 Next T
 End If
Next I
End Sub
Mathieu Guindon
75.5k18 gold badges194 silver badges467 bronze badges
asked Apr 21, 2015 at 2:13
\$\endgroup\$
4
  • \$\begingroup\$ Just to be clear, the code does what you want it to be doing, only too slow... right? \$\endgroup\$ Commented Apr 21, 2015 at 2:53
  • \$\begingroup\$ It is slow, but there is also another problem. Column J contains free text from a field in SAP, Column F and Column G contains first names and last names. The code makes an assumption, if it finds an entry in column F or G that matches an entry in the txt array, it will copy and paste that entry. My issue is that if it does it one at a time, it makes mistakes with names like Tony Andrew or Phillip Mary due to the fact that they are common first names \$\endgroup\$ Commented Apr 21, 2015 at 3:59
  • 3
    \$\begingroup\$ You'll probably want to ask on Stack Overflow and get the code to work as intended first, then post the working code here to get it peer reviewed - until your code works as it should, it's not ready for Code Review :( \$\endgroup\$ Commented Apr 21, 2015 at 4:14
  • 2
    \$\begingroup\$ I have updated it, it now functions correctly and solves my problem, but would like a review and see if it can be written to operate more efficiently. \$\endgroup\$ Commented Apr 22, 2015 at 7:37

2 Answers 2

8
\$\begingroup\$

Consistent Indentation.

The first thing I would address, is the indentation. Improving code that isn't properly formatted is just... not right.

Here are the signs:

Sub arraycolumnmatch()
Dim txtArray As Variant, T As Variant
Dim I As Long, J As Long

And:

Next I
End Sub

But also:

 'Check for surnames which could match first names such as Mary Michael
 If Not Range("G" & I).Value = T Then

The key is consistency - here's your code, with consistent indentation (comments omitted). Notice each code block (If...End If, For...Next, etc.) consistently adds an indentation level, and the end of the block lines up with its beginning - also notice that indentation levels are consistently 4 spaces wide; that's the default VBE setting for the Tab key:

Sub arraycolumnmatch()
|
| Dim txtArray As Variant, T As Variant
| Dim I As Long, J As Long
|
| For I = 2 To Range("E50000").End(xlUp).row
| | typ = Range("F" & I).Value
| | If typ = "" Then
| | | txt = Range("J" & I).Value
| | | txtArray = Split(txt, " ")
| | |
| | | For Each T In txtArray
| | | | For J = 2 To Range("G50000").End(xlUp).row
| | | | | If Range("G" & J).Value = T Then
| | | | | | match_txt = T
| | | | | | Range("G" & I).Value = match_txt
| | | | | | Exit For
| | | | | End If
| | | | Next J
| | | Next T
| | |
| | | For Each T In txtArray
| | | | For J = 2 To Range("F50000").End(xlUp).row
| | | | | If Range("F" & J).Value = T Then
| | | | | | match_txt = T
| | | | | | If Not Range("G" & I).Value = T Then
| | | | | | | Range("F" & I).Value = match_txt
| | | | | | | Exit For
| | | | | | End If
| | | | | End If
| | | | Next J
| | | Next T
| | End If
| Next I
|
End Sub

Much easier to follow, isn't it?


Meaningful Names.

Next thing I would address is the naming. Improving code where you have to quadruple-check every change you make because you're not sure you're changing the right thing is just not efficient. Using meaningful names for all identifiers fixes that.

Starting with the method's name itself:

Sub arraycolumnmatch()

Typically, VBA (VB in general) method names are PascalCase, for better readability. That would make it ArrayColumnMatch - but good method/procedure names should also start with a verb. What's this method doing exactly? should be answerable by simply looking at it's name.

The problem is, the method in question is doing too many things, so giving it a meaningful name is hard. But I'll get back to that.

Avoid chopping off identifers, like typ when you meant type - but then Type is a reserved keyword.. the solution isn't to make it less readable! I'm not sure which worksheet Range is referring to because Range is an implicit reference to Application.ActiveSheet (and that's very bug-prone!), so I'll just assume you meant to call it documentType.

txt is another meaningless name (heck, people use that as a prefix for TextBox controls!): it stands for that manual entry field you're trying to parse, right? How about targetFieldValue, or manualTextField?

Also avoid single-letter identifiers, like T - assuming txt is manualTextField, I'd go with manualTextFieldValue.

Local variables are typically named in camelCase, so I and J would be i and j - those are typically used for iterating loops as you're doing, so I'm not going to complain about those, except both loops seem to be iterating the very same rows on the active sheet (whatever that is), and they're nested, too... and since they're row numbers I'd typically name them xlRow[WhatItsFor].


Undeclared Identifiers

Always, systematically, consistently, automatically, thoughtlessly stick Option Explicit at the top of every VBA module. Don't question it, just do it: without it, you can have a bug by simply having a little typo in an identifier name. With it, your VBA code will refuse to compile and run if an identifier isn't declared anywhere. Use it. Always.

If match_txt is declared outside the scope of the procedure, then move the declaration inside that scope - it doesn't need to be at module scope. Always declare variables at the smallest possible scope; globals are evil.


vbNullString

A very minor point, but I have to mention it.

If typ = "" Then

Avoid using an empty string "" to mean no value. VBA defines the constant vbNullString specifically for that. And why should you bother typing vbNullString when "" does the same thing?

That's the thing. It doesn't do the same thing. "" is a full-fledged String value that needs its own allocated memory space. vbNullString is a null string pointer that's not allocated anywhere. Of course it's a very minor performance hit to allocate that string, but semantically, vbNullString is the right thing to do.

Don't believe me? Try this in the immediate pane (Ctrl+G):

?"Address for empty string: " & StrPtr("") & vbNewLine & "Address for vbNullString: " & StrPtr(vbNullString)

I get this output:

Address for empty string: 241665720
Address for vbNullString: 0

There's some kind of optimization going on under the hood though, so the pointer for "" will be reused every time it's encountered in your application's lifetime - as I said it's a very minor point. But just as you wouldn't declare an unused variable, you shouldn't allocate a memory address for a built-in constant value.


Twin Loops

You have this:

For Each T In txtArray

And then a few lines further, you have that:

For Each T In txtArray

You're iterating all elements of txtArray twice, when once would suffice. Also, wouldn't the last row of column "G" be the same last row as in column "F"? Wouldn't that last row be the same for every iteration of T? I'm asking, because your data seems to be laid out in a table with records, so you'd only need to find the row for the last record, whichever column you're using for that is irrelevant, no?

For J = 2 To Range("G50000").End(xlUp).row
...
For J = 2 To Range("F50000").End(xlUp).row

I'd extract a variable here, and assign it to Range("G50000").End(xlUp).row before the For Each T In txtArray loop starts, and then do

For J = 2 To xlLastRow

Then you're writing to G[i] if the value of G[j] matches the current txtArray element, and to F[i] if G[i] wasn't written to and the value of F[j] matches the current txtArray element.

Wouldn't that do the same thing then?

If Range("G" & j).Value = T Then
 Range("G" & i).Value = T
Else If Range("F" & j).Value = T Then
 Range("F" & i).Value = T
End If

Lastly, the J loop looks like it could be implemented as a Find instead of a loop.


Extract Methods.

The problem is, the method in question is doing too many things, so giving it a meaningful name is hard. But I'll get back to that.

So here we are. What's that method doing actually?

It...

  • finds the row number for the last record on the active sheet
  • iterates all records on the active sheet
  • splits a value in the current record into an array of values
  • iterates all values in that array
  • looks for a match of that value somewhere in columns G and F
  • writes that value to column G or F, whichever comes first

Extracting methods is a refactoring operation, and doing that manually in VBA is, honestly, a pain in the neck.

I've authored a refactoring tool that does exactly that (and more):

Rubberduck's Extract Method refactoring

Of course I have a little bit of a vested interest (very little - it's free and open-source actually), but I'd recommend you take a look at what the Rubberduck VBE add-in can do for you.

answered Apr 23, 2015 at 15:30
\$\endgroup\$
4
\$\begingroup\$

Speed it up #1: Minimize sheet access

Besides what @Mat'sMug said, there are a few other ways to speed things up.

In my experience, reading Excel cells from VBA is slow. You're iterating over a range several times. It probably would be faster to read those ranges into a VBA array and then loop over that instead:

Dim ColumnF As Variant, ColumnG As Variant
ColumnF = Range("F2:F" & xlLastRow).Value2
ColumnG = Range("G2:G" & xlLastRow).Value2
' or even
Dim ColumnF() As String, ColumnG() As String
ReDim ColumnF(xlLastRow)
Redim ColumnG(xlLastRow)
' and then fill each in once with a For/Next
' then refer to ColumnF() or ColumnG() instead of Range("F"&...)

Speed it up #2: Use the right data structure for the job

This is a key principle of software development. When you're repeatedly searching a set of items for a string, you don't want to store that list in an array (or an array-like structure like Excel cells) -- you want to store it in a hash table or dictionary. VBA didn't originally come with a Dictionary, but Microsoft added one after the fact as part of the Microsoft Scripting Library. https://support.microsoft.com/en-us/kb/187234

You'll need to add a reference to "Microsoft Scripting Runtime".

So, instead of loading into an array at startup, load column G and column F into Dictionarys. I also suggest using meaningful names.

Dim firstNames As Dictionary
Dim lastNames As Dictionary
For J = 2 To xlLastRow
 firstNames.Add Key := Range("F" & j), 1
 lastNames.Add Key := Range("G" & j), 1
Next J

Now you can check whether an entry is in each Dictionary in a flash:

 Dim lnf as Boolean, fnf as Boolean
 lnf = False
 fnf = False
 For Each T In txtArray
 If Not lnf Then
 If lastNames.Exists(T) Then
 match_txt = T
 Range("G" & I).Value = match_txt
 If fnf Then Exit For
 lnf = True
 End If
 End If
 If Not fnf Then
 If firstNames.Exists(T) Then
 match_txt = T
 Range("F" & I).Value = match_txt
 if lnf Then Exit For
 fnf = True
 End If
 End If
 Next T

Also for reference:

answered Apr 24, 2015 at 13:37
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.