6
\$\begingroup\$

The below function is used to Remove Duplicate Lines per each cell.
It works without problem, but it is slow with a range of only one column and 17k rows.
Actually, (with the same range in addition to another two column) , I am using a different excellent code by @VBasic2008 Link which perform complex tasks than my function and it takes just 1.5 second to finish.
I do not know where is the bottleneck on my code and How to optimize it.
There is no problem to totally change my codes or provide a new one. In advance, pleased for all your help.

Option Explicit
Option Compare Text
 
Function RemoveDuplicateLines(ByVal Text As String, Optional delimiter As String = vbLf) As String
 
 Dim dictionary As Object
 Dim x, part
 
 Set dictionary = CreateObject("Scripting.Dictionary")
 dictionary.CompareMode = vbTextCompare
 For Each x In Split(Text, delimiter)
 part = Trim(x)
 If part <> "" And Not dictionary.Exists(part) Then
 dictionary.Add part, Nothing
 End If
 Next
 
 If dictionary.Count > 0 Then
 RemoveDuplicateLines = Join(dictionary.keys, delimiter)
 Else
 RemoveDuplicateLines = ""
 End If
 
 Set dictionary = Nothing
End Function
 
Sub Remove_Duplicate_Lines()
 With Application
 .Calculation = xlCalculationManual
 .ScreenUpdating = False
 .EnableEvents = False
 End With
 
On Error GoTo Errorhandler
 
 Dim ws As Worksheet: Set ws = sh2
 Dim crg As Range
 Set crg = ws.Range("O2:O" & ws.Cells(Rows.Count, "O").End(xlUp).Row) '#Column contains Combined URL
 
 Dim arr: arr = crg.Value2
 Dim i As Long
 For i = LBound(arr) To UBound(arr)
 arr(i, 1) = RemoveDuplicateLines(arr(i, 1))
 Next i
 crg.value = arr
 
Errorhandler:
 With Application
 .Calculation = xlCalculationAutomatic
 .ScreenUpdating = True
 .EnableEvents = True
 End With
End Sub
asked Aug 3, 2022 at 7:24
\$\endgroup\$
1
  • 1
    \$\begingroup\$ You should better define « slow », is it 5 sec, 30 sec, 5 mins ? \$\endgroup\$ Commented Aug 4, 2022 at 6:23

1 Answer 1

6
\$\begingroup\$

Binding

Late-binding is slower than early-binding. Add a reference to Microsoft Scripting Runtime:
ref

then change Dim dictionary As Object to Dim dict As Dictionary and Set dictionary = CreateObject("Scripting.Dictionary") to Set dict = New Dictionary. Note that these are not the final declarations but more on this below.

Declare variables

The line Dim x, part simply declares 2 variables of type Variant by default. Although x needs to be a Variant because it's used in a For Each... loop you should still declare it as a best practice. part however should be declared as String. Never use Variant if you already know the var type because the extra wrapping is using extra resources.

Also, a For... To... loop is slightly faster on a 1D array, compared to a For Each..., so really you don't need x to be Variant at all but rather an iterator declared as Long and an array of String() as that is returned by Split.

Efficiency

You are presuming that all values in the O column are strings. It's better to only run the RemoveDuplicateLines for strings only and not for anything else like blanks or numbers. So, use VarType to check for type.

If the number of individual lines returned by the call to Split is exactly the same as the number of keys in the dictionary (i.e. no duplicates) then there is no need to join the keys because the original string would already be satisfactory as the result. Same goes for trimming - if trim does not remove any character then we can use the original text as long as there were no duplicates either.

You could avoid a lot of string copying by changing the string by reference and not returning as a result of the function. This improves efficiency a lot.

Using a Static dictionary will remove the need to instantiate a Dictionary on each call.

Other

You should not restore the state of the application to 'On' as maybe it was intentionally off before running your macro. So, store state, turn things off and finally restore when done.

To make the main method reusable, you should pass the range from a higher level method call so that you can run your macro on other ranges as well.

No need for Option Compare Text as the Dictionary.CompareMode option takes care of text comparison for keys.

Solution

Run Main method below:

Option Explicit
Public Sub Main()
 Dim rng As Range
 '
 On Error Resume Next
 With ActiveSheet 'Or whatever worksheet
 Set rng = .Range("O2:O" & .Cells(Rows.Count, "O").End(xlUp).Row) 'Or whatever range
 End With
 On Error GoTo 0
 If rng Is Nothing Then Exit Sub 'Or display a message
 '
 RemoveDuplicateLinesFromRange rng
End Sub
Public Sub RemoveDuplicateLinesFromRange(ByVal rng As Range _
 , Optional ByVal delimiter As String = vbLf)
 If rng Is Nothing Then
 Err.Raise 91, , "Range not set"
 ElseIf rng.Areas.Count > 1 Then
 Err.Raise 5, , "Non-contigous range"
 End If
 '
 Dim xlCalc As XlCalculation: xlCalc = Application.Calculation
 Dim displayOn As Boolean: displayOn = Application.ScreenUpdating
 Dim eventsOn As Boolean: eventsOn = Application.EnableEvents
 '
 With Application
 If xlCalc <> xlCalculationManual Then .Calculation = xlCalculationManual
 If displayOn Then .ScreenUpdating = False
 If eventsOn Then .EnableEvents = False
 End With
 '
 Dim arr() As Variant
 Dim i As Long
 Dim j As Long
 '
 If rng.Count = 1 Then
 ReDim arr(1 To 1, 1 To 1)
 arr(1, 1) = rng.Value2
 Else
 arr = rng.Value2
 End If
 '
 If UBound(arr, 1) < UBound(arr, 2) Then
 For i = LBound(arr, 1) To UBound(arr, 1)
 For j = LBound(arr, 2) To UBound(arr, 2)
 RemoveDuplicateLines arr(i, j), delimiter
 Next j
 Next i
 Else
 For j = LBound(arr, 2) To UBound(arr, 2)
 For i = LBound(arr, 1) To UBound(arr, 1)
 RemoveDuplicateLines arr(i, j), delimiter
 Next i
 Next j
 End If
 rng.Value2 = arr
 '
 With Application
 If xlCalc <> xlCalculationManual Then .Calculation = xlCalc
 If displayOn Then .ScreenUpdating = True
 If eventsOn Then .EnableEvents = True
 End With
End Sub
Private Sub RemoveDuplicateLines(ByRef v As Variant _
 , Optional ByVal delimiter As String = vbLf)
 If VarType(v) <> vbString Then Exit Sub
 '
 Static dict As dictionary
 Dim parts() As String
 Dim i As Long
 Dim hasChanged As Boolean
 '
 If dict Is Nothing Then
 Set dict = New dictionary
 dict.CompareMode = vbTextCompare
 Else
 dict.RemoveAll
 End If
 '
 parts = Split(v, delimiter)
 If LBound(parts) = UBound(parts) Then
 v = Trim$(v)
 Exit Sub
 End If
 '
 For i = LBound(parts, 1) To UBound(parts, 1)
 If TrimIfNeeded(parts(i)) Then hasChanged = True
 dict(parts(i)) = Empty
 Next
 '
 If hasChanged Or (UBound(parts, 1) - LBound(parts, 1) + 1 > dict.Count) Then
 v = Join(dict.Keys, delimiter)
 End If
End Sub
Private Function TrimIfNeeded(ByRef Text As String) As Boolean
 Dim size As Long: size = Len(Text)
 If size = 0 Then Exit Function
 '
 Text = Trim$(Text)
 TrimIfNeeded = (size > Len(Text))
End Function

Final thoughts

You might want to check for formulas. When you read an entire range, you could have a combination of formulas and values so you might want to update code to exclude formula cells from the macro.

answered Aug 3, 2022 at 10:26
\$\endgroup\$
1
  • 2
    \$\begingroup\$ It works excellently πŸ‘, your code just took 0.4 second to finish, It’s like rocket speed. I will follow your notes and recommendations. \$\endgroup\$ Commented Aug 3, 2022 at 13:25

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.