This is code I wrote that extracts data from a .csv file stored on the web. My sheet called 'New Data' stores this data for 2 days. I keep 2 days' worth of data so that when I run it the next time, it pulls the data from the web, compares it to the data I pulled yesterday, and then places any data that doesn't match (wasn't there yesterday but is there today) into a new column.
I essentially only need the new data, but am stuck storing 2 days' worth of about 4,000 lines of data each. My file is extremely large and takes far too long to run (this has a lot to do with my poor VBA code). If someone could please help me clean up my code (i.e. remove any unnecessary lines I know I have in there) I would be very grateful!
Option Explicit
Sub FindNewIssues()
'for efficiency
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
'first checks to see if the data is already up-to-date
'continues with further commands only if data NOT up-to-date
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") Then
'copy the old content to a new location
'this will allow us to compare the data from last night (new data) to two days ago (old data)
Sheets("New Data").Range("A9:I10000").Cut Destination:=Sheets("New Data").Range("K9")
'then import the new data from external database
With Worksheets("New Data").QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A9ドル"))
.Name = "Closes"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.RefreshStyle = xlInsertDeleteCells
.SavePassword = True
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.TextFilePromptOnRefresh = False
.TextFilePlatform = 437
.TextFileStartRow = 1
.TextFileParseType = xlDelimited
.TextFileTextQualifier = xlTextQualifierDoubleQuote
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.TextFileColumnDataTypes = Array(3, 2, 2, 2, 2, 1, 1, 1, 1)
.TextFileTrailingMinusNumbers = True
.Refresh BackgroundQuery:=False
End With
Application.Calculate
'next need to check to see if there are any new issues
'this is done by looking for errors (non-matches) from a vlookup between the new and old data
Range("T10:W10000").ClearContents
Range("T10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,""""))"
Range("U10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,""""))"
Range("T10:U10").AutoFill Destination:=Range("T10:U10000"), Type:=xlFillDefault
Application.Calculate
'these will get placed in a new column and considered our new issues (because they didn't exist before)
Range("T9:U10000").Select
Selection.AutoFilter
ActiveSheet.Range("$T9ドル:$U10000ドル").AutoFilter Field:=1, Criteria1:="<>"
Worksheets("New Data").Range("T10:U10", Sheets("New Data").Range("T10").End(xlDown)).Copy
Range("V10").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=True, Transpose:=False
Selection.AutoFilter
Application.Calculate
'this if statement is just so I can store the new issues as text or else they'd clear every day
'this also takes into account the chance of there not being any new issues
If Worksheets("New Data").Range("V10").Value = "" Then
Range("V10").Value = ""
Else
If Worksheets("New Data").Range("X10") = "" Then
Worksheets("New Data").Range("V10:W10", Sheets("New Data").Range("V10").End(xlDown)).Copy
Range("X10").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=True, Transpose:=False
Else
Worksheets("New Data").Range("V10:W10", Sheets("New Data").Range("V10").End(xlDown)).Copy
Range("X9").End(xlDown).Offset(1).Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=True, Transpose:=False
End If
End If
Application.CutCopyMode = False
'need to send these new issues to their appropriate sheets
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End If
End Sub
3 Answers
I actually had to go and check to see how this line would behave. It does in fact work, but it's terribly confusing, perhaps more so for a seasoned VB6 dev.
If Worksheets("New Data").Range("A10") <> Worksheets("New Data").Range("B2") Then
It looks like it's comparing Range objects, but what it's really doing is comparing their values. I assume this is because Value is Range's default property and the language is performing some "nice" cast for us under the hood. It's best to be explicit here so that no future maintainer wastes their time being confused by this.
If Worksheets("New Data").Range("A10").Value <> Worksheets("New Data").Range("B2").Value Then
Which brings me to a micro-optimization. Instead of accessing the Worksheets collection repeatedly, store the "New Data" sheet in a variable and work with it instead.
Dim newData As Worksheet
Set newData = Worksheets("New Data")
If newData.Range("A10").Value <> newData.Range("B2").Value Then
It won't buy you much in the way of runtime, but it does make the code cleaner and easier to maintain.
newData.Range("A9:I10000").Cut Destination:=newData.Range("K9")
Are you absolutely, 100% positive that the data will never exceed 10k rows? I wouldn't bet my paycheck on it. Find the last used row instead.
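A minimal sketch of what that might look like, assuming the data always has values in column A (the `newData` variable is the worksheet variable suggested above):

```vba
'Find the last used row in column A instead of hard-coding 10000
Dim lastRow As Long
lastRow = newData.Cells(newData.Rows.Count, "A").End(xlUp).Row

'Move exactly the range that holds data, however large it is today
newData.Range("A9:I" & lastRow).Cut Destination:=newData.Range("K9")
```

If column A can contain blanks partway down, pick whichever column is guaranteed to be populated for every data row.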
With newData.QueryTables.Add(Connection:= _
"TEXT;https://(the URL that I can't show).csv", Destination:=Range( _
"$A9ドル"))
That is an extremely odd way to break lines. I would expect it to look like this.
With newData.QueryTables.Add(Connection:= "TEXT;https://(the URL that I can't show).csv", _
Destination:=Range("$A9ドル"))
Or this. Pick your poison.
With newData.QueryTables.Add( _
Connection:="TEXT;https://(the URL that I can't show).csv", _
Destination:=Range("$A9ドル"))
I like the second option because everything lines up nice without having to fuss about with 5 or 6 tabs.
This looks like it came directly from the macro recorder.
With newData.QueryTables.Add( _
    Connection:="TEXT;https://(the URL that I can't show).csv", _
    Destination:=Range("$A9ドル"))
    .Name = "Closes"
    .FieldNames = True
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = True
    .SaveData = True
    .AdjustColumnWidth = True
    .RefreshPeriod = 0
    .TextFilePromptOnRefresh = False
    .TextFilePlatform = 437
    .TextFileStartRow = 1
    .TextFileParseType = xlDelimited
    .TextFileTextQualifier = xlTextQualifierDoubleQuote
    .TextFileConsecutiveDelimiter = False
    .TextFileTabDelimiter = False
    .TextFileSemicolonDelimiter = False
    .TextFileCommaDelimiter = True
    .TextFileSpaceDelimiter = False
    .TextFileColumnDataTypes = Array(3, 2, 2, 2, 2, 1, 1, 1, 1)
    .TextFileTrailingMinusNumbers = True
    .Refresh BackgroundQuery:=False
End With
You can either spend some time figuring out how many of these are default values and remove them, or extract the whole block into a procedure, but these are details we don't care about at the level of abstraction FindNewIssues is at.
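To illustrate the extraction idea, a sketch (the procedure name is my own invention, and I've kept every setting from the original rather than guess which ones are defaults):

```vba
'Hypothetical extraction: hides the import details behind one well-named call.
'FindNewIssues would then just call: ImportClosesData newData
Private Sub ImportClosesData(ByVal target As Worksheet)
    With target.QueryTables.Add( _
        Connection:="TEXT;https://(the URL that I can't show).csv", _
        Destination:=target.Range("$A9ドル"))
        .Name = "Closes"
        .FieldNames = True
        .RefreshStyle = xlInsertDeleteCells
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileCommaDelimiter = True
        .TextFileColumnDataTypes = Array(3, 2, 2, 2, 2, 1, 1, 1, 1)
        .Refresh BackgroundQuery:=False
        '...remaining settings from the original block go here...
    End With
End Sub
```

Now the top-level procedure reads as a sequence of steps instead of a wall of property assignments.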
Application.Calculate
'next need to check to see if there are any new issues
'this is done by looking for errors (non-matches) from a vlookup between the new and old data
Range("T10:W10000").ClearContents
Range("T10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,""""))"
Range("U10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,""""))"
Range("T10:U10").AutoFill Destination:=Range("T10:U10000"), Type:=xlFillDefault
Application.Calculate
You populate the data from the web query, calculate, clear the range of cells that have the VLookup, then put the same formulas back into the range, and then calculate again. This is a pretty expensive operation and I'm not convinced that you need to do anything more than Application.Calculate here. I could be wrong (wouldn't be the first time), but you should try seeing what happens if you just recalculate here. It could save a significant amount of time.
        End If
    End If
    Application.CutCopyMode = False
    'need to send these new issues to their appropriate sheets
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End If
This is bad. You probably didn't notice it because the indentation was off, and this whole method is too long, but you're not always setting Calculation back to automatic and ScreenUpdating back on. Just moving it back outside of the If
block isn't good enough either. You need an error handler to ensure these always get put back no matter what happens.
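A common shape for that handler, sketched below (the label names are my own; the body of the procedure is elided):

```vba
Sub FindNewIssues()
    On Error GoTo CleanExit
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    '... all of the actual work goes here ...

CleanExit:
    'These always run, whether we fell through normally or jumped here on error.
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then MsgBox Err.Description 'or log/re-raise as you prefer
End Sub
```

The point is that every exit path, including the "data is already up-to-date" early-out and any runtime error, passes through the cleanup block.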
I'm sorry that I didn't offer much in the way of optimizing your code, but the vast majority of the time here is likely spent querying the remote data source. There's not much you can do about that other than making the network faster. Your real opportunity for improvement here is to make sure you're only re-calculating the workbook when you really have to. I think you're probably doing that too often right now.
-
\$\begingroup\$ Did you mean to say Set newData = Worksheets("New Data") instead of Set newData = newData? \$\endgroup\$ — JK01, Commented Jun 25, 2015 at 23:58
-
\$\begingroup\$ Probably a Ctrl + H error @JK01. Yes. \$\endgroup\$ — RubberDuck, Commented Jun 26, 2015 at 0:01
It is good to see that you are using Option Explicit and that you are controlling ScreenUpdating and Calculation.
In addition to the answer given by @RubberDuck, I'd like to point out a couple of things.
You only expect 4,000 or so lines of data but are currently filling 10,000 rows with formulas (presumably as a safeguard) and clearing and copying that chunk of data. You are more than doubling (4,000 vs 10,000) the calculation effort required in Excel. Working out the exact number of lines of data and using that size range will let you remove the overhead.
Also, look at your formulas.
Range("T10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!E10,'New Data'!$O10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!E10,""""))"
Range("U10").Formula = "=IF(IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,"""")=0,"""",IF(ISERROR(VLOOKUP('New Data'!B10,'New Data'!$L10ドル:$S10000,1,ドルFALSE))=TRUE,'New Data'!B10,""""))"
You have what looks like the same VLOOKUP call twice on each line, wrapped up in ISERROR and nested IF functions. Consider replacing the IF(ISERROR(VLOOKUP with the simpler IFERROR(VLOOKUP if your version of Excel supports it. The VLOOKUP only looks at the first column, so why not replace it with the MATCH function? If you forget about the ISERROR completely, you could just use the MATCH function and then change the AutoFilter criteria to Criteria1:="#N/A". Your formulas could (if I've understood) become:
Range("T10").Formula = "=MATCH('New Data'!E10,'New Data'!$O10ドル:$O$" & lastRowNum & ",0)"
Range("U10").Formula = "=MATCH('New Data'!B10,'New Data'!$L10ドル:$L$" & lastRowNum & ",0)"
Where lastRowNum is the variable that stores the last row that contains values. You would then apply the autofilter using the new criteria and then copy columns B and E.
You might want to look at directly assigning the values from one range to another (dataSheet.Range("X10:X" & lastRowNum).Value = dataSheet.Range("V10:V" & lastRowNum).Value), which will be quicker than Copy ... PasteSpecial Paste:=xlPasteValues.
An alternative, which may or may not be quicker, would be to keep the old and new data on separate worksheets and use ADODB and SQL to build an OUTER JOIN, which would create a recordset of those values in the new data that are not in the old data. These links are from @RubberDuck's answer to another review. Here is the MSDN page on querying the Excel data source with ADODB. Once you have a recordset, you can use the CopyFromRecordset method of Range to paste into the current workbook.
My first suggestion would be to change your VLOOKUP lines into For loops and compare the values using pure VBA instead of the Excel formulas.
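A sketch of that idea, reading both key columns into arrays and comparing them in memory (the column letters and start row are assumptions taken from the original code):

```vba
'Compare today's keys (column B) against yesterday's keys (column L) in VBA
Dim oldKeys As Variant, newKeys As Variant
Dim i As Long, j As Long, found As Boolean

With Worksheets("New Data")
    oldKeys = .Range("L10:L" & .Cells(.Rows.Count, "L").End(xlUp).Row).Value
    newKeys = .Range("B10:B" & .Cells(.Rows.Count, "B").End(xlUp).Row).Value
End With

For i = 1 To UBound(newKeys, 1)
    found = False
    For j = 1 To UBound(oldKeys, 1)
        If newKeys(i, 1) = oldKeys(j, 1) Then found = True: Exit For
    Next j
    If Not found Then Debug.Print newKeys(i, 1) 'a new issue: write it wherever you need
Next i
```

Reading the ranges into Variant arrays first matters: looping over cells one at a time is far slower than looping over an in-memory array.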
Also, keep in mind that error catching in any language induces a lot of load in your program. ISERROR, Try...Catch, or On Error Resume Next type statements should be avoided if possible and replaced with logic such as If....
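As one concrete way to do that in this situation: calling Application.Match (rather than WorksheetFunction.Match) returns a Variant error value instead of raising a runtime error, so it can be tested with plain logic and no error handler (the ranges here are assumptions based on the original code):

```vba
'No On Error needed: Application.Match returns an error Variant on failure
Dim hit As Variant
hit = Application.Match(Worksheets("New Data").Range("B10").Value, _
    Worksheets("New Data").Range("L10:L10000"), 0)

If IsError(hit) Then
    'key not found in yesterday's data: treat it as a new issue
Else
    'key found: nothing to do
End If
```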
I often reference this page when I want to speed my programs up. The site is for .NET and not VBA, but the same concepts can be applied.
Generally, the more "generic" commands (For..., Do..., If..., While..., etc.) are going to be much faster than the canned routines within Excel.
-
1\$\begingroup\$ I thought that the intrinsic Excel functions (VLOOKUP, etc.) were written in highly optimised C and so are much faster than VBA loops. Have you found the opposite to be true? \$\endgroup\$ — ChipsLetten, Commented Jun 25, 2015 at 21:05
-
\$\begingroup\$ That may be the case for VLOOKUP, but my concern is with the error handler of ISERROR. I know that error handlers add some load to your operation, and if you're iterating through about 10,000 records, it can add up fast. \$\endgroup\$ — tbm0115, Commented Jun 25, 2015 at 21:46