I have 9 sheets with 1668 rows and 34 cols. The code collects all data from every sheet into one Summary sheet in the following format:
service 1 supplier 1 price etc.. service 1 supplier 2 price etc..
Unfortunately excel freezes around the 7k row and I receive 'out of memory' error. Any suggestions how to solve improve to code to run on large data?
Sub goEasy()
Dim wsText As Variant
Dim sht As Worksheet
Dim wSum As Worksheet
Dim service As String
Dim supplier As String
Dim priceRange As String
Dim price As String
Dim Lrow As Long, LastRow As Long
Dim a As Long, b As Long
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
Set sht = ThisWorkbook.Worksheets(4)
Set wSum = ThisWorkbook.Worksheets("Summary")
wsText = Array("<25K", "25K <100K", "100K <250K", "250K <500K", "500K <1M", "1M <5M", "5M <15M", "15M <30M", "30M <50M")
LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
For Each element In wsText
'For i = 5 To LastRow
a = 4
b = 12
Do While a < LastRow
'For j = 13 To 47
If a = LastRow Then
a = 4
Exit Do
End If
a = a + 1
Do While b <= 47
If b = 47 Then
b = 12
Exit Do
End If
b = b + 1
Lrow = wSum.UsedRange.Rows(wSum.UsedRange.Rows.Count).Row + 1
service = ThisWorkbook.Worksheets(element).Cells(a, 1).Text
supplier = ThisWorkbook.Worksheets(element).Cells(4, b).Text
priceRange = ThisWorkbook.Worksheets(element).Cells(2, 1).Text
price = ThisWorkbook.Worksheets(element).Cells(a, b).Text
wSum.Cells(Lrow, 1) = service
wSum.Cells(Lrow, 2) = supplier
wSum.Cells(Lrow, 3) = priceRange
wSum.Cells(Lrow, 4) = price
'Next j
Loop
'Next i
Loop
Next element
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
MsgBox ("Complete")
End Sub
-
4\$\begingroup\$ How does it freeze at 7K rows when you have 1,668 rows per sheet? Does it run to completion with a smaller data set? It's extremely hard to follow what your code is doing, with that thrice-nested loop, the insufficient indentation, confusing or irrelevant comments and the single-letter variable names - I'd suggest you try to improve the readability first, and then make changes to improve performance and memory footprint. Otherwise you'll be making changes and very likely introduce bugs. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2016年08月16日 14:02:46 +00:00Commented Aug 16, 2016 at 14:02
-
\$\begingroup\$ @Mat'sMug 1st loop goes through the sheets. The second loop gets each row and the 3rd loop appends all the supplier + price... to the line. It just freezes after a few minutes and after closing I receive out of memory error. The single word variables are just iterators. \$\endgroup\$Lóránt Csaba Mihály– Lóránt Csaba Mihály2016年08月16日 14:07:06 +00:00Commented Aug 16, 2016 at 14:07
-
\$\begingroup\$ Can you show how the data looks? Is in tabular format with rows and column headers? I have a solution that will use no loops but I need to see how the data is laid out. \$\endgroup\$Parfait– Parfait2016年08月17日 02:09:13 +00:00Commented Aug 17, 2016 at 2:09
2 Answers 2
Naming
In order for code to be useful, it has to be understood. This applies equally to you now, you in 6 months and anybody else who eventually has to work with it. As such, code should be written for other people to read and understand.
Documentation helps with this. Writing down what your program is doing/why. But the easiest way is just to name things descriptively and unambiguously.
Dim service As String
Dim supplier As String
Dim priceRange As String
Dim price As String
These are good names. Somewhat ambiguous, I'd prefer something like serviceName, supplierName, priceText
etc. to be completely unambiguous, but I can look anywhere in your code, see those variables and know precisely what they are.
Dim wsText As Variant
Dim wSum As Worksheet
Dim Lrow As Long
Dim a As Long, b As Long
These are not good names.
If I see something called wsText
I'm going to parse that as Worksheet Text
which means... some kind of text, in a worksheet? A worksheet called text? Oh, it's a list of worksheet names.
Yeah, that was completely non-obvious.
Just call it worksheetNames
or maybe targetWorksheetNames
.
wSum
is similarly ambiguous and not-obvious about what it is. Just call it summarySheet
.
a, b
are generic, and hence useless. Here, they refer to Row
and Column
indexes, so why not call them currentRow, currentcolumn
?
Good naming just makes code a hell of a lot easier to work with. Like so:
Sub AggregateSheetData()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
Dim sheetNames As Variant
sheetNames = Array("<25K", "25K <100K", "100K <250K", "250K <500K", "500K <1M", "1M <5M", "5M <15M", "15M <30M", "30M <50M")
Dim summarySheet As Worksheet
Set summarySheet = ThisWorkbook.Worksheets("Summary")
Dim currentSummaryRow As Long
currentSummaryRow = summarySheet.Cells(summarySheet.Cells.Count, 1).End(xlUp).Row
Dim service As String
Dim supplier As String
Dim priceRange As String
Dim price As String
Dim currentSheet As Worksheet
Dim currentSheetName As String
Dim currentRow As Long, currentColumn As Long
Dim lastRow As Long
Dim sheetCounter As Long
For sheetCounter = LBound(sheetNames) To UBound(sheetNames)
currentSheetName = sheetNames(sheetCounter)
Set currentSheet = ThisWorkbook.Worksheets(currentSheetName)
lastRow = currentSheet.Cells(currentSheet.Rows.Count, 1).End(xlUp).Row
For currentRow = 5 To lastRow
currentSummaryRow = currentSummaryRow + 1
For currentColumn = 12 To 47
priceRange = currentSheet.Cells(2, 1).Text
service = currentSheet.Cells(currentRow, 1).Text
supplier = currentSheet.Cells(4, currentColumn).Text
price = currentSheet.Cells(currentRow, currentColumn).Text
summarySheet.Cells(currentSummaryRow, 1) = service
summarySheet.Cells(currentSummaryRow, 2) = supplier
summarySheet.Cells(currentSummaryRow, 3) = priceRange
summarySheet.Cells(currentSummaryRow, 4) = price
Next currentColumn
Next currentRow
Next sheetCounter
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
MsgBox ("Complete")
End Sub
Now things are becoming a lot clearer, and we can move on to the next stage of making this code useful. Namely, explaining and documenting important information.
Magic Numbers
A magic number is any hard-coded value which appears in your code.
Why does a
start at 5?
Why does b
only go from 12 to 47?
Why is supplier
always pulled from row 5?
Why is service
always pulled from column 1?
Why are they laid out in columns 1-4 of the summary sheet, and in that order?
How do we know that our worksheets haven't been renamed?
For each of these questions, you should either re-structure your code so they don't have to be hard-coded, or you leave a note explaining why they have the values they have and put them in appropriate variables.
In this case, I recommend constants. Like so:
'/ Each sheet is laid out with Suppliers on row 4, Service in column 1, and then price values in a grid.
Const SUPPLIER_ROW As Long = 4
Const SERVICE_COLUMN As Long = 1
...
...
For currentRow = SUPPLIER_ROW + 1 to finalRow
For currentColumn = SERVICE_COLUMN + 1 to finalColumn
...
...
And now, if your data ever moves around, you only have to go and change that value in one place. And everywhere else in your code, you can refer to your constant by name, rather than trusting that you'll remember why the numbers are what they are.
Arrays
This is where we're going to give you a serious performance tune-up. I expect it will solve all of your speed and memory problems.
Doing anything to a Worksheet
is a huge operation. If you write
priceRange = currentSheet.Cells(2, 1).Text
it doesn't feel like a big operation, but you have to query the Worksheet
Object, which queries the Cells
object, which searches through a couple billion range objects to find the one you're after, which then gets queried to determine the text
value it's currently displaying. This also triggers Worksheet
Events (query handlers, events, calculations, screenUpdating, validation, etc.) which fire off their own chains of cascading events and so on and so forth.
And you're doing this roughly 2 Million times.
(N.B. the above is for illustration only. The actual chain of events is much more complex and convoluted).
When working with data, what you want is an Array
. An Array
is just a grid of values laid out in memory, so querying a value from an Array
is literally a trillion times faster than querying it from a Worksheet
object.
VBA makes this incredibly easy. Just create a Range
that encompasses all your data, and do the following:
Dim dataArray As Variant
dataArray = dataRange
And now, whatever was in the topLeft Cell of your Range is in dataArray(1, 1)
. Next row down dataArray(2, 1)
and so forth. you can then read this data back to a worksheet by doing the same in reverse:
pasteRange = dataArray
The following is your code, re-written to use Arrays, and it will be quite literally a thousand times faster:
Sub AggregateSheetData()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
'/ row/column positions on data sheets
Const SUPPLIER_ROW As Long = 4
Const SERVICE_COLUMN As Long = 1
Const START_COLUMN As Long = 12
Const SUMMARY_SHEET_NAME As String = "Summary"
Dim sheetNames As Variant
sheetNames = Array("<25K", "25K <100K", "100K <250K", "250K <500K", "500K <1M", "1M <5M", "5M <15M", "15M <30M", "30M <50M")
Dim summaryData As Variant
ReDim summaryData(1 To 4, 1 To 1)
Dim summaryCounter As Long
Dim service As String
Dim supplier As String
Dim priceRange As String
Dim price As String
Dim dataRange As Range
Dim sheetData As Variant
Dim finalRow As Long, finalColumn As Long
Dim iRow As Long, iColumn As Long
Dim currentSheet As Worksheet
Dim currentSheetName As String
Dim sheetCounter As Long
For sheetCounter = LBound(sheetNames) To UBound(sheetNames)
currentSheetName = sheetNames(sheetCounter)
Set currentSheet = ThisWorkbook.Worksheets(currentSheetName)
With currentSheet
priceRange = .Cells(2, 1).Text
finalRow = .Cells(.Rows.Count, SERVICE_COLUMN).End(xlUp).Row
finalColumn = .Cells(SUPPLIER_ROW, .Columns.Count).End(xlToLeft).Column
Set dataRange = .Range(.Cells(SUPPLIER_ROW, SERVICE_COLUMN), .Cells(finalRow, finalColumn))
End With
sheetData = dataRange
Dim LB1 As Long, UB1 As Long
Dim LB2 As Long, UB2 As Long
LB1 = LBound(sheetData, 1)
UB1 = UBound(sheetData, 1)
LB2 = LBound(sheetData, 2)
UB2 = UBound(sheetData, 2)
For iRow = LB1 To UB1
service = sheetData(iRow, LB2)
For iColumn = (START_COLUMN - SERVICE_COLUMN + 1) To UB2
supplier = sheetData(LB1, iColumn)
price = sheetData(iRow, iColumn)
summaryCounter = summaryCounter + 1
ReDim Preserve summaryData(1 To 4, 1 To summaryCounter)
summaryData(1, summaryCounter) = service
summaryData(2, summaryCounter) = supplier
summaryData(3, summaryCounter) = priceRange
summaryData(4, summaryCounter) = price
Next iColumn
Next iRow
Next sheetCounter
Dim summarySheet As Workbook
Set summarySheet = ThisWorkbook.Worksheets(SUMMARY_SHEET_NAME)
Dim pasteRange As Range
With summarySheet
finalRow = .Cells(.Rows.Count, 1).End(xlUp).Row
Set pasteRange = .Range(.Cells(finalRow + 1, 1), .Cells(finalRow + summaryCounter, 4))
pasteRange = summaryData
End With
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
MsgBox ("Complete")
End Sub
-
\$\begingroup\$ Will it be literally 3 million times faster or literally 1000 times faster? ;) Solid answer and explanation. \$\endgroup\$Raystafarian– Raystafarian2016年08月16日 15:58:40 +00:00Commented Aug 16, 2016 at 15:58
-
\$\begingroup\$ If the code was *just* reading to/from arrays it'd be millions of times faster. But there *is* some overhead in there, so I think 1,000 is more realistic overall. \$\endgroup\$Kaz– Kaz2016年08月16日 16:00:39 +00:00Commented Aug 16, 2016 at 16:00
First, as a quick win, I would change
For Each element In wsText
into
For i = lbound(wsText) to Ubound(wsText)
or even better
For i = 1 to 10 'or whatever number of items you have there
I have read that looping through an array using a For each
construct is highly inefficient compared to a For next
loop.
Second, I would replace
service = ThisWorkbook.Worksheets(element).Cells(a, 1).Text
supplier = ThisWorkbook.Worksheets(element).Cells(4, b).Text
priceRange = ThisWorkbook.Worksheets(element).Cells(2, 1).Text
price = ThisWorkbook.Worksheets(element).Cells(a, b).Text
by
With ThisWorkbook.Worksheets(element)
service = .Cells(a, 1).Text
supplier = .Cells(4, b).Text
priceRange = .Cells(2, 1).Text
price = .Cells(a, b).Text
End with
which is more readable and quicker, since you only set the ref to the worksheet object once.
-
\$\begingroup\$ Correct. Arrays should be iterated with a
For
loop. It's also on Docs.SO =) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2016年08月16日 14:17:31 +00:00Commented Aug 16, 2016 at 14:17 -
\$\begingroup\$ Thank you for the suggestions, unfortunately it still crashes around 7k. (not responding and out of memory). \$\endgroup\$Lóránt Csaba Mihály– Lóránt Csaba Mihály2016年08月16日 14:44:05 +00:00Commented Aug 16, 2016 at 14:44
-
\$\begingroup\$ @LórántCsabaMihály if you're looking to solve a specific problem with your code, you've come to the wrong place. On this site you will receive feedback on all aspects of your working code, to improve its efficiency, performance, readability, maintainability, etc. Specific programming issues belong on Stack Overflow - note that as it stands your question would probably be closed as too broad on SO, because your code isn't narrowed down on the issue; you can't ask SO "here's my code, where's my bug?", that's not how it works. And it's off-topic on Code Review, too. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2016年08月16日 14:48:30 +00:00Commented Aug 16, 2016 at 14:48
-
\$\begingroup\$ @Mat'sMug I asked help on SO and they pointed here. I just need some help to solve this task. The code is working, the problem is that it can't handle big files. If you can code then help and stop saying that this topic is irrelevant. \$\endgroup\$Lóránt Csaba Mihály– Lóránt Csaba Mihály2016年08月16日 14:55:00 +00:00Commented Aug 16, 2016 at 14:55
-
2\$\begingroup\$ @LórántCsabaMihály I will appreciate that you remain civil please. I'm only telling you what this site is for, and warning you that you may not receive an answer that solves your OOM issue, because fixing bugs is not what we do here, that's Stack Overflow's territory - the only reason your question isn't closed yet is specifically because it appears to work with smaller data sets. Please read how to get the best value of Code Review and our help center to know what we're about. Stack Overflow users don't always know what's OK on CR. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2016年08月16日 15:03:24 +00:00Commented Aug 16, 2016 at 15:03