3
\$\begingroup\$

INTRO

I am using an Excel worksheet as my "database." No headers so [F1] is assigned to column one by default. I filled the entire column, all 1,048,576 cells with RANDBETWEEN(1,20). I then hard set these values by copy/pasting as value.

It is very slow. Much slower than using Excel functions would be to generate same data. I realize that I could use 1,048,576 as a constant denominator but I wanted to practice with SQL query language, and keep model more dynamic.

EDIT: it should not say % in results not many times. I am getting the % of times each var occurred in my data set. I am basically seeing what the distribution is of randbetween(1-20) outputs are over X calls. In this instance x is 1,048,576.

My SQL Query

"SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"

Is returning the total number of records in column F1 that are equal to searchKey and dividing that number by the total number of records in column F1.

CODE

Option Explicit
Private Declare PtrSafe Function timeGetTime Lib "winmm.dll" () As Long
Sub SqlQueryOnWorkSheet()
Dim started As Long
Dim cn As ADODB.Connection
Dim filePath As String
Dim counter As Long
Dim outCome As Double
Dim ended As Long
 started = timeGetTime
 filePath = "Z:\Test\Test1.xlsx"
 
 Set cn = EstablishConnection(filePath)
 
 If cn.State <> 1 Then GoTo CleanFail:
 For counter = 1 To 20
 outCome = FindCount(cn, counter)
 PrintOutcome counter, outCome
 Next counter
 cn.Close
 Set cn = Nothing
 
 ended = timeGetTime
 Debug.Print "QUERIES RAN IN " & (ended - started) / 1000 & " SECONDS"
 Exit Sub
CleanFail:
 Debug.Print "CONNECTION COULD NOT BE MADE"
End Sub
Function EstablishConnection(ByVal filePath As String) As ADODB.Connection
 Set EstablishConnection = New ADODB.Connection
 EstablishConnection.Open _
 "Provider=Microsoft.ACE.OLEDB.12.0;" & _
 "Data Source='" & filePath & "';" & _
 "Extended Properties=""Excel 12.0 Macro;HDR=No;IMEX=1;"";"
End Function
Function FindCount(ByRef cn As ADODB.Connection, ByVal searchKey As Long) As Double
Dim strSql As String
Dim rs As ADODB.Recordset
On Error GoTo CleanFail:
 Set rs = New ADODB.Recordset
 strSql = "SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"
 rs.Open strSql, cn
 FindCount = rs.GetString
 rs.Close
 Set rs = Nothing
 Exit Function
CleanFail:
 Debug.Print "QUERY FAILED"
End Function
Sub PrintOutcome(ByVal counter As Long, ByVal outCome As Double)
 Debug.Print "Variable " & counter & " Occured " & outCome & " Many Times"
End Sub

Results

EDIT: THIS SHOULD BE % NOT MANY -- FIXED

Variable 1 Occured 4.9837112427 % Of Time
Variable 2 Occured 5.0171852112 % Of Time
Variable 3 Occured 4.9752235413 % Of Time
Variable 4 Occured 4.9716949463 % Of Time
Variable 5 Occured 5.0051689148 % Of Time
Variable 6 Occured 4.9989700317 % Of Time
Variable 7 Occured 4.9901008606 % Of Time
Variable 8 Occured 5.0283432007 % Of Time
Variable 9 Occured 5.0018310547 % Of Time
Variable 10 Occured 5.0164222717 % Of Time
Variable 11 Occured 4.9933433533 % Of Time
Variable 12 Occured 5.0059318542 % Of Time
Variable 13 Occured 5.0333976746 % Of Time
Variable 14 Occured 4.9952507019 % Of Time
Variable 15 Occured 5.0163269043 % Of Time
Variable 16 Occured 4.9654006958 % Of Time
Variable 17 Occured 4.9822807312 % Of Time
Variable 18 Occured 5.0310134888 % Of Time
Variable 19 Occured 5.0113677979 % Of Time
Variable 20 Occured 4.9770355225 % Of Time
QUERIES RAN IN 38.754 SECONDS
asked Aug 23, 2018 at 21:53
\$\endgroup\$
7
  • 1
    \$\begingroup\$ Guessing that since RANDBETWEEN is volatile, it's updating every single one of the 1,048,576 results every time the cursor moves for a read... \$\endgroup\$ Commented Aug 23, 2018 at 22:13
  • \$\begingroup\$ I don't see RANDBETWEEN in the code anywhere, so are you really testing it or are you performing a similar query to compare results? Please adjust your title to reflect what your code does. \$\endgroup\$ Commented Aug 23, 2018 at 23:00
  • \$\begingroup\$ @Raystafarian OK tried to give it a better title. Also adjusted name of primary subroutine....bad habits die hard!!! :-p \$\endgroup\$ Commented Aug 24, 2018 at 1:22
  • \$\begingroup\$ @Comintern hopefully more informative title more accurately represents what I am doing \$\endgroup\$ Commented Aug 24, 2018 at 1:23
  • \$\begingroup\$ My point is that if you have any cells with RANDBETWEEN in them, you should expect the performance to be poor. RANDBETWEEN is volatile function. That means every time you calculate the value of a cell with RANDBETWEEN in it, all of them recalculate. That means as the ADO cursor moves from cell to cell, you end up recalculating 1,048,576 cells. If you aggregate them in the query, you get a trillion cell calculations. That's why it's slow. \$\endgroup\$ Commented Aug 24, 2018 at 1:44

1 Answer 1

3
\$\begingroup\$

The reason for the poor performance is that you are taking a non database approach to compiling the data. A query's speed is dependent on its ability to group, filter and index records.

SELECT 
Round(SUM(IIF([F1]=" & searchKey & ",1,0))
*100.0
/SUM(IIF([F1]<>Null,1,0)),10)
From [Sheet1$];

The IIF() function is much slower that using a WHERE clause to filter the data. Use WHERE ([F1] Is Not Null) AND ([F1]=" & searchKey & ") so you are only retrieving the relevant records.

With the records properly filtered, you can use Count(*) instead of Sum() and IIF([F1]<>Null,1,0).

It would be better to group the counts and returning all the records at once.

Although it doesn't matter for 20 records, writing to the Immediate Window line by line is slow.

Before writing the code below, I copied the data into an Access table and ran the Query Wizard - Count Duplicates.

PrintOccurences

Sub PrintOccurences(ByRef cn As ADODB.Connection)
 Const strSql As String = "SELECT First([Sheet1$].[F1]) AS [F1 Field], Round((Count([Sheet1$].[F1])/ (SELECT Count(*) FROM [Sheet1$] WHERE [Sheet1$].[F1])) * 100,10) AS NumberOfDups" & vbNewLine & _
 "FROM [Sheet1$]" & vbNewLine & _
 "GROUP BY [Sheet1$].[F1]" & vbNewLine & _
 "HAVING (((First([Sheet1$].[F1])) Is Not Null));"
 Dim rs As ADODB.Recordset
 Dim sb As Object
 On Error GoTo CleanFail:
 Set rs = New ADODB.Recordset
 Set sb = CreateObject("System.Text.StringBuilder")
 rs.Open strSql, cn
 Do While Not rs.EOF
 sb.AppendFormat_4 "Variable {0} Occured {1} Many Times" & vbNewLine, Array(rs("F1 Field").Value, rs("NumberOfDups").Value)
 rs.MoveNext
 Loop
 Debug.Print sb.ToString
 rs.Close
 Set rs = Nothing
 Exit Sub
CleanFail:
 Debug.Print "QUERY FAILED"
End Sub

Results

Variable 1 Occurred 5.2052052052 Many Times
Variable 2 Occurred 3.9039039039 Many Times
Variable 3 Occurred 4.8048048048 Many Times
Variable 4 Occurred 5.2052052052 Many Times
Variable 5 Occurred 4.4044044044 Many Times
Variable 6 Occurred 5.3053053053 Many Times
Variable 7 Occurred 5.005005005 Many Times
Variable 8 Occurred 5.2052052052 Many Times
Variable 9 Occurred 5.5055055055 Many Times
Variable 10 Occurred 4.8048048048 Many Times
Variable 11 Occurred 5.4054054054 Many Times
Variable 12 Occurred 4.9049049049 Many Times
Variable 13 Occurred 4.6046046046 Many Times
Variable 14 Occurred 4.1041041041 Many Times
Variable 15 Occurred 6.6066066066 Many Times
Variable 16 Occurred 5.005005005 Many Times
Variable 17 Occurred 5.5055055055 Many Times
Variable 18 Occurred 4.4044044044 Many Times
Variable 19 Occurred 5.005005005 Many Times
Variable 20 Occurred 5.1051051051 Many Times
answered Aug 24, 2018 at 11:45
\$\endgroup\$
9
  • \$\begingroup\$ I am finding the % not how many times lol. Idk why I left it like that, just tired. I am finding what % of time a each number showed up. I explain in paragraph preceding code but I guess the many times threw you off (and rightfully so) \$\endgroup\$ Commented Aug 24, 2018 at 13:48
  • \$\begingroup\$ Your initial comments are much appreciated. It seems that I am using this as an expensive array rather than an ADO. \$\endgroup\$ Commented Aug 24, 2018 at 13:50
  • \$\begingroup\$ My SQL Query is returning the total number of records in column F1 that are equal to searchKey and dividing that number by the total number of records in column F1. I tried to use a Where clause but had hard time with it \$\endgroup\$ Commented Aug 24, 2018 at 13:57
  • \$\begingroup\$ @learnAsWeGo I modified my code to return the percentage. I will clean up my post later. \$\endgroup\$ Commented Aug 24, 2018 at 14:50
  • \$\begingroup\$ Basically if i want this to be fast I am going to have to do the indexing / sorting / filtering myself then run query? \$\endgroup\$ Commented Sep 25, 2018 at 3:15

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.