INTRO
I am using an Excel worksheet as my "database." There are no headers, so [F1] is assigned to column one by default. I filled the entire column, all 1,048,576 cells, with RANDBETWEEN(1,20), then hard-set the values by copying and pasting them as values.
It is very slow, much slower than producing the same figures with Excel worksheet functions would be. I realize that I could use 1,048,576 as a constant denominator, but I wanted to practice with the SQL query language and keep the model more dynamic.
EDIT: the results should read "% Of Time", not "Many Times". I am getting the percentage of times each value occurred in my data set; in other words, I am looking at the distribution of RANDBETWEEN(1,20) outputs over X calls. In this instance X is 1,048,576.
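For comparison, a rough sketch of the worksheet-function route mentioned above (COUNTIF is only one plausible choice here, and the column reference is an assumption):

    ' Hypothetical worksheet-function baseline: assumes the data sits in column A
    ' of the active sheet and uses the full column length as a constant denominator.
    Function CountIfPercent(ByVal searchKey As Long) As Double
        CountIfPercent = Application.WorksheetFunction.CountIf( _
            ActiveSheet.Columns(1), searchKey) / 1048576# * 100
    End Function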
My SQL Query
"SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"
It returns the total number of records in column F1 that are equal to searchKey and divides that number by the total number of records in column F1.
CODE
Option Explicit

Private Declare PtrSafe Function timeGetTime Lib "winmm.dll" () As Long

Sub SqlQueryOnWorkSheet()
    Dim started As Long
    Dim cn As ADODB.Connection
    Dim filePath As String
    Dim counter As Long
    Dim outCome As Double
    Dim ended As Long

    started = timeGetTime
    filePath = "Z:\Test\Test1.xlsx"
    Set cn = EstablishConnection(filePath)
    If cn.State <> 1 Then GoTo CleanFail:

    For counter = 1 To 20
        outCome = FindCount(cn, counter)
        PrintOutcome counter, outCome
    Next counter

    cn.Close
    Set cn = Nothing
    ended = timeGetTime
    Debug.Print "QUERIES RAN IN " & (ended - started) / 1000 & " SECONDS"
    Exit Sub

CleanFail:
    Debug.Print "CONNECTION COULD NOT BE MADE"
End Sub

Function EstablishConnection(ByVal filePath As String) As ADODB.Connection
    Set EstablishConnection = New ADODB.Connection
    EstablishConnection.Open _
        "Provider=Microsoft.ACE.OLEDB.12.0;" & _
        "Data Source='" & filePath & "';" & _
        "Extended Properties=""Excel 12.0 Macro;HDR=No;IMEX=1;"";"
End Function

Function FindCount(ByRef cn As ADODB.Connection, ByVal searchKey As Long) As Double
    Dim strSql As String
    Dim rs As ADODB.Recordset

    On Error GoTo CleanFail:

    Set rs = New ADODB.Recordset
    strSql = "SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"
    rs.Open strSql, cn
    FindCount = rs.GetString
    rs.Close
    Set rs = Nothing
    Exit Function

CleanFail:
    Debug.Print "QUERY FAILED"
End Function

Sub PrintOutcome(ByVal counter As Long, ByVal outCome As Double)
    Debug.Print "Variable " & counter & " Occured " & outCome & " Many Times"
End Sub
Results
EDIT: THIS SHOULD BE % NOT MANY -- FIXED
Variable 1 Occured 4.9837112427 % Of Time
Variable 2 Occured 5.0171852112 % Of Time
Variable 3 Occured 4.9752235413 % Of Time
Variable 4 Occured 4.9716949463 % Of Time
Variable 5 Occured 5.0051689148 % Of Time
Variable 6 Occured 4.9989700317 % Of Time
Variable 7 Occured 4.9901008606 % Of Time
Variable 8 Occured 5.0283432007 % Of Time
Variable 9 Occured 5.0018310547 % Of Time
Variable 10 Occured 5.0164222717 % Of Time
Variable 11 Occured 4.9933433533 % Of Time
Variable 12 Occured 5.0059318542 % Of Time
Variable 13 Occured 5.0333976746 % Of Time
Variable 14 Occured 4.9952507019 % Of Time
Variable 15 Occured 5.0163269043 % Of Time
Variable 16 Occured 4.9654006958 % Of Time
Variable 17 Occured 4.9822807312 % Of Time
Variable 18 Occured 5.0310134888 % Of Time
Variable 19 Occured 5.0113677979 % Of Time
Variable 20 Occured 4.9770355225 % Of Time
QUERIES RAN IN 38.754 SECONDS
1 Answer
The reason for the poor performance is that you are taking a non-database approach to compiling the data. A query's speed depends on its ability to group, filter, and index records.
SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0)) *100.0 /SUM(IIF([F1]<>Null,1,0)),10) From [Sheet1$];
The IIF() function is much slower than using a WHERE clause to filter the data. Use WHERE ([F1] Is Not Null) AND ([F1]=" & searchKey & ") so that you are only retrieving the relevant records. With the records properly filtered, you can use Count(*) instead of Sum() with IIF([F1]<>Null,1,0).
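For illustration, a sketch of how FindCount might look with that change (FindCountFiltered is a hypothetical name and has not been run against the workbook above; the scalar subquery supplies the non-null total so the result stays a percentage):

    ' Hypothetical rewrite of FindCount: the WHERE clause filters to the relevant
    ' records, Count(*) replaces the Sum(IIF(...)) pair, and the subquery
    ' provides the denominator for the percentage.
    Function FindCountFiltered(ByRef cn As ADODB.Connection, ByVal searchKey As Long) As Double
        Dim rs As ADODB.Recordset
        Dim strSql As String
        strSql = "SELECT Count(*) * 100.0 / " & _
                 "(SELECT Count(*) FROM [Sheet1$] WHERE [F1] Is Not Null) " & _
                 "FROM [Sheet1$] " & _
                 "WHERE ([F1] Is Not Null) AND ([F1] = " & searchKey & ");"
        Set rs = New ADODB.Recordset
        rs.Open strSql, cn
        FindCountFiltered = rs.Fields(0).Value
        rs.Close
        Set rs = Nothing
    End Function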
It would be better to group the counts and return all the records at once.
Although it doesn't matter for 20 records, writing to the Immediate Window line by line is slow.
Before writing the code below, I copied the data into an Access table and ran the Query Wizard - Count Duplicates.
PrintOccurences
Sub PrintOccurences(ByRef cn As ADODB.Connection)
    Const strSql As String = "SELECT First([Sheet1$].[F1]) AS [F1 Field], Round((Count([Sheet1$].[F1])/ (SELECT Count(*) FROM [Sheet1$] WHERE [Sheet1$].[F1])) * 100,10) AS NumberOfDups" & vbNewLine & _
                             "FROM [Sheet1$]" & vbNewLine & _
                             "GROUP BY [Sheet1$].[F1]" & vbNewLine & _
                             "HAVING (((First([Sheet1$].[F1])) Is Not Null));"
    Dim rs As ADODB.Recordset
    Dim sb As Object

    On Error GoTo CleanFail:

    Set rs = New ADODB.Recordset
    Set sb = CreateObject("System.Text.StringBuilder")

    rs.Open strSql, cn
    Do While Not rs.EOF
        sb.AppendFormat_4 "Variable {0} Occured {1} Many Times" & vbNewLine, Array(rs("F1 Field").Value, rs("NumberOfDups").Value)
        rs.MoveNext
    Loop
    Debug.Print sb.ToString

    rs.Close
    Set rs = Nothing
    Exit Sub

CleanFail:
    Debug.Print "QUERY FAILED"
End Sub
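For completeness, one way PrintOccurences could be driven, reusing EstablishConnection from the question (a hypothetical sketch; the path is the one from the original post):

    ' Hypothetical driver: a single grouped query via PrintOccurences replaces
    ' the For counter = 1 To 20 loop of twenty separate FindCount queries.
    Sub SqlQueryOnWorkSheetGrouped()
        Dim cn As ADODB.Connection
        Set cn = EstablishConnection("Z:\Test\Test1.xlsx")
        If cn.State <> 1 Then
            Debug.Print "CONNECTION COULD NOT BE MADE"
            Exit Sub
        End If
        PrintOccurences cn
        cn.Close
        Set cn = Nothing
    End Sub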
Results
Variable 1 Occurred 5.2052052052 Many Times
Variable 2 Occurred 3.9039039039 Many Times
Variable 3 Occurred 4.8048048048 Many Times
Variable 4 Occurred 5.2052052052 Many Times
Variable 5 Occurred 4.4044044044 Many Times
Variable 6 Occurred 5.3053053053 Many Times
Variable 7 Occurred 5.005005005 Many Times
Variable 8 Occurred 5.2052052052 Many Times
Variable 9 Occurred 5.5055055055 Many Times
Variable 10 Occurred 4.8048048048 Many Times
Variable 11 Occurred 5.4054054054 Many Times
Variable 12 Occurred 4.9049049049 Many Times
Variable 13 Occurred 4.6046046046 Many Times
Variable 14 Occurred 4.1041041041 Many Times
Variable 15 Occurred 6.6066066066 Many Times
Variable 16 Occurred 5.005005005 Many Times
Variable 17 Occurred 5.5055055055 Many Times
Variable 18 Occurred 4.4044044044 Many Times
Variable 19 Occurred 5.005005005 Many Times
Variable 20 Occurred 5.1051051051 Many Times
- learnAsWeGo (Aug 24, 2018 at 13:48): I am finding the %, not how many times, lol. Idk why I left it like that, just tired. I am finding what % of the time each number showed up. I explain it in the paragraph preceding the code, but I guess the "many times" threw you off (and rightfully so).
- learnAsWeGo (Aug 24, 2018 at 13:50): Your initial comments are much appreciated. It seems that I am using this as an expensive array rather than an ADO.
- learnAsWeGo (Aug 24, 2018 at 13:57): My SQL query is returning the total number of records in column F1 that are equal to searchKey and dividing that number by the total number of records in column F1. I tried to use a Where clause but had a hard time with it.
- TinMan (Aug 24, 2018 at 14:50): @learnAsWeGo I modified my code to return the percentage. I will clean up my post later.
- learnAsWeGo (Sep 25, 2018 at 3:15): Basically, if I want this to be fast, am I going to have to do the indexing / sorting / filtering myself and then run the query?
Comments on the question:
- RANDBETWEEN is volatile; it's updating every single one of the 1,048,576 results every time the cursor moves for a read...
- I don't see RANDBETWEEN in the code anywhere, so are you really testing it or are you performing a similar query to compare results? Please adjust your title to reflect what your code does.
- If the cells have RANDBETWEEN in them, you should expect the performance to be poor. RANDBETWEEN is a volatile function. That means every time you calculate the value of a cell with RANDBETWEEN in it, all of them recalculate. That means as the ADO cursor moves from cell to cell, you end up recalculating 1,048,576 cells. If you aggregate them in the query, you get a trillion cell calculations. That's why it's slow.