I wrote some code to extract the information from a table, but it takes an extremely long time.
The table is in the format of a calendar. I need the information on an Excel sheet with column 1 as the day number and column 2 as the accommodation type. On the accommodation type, I only want the "Camp" types and the number of available units to show. I’m sure there will be a better way to loop through the table. Any help will be appreciated.
Here is my code:
Dim driver As New WebDriver, i As Integer, mysheet As Worksheet
Dim ele As WebElement
Set driver = New EdgeDriver
Application.ScreenUpdating = False
driver.Start "edge"
driver.Get "http://www.sanparks.org/reservations/accommodation/calendar-month/park/26/camp/41/date/2022-07-01"
Application.Wait Now + TimeValue("00:00:05")
Set mysheet = Sheets("Sheet1")
i = 7
eRow = Sheet1.Cells(Rows.Count, 3).End(xlUp).Offset(7, 0).Row
For Each ele In driver.FindElementsByClass("day")
If ele.FindElementByCss("span").Text <> "" Then
mysheet.Cells(i, 2).Value = ele.FindElementByCss("span").Text
On Error Resume Next
If Mid(ele.FindElementByClass("bold-name").Text, 1, 4) = "Camp" Then 'AccommodationType
mysheet.Cells(i, 3).Value = ele.FindElementByClass("bold-name").Text & " " & ele.FindElementByClass("unit-numbers").Text
End If
On Error GoTo 0
i = i + 1
End If
Next ele
-
\$\begingroup\$ Related: Scraping table contents from a webpage using vba with selenium \$\endgroup\$Greedo– Greedo2022年03月07日 11:07:23 +00:00Commented Mar 7, 2022 at 11:07
2 Answers 2
If you aren't locked into using Selenium, you can accomplish something similar to this using a web request. Web Requests should be faster as they don't need to worry about rendering any items on screen, they just return data.
I looked at the web traffic, and noticed this site uses an API to provide the data to the front end. I'm using this API, to return the data as JSON, parsing that, then returning that to an Excel range. This takes about 2 seconds on my machine.
In order to get this to work, you'll need a copy of this --> https://github.com/VBA-tools/VBA-JSON in your project. Follow the directions on the project page to get that setup.
Option Explicit
Public Sub ScrapeCampsites()
Dim URL As String
URL = "https://www.sanparks.org/includes/SANParksApp/API/v1/bookings/accommodation/getAvailabilityAccommodationMonthList.php?resort=10&month=7&year=2022"
Dim response As String
With CreateObject("MSXML2.XMLHTTP.6.0")
.Open "GET", URL, False
.send
response = .responseText
End With
'Include https://github.com/VBA-tools/VBA-JSON into your project
Dim Json As Object: Set Json = JsonConverter.ParseJson(response)
Dim Accomodations As Object: Set Accomodations = Json("DATA")
Dim Accomodation As Variant
Dim Availabilities As Variant
Dim Availability As Variant
Dim Results As Variant
ReDim Results(1 To 2, 1 To 50000)
Dim i As Long
For Each Accomodation In Accomodations
If Mid$(Accomodation("accommodationtypedesc"), 1, 4) = "Camp" Then
Availabilities = Accomodation("availabilities").Items
For Each Availability In Availabilities
i = i + 1
Results(1, i) = Availability("availableDate")
Results(2, i) = Availability("available")
Next
End If
Next
ReDim Preserve Results(1 To 2, i)
Dim mySheet As Worksheet
Set mySheet = Sheets("Sheet1")
mySheet.Range("A1:B1").Value = Array("Date", "Available")
mySheet.Range("A2:B" & UBound(Results, 2) + 1) = Application.WorksheetFunction.Transpose(Results)
End Sub
-
\$\begingroup\$ Out of interest, how did you inspect the traffic, what was the process? \$\endgroup\$Greedo– Greedo2022年03月07日 22:04:45 +00:00Commented Mar 7, 2022 at 22:04
-
\$\begingroup\$ I used the network tab in Chrome to via the web requests back and forth on the page. I noticed one of the endpoints had
API
in the URI, so I took a peek at what was included in the response. It was the JSON that feeds into the calendar on the page. \$\endgroup\$Ryan Wildry– Ryan Wildry2022年03月08日 13:32:52 +00:00Commented Mar 8, 2022 at 13:32 -
\$\begingroup\$ Greedo - I had my vba code checking the availability on IE and not through a webdriver. This took about 2 seconds to complete. \$\endgroup\$Nelj– Nelj2022年03月11日 04:50:09 +00:00Commented Mar 11, 2022 at 4:50
-
\$\begingroup\$ Ryan Wildry - Thank you Ryan for having a look. I have followed the instructions and got the following error: KeyNotFoundError - Dictionary key not found: STATUS The error is on the following line: "json_ParseObject.Item(json_Key) = json_ParseValue(json_String, json_Index)" \$\endgroup\$Nelj– Nelj2022年03月11日 04:54:18 +00:00Commented Mar 11, 2022 at 4:54
-
\$\begingroup\$ @Nelj This is a weird error, I don't see STATUS listed in the keys for code above. It might be related to requiring a reference to
Microsoft Scripting Runtime
the json parser uses this data structure to deserialize the data. \$\endgroup\$Ryan Wildry– Ryan Wildry2022年03月11日 13:57:47 +00:00Commented Mar 11, 2022 at 13:57
I notice a couple of things:
I found reference that says "Webdriver
Get
will wait until the page has fully loaded before returning the control" so we don't need to wait 5 seconds.FindElementsByClass
is an expensive operation. We should never call it multiple times if we can avoid it. I've implemented storing the results in a variable to cut the calls in half per loop.
Try this out and let me know if it's any faster:
Application.ScreenUpdating = False
Dim driver As WebDriver
Set driver = New EdgeDriver
driver.Start "edge"
driver.Get "http://www.sanparks.org/reservations/accommodation/calendar-month/park/26/camp/41/date/2022-07-01"
Dim mysheet As Worksheet
Set mysheet = Sheets("Sheet1")
Dim i As Integer
i = 7
eRow = mysheet.Cells(Rows.Count, 3).End(xlUp).Offset(7, 0).Row
Dim dayClass As Variant
dayClass = driver.FindElementsByClass("day")
Dim ele As WebElement
For Each ele In dayClass
Dim spanText As String
spanText = ele.FindElementByCss("span").Text
If spanText <> "" Then
mysheet.Cells(i, 2).Value = spanText
Dim boldText As String
boldText = ele.FindElementByClass("bold-name").Text
On Error Resume Next
If Mid(boldText, 1, 4) = "Camp" Then 'AccommodationType
mysheet.Cells(i, 3).Value = boldText & " " & ele.FindElementByClass("unit-numbers").Text
End If
On Error GoTo 0
i = i + 1
End If
Next ele
```
-
\$\begingroup\$ Thank you for having a look at my problem and the advice. I have tried your solution. It is working, but still takes a long time to complete. I had the following code when the website used to be available on IE and I didn't have to work through a web driver (this is only the web scraping part): \$\endgroup\$Nelj– Nelj2022年03月12日 05:26:43 +00:00Commented Mar 12, 2022 at 5:26
-
\$\begingroup\$ eRow = Sheet1.Cells(Rows.Count, 3).End(xlUp).Offset(7, 0).Row For Each ele In .document.all Select Case ele.className Case "day" RowCount = RowCount + 1 sht.Range("B" & RowCount) = ele.innerText Case "units" If Mid(ele.NextSibling.innerText, 1, 4) = StandType Then sht.Range("C" & RowCount) = ele.NextSibling.innerText & " - " & ele.innerText End If End Select Next ele \$\endgroup\$Nelj– Nelj2022年03月13日 07:06:31 +00:00Commented Mar 13, 2022 at 7:06
Explore related questions
See similar questions with these tags.