My script is able to harvest full contents of a table from a webpage with javascript encrypted using vba in combination with selenium. The table has got a drop-down option from where the full contents can be selected by hitting "all". The table has got 300 rows of data which spread 7 column across. There are around seven steps to traverse to reach the destination page. It takes a while to parse them all. Just run it, sit back and relax until the browser is closed. It works perfectly. I could not manage to create the script with explicit wait cause I doubt there is any option in vba. Here is the working code.
Sub Table_Data()
Dim driver As New WebDriver
Dim posts As Object, post As Object, t_data As Object
With driver
.Start "chrome", "http://apps.tga.gov.au/Prod/devices"
.get "/daen-entry.aspx"
.FindElementById("disclaimer-accept").Click
.Wait 3000
.FindElementById("medicine-name").SendKeys ("pump")
.Wait 10000
.FindElementByClass("medicines-check-all").Click
.Wait 3000
.FindElementById("submit-button").Click
.Wait 5000
.FindElementById("ctl00_body_MedicineSummaryControl_cmbPageSelection").Click
.Wait 5000
.FindElementByXPath("//option[@value='all']").Click
.Wait 5000
End With
For Each posts In driver.FindElementsByXPath("//table[contains(@class,'daen-report')]")
For Each post In posts.FindElementsByXPath(".//tr")
For Each t_data In post.FindElementsByXPath(".//td[@class='row-odd']|.//td")
y = y + 1
Cells(x, y) = t_data.Text
Next t_data
x = x + 1
y = 0
Next post
Next posts
End Sub
1 Answer 1
Good job once again!
I would eliminate one loop depth level - there is only a single table that you are interested it - iterate over its rows directly. And, I would switch to more concise CSS selectors to locate table rows and cells.
Also, I would rename post
to a row
and t_data
to cell
to correspond to what these variables represent:
For Each row In driver.FindElementsByCss("table#ctl00_body_MedicineSummaryControl_grdSummary tr")
For Each cell In row.FindElementsByCss("td")
y = y + 1
Cells(x, y) = cell.Text
Next cell
x = x + 1
y = 0
Next row
-
\$\begingroup\$ Thanks sir alecxe, for your review and the comment. It's nice to see that you always come up something new to follow along. Perhaps I need to take a good care of css selectors. I just tested your code and found it working nice as always. Thanks again. \$\endgroup\$SIM– SIM2017年07月11日 17:06:08 +00:00Commented Jul 11, 2017 at 17:06
-
\$\begingroup\$ @SMth80 right, try to grasp CSS selectors - it would be quite beneficial for you in web-scraping. Thanks. \$\endgroup\$alecxe– alecxe2017年07月11日 17:07:17 +00:00Commented Jul 11, 2017 at 17:07
-
\$\begingroup\$ One request to you sir alecxe. Can you provide me with a link following which I can have a good idea on the basic of css selectors? Please, don't get bothered. \$\endgroup\$SIM– SIM2017年07月11日 19:35:49 +00:00Commented Jul 11, 2017 at 19:35
-
1\$\begingroup\$ @SMth80 well, I don't know of the best resource - there is plenty of them on the web. Mozilla's docs are pretty descriptive..and practice - open up browser developer tools, the console and play around with
$("css selector here")
function. Thanks. \$\endgroup\$alecxe– alecxe2017年07月11日 19:37:43 +00:00Commented Jul 11, 2017 at 19:37
Cells(x, y)
is confusing: the signature forCells
isCells([RowIndex], [ColumnIndex])
, which means you havex
for rows andy
for columns. \$\endgroup\$x
instinctively refers to the x-axis, andy
to the y-axis; with that in mind the call would beCells(y, x)
which looks uglier, but isn't wrong. Better namingrow
/column
would fix that though. \$\endgroup\$