\$\begingroup\$

I want to scrape 25,000 addresses, using the cell values in column A as search strings. The following code works, but it is too slow for 25,000 lookups. Is there a way to speed it up, such as requesting several pages at once? I'm fairly new to VBA, so any help would be appreciated.

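Sub pullAddresses()
    ' Needs references to "Microsoft Internet Controls" (READYSTATE_COMPLETE)
    ' and "Microsoft HTML Object Library" (HTMLDocument).
    Dim IE As Object
    Dim ht As HTMLDocument
    Dim introw As Long
    Dim searchstring As String
    Dim a As String, b As String

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    For introw = 2 To 10
        searchstring = ThisWorkbook.Sheets("Sheet1").Range("A" & introw).Value
        ' EncodeURL (Excel 2013+) escapes spaces and symbols in the search string.
        IE.navigate "https://www.google.com/search?q=" & WorksheetFunction.EncodeURL(searchstring)

        ' Wait until the page has finished loading.
        Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        Set ht = IE.document

        ''Application.Wait (Now() + TimeValue("00:00:02"))

        a = ht.getElementsByClassName("desktop-title-content")(0).innerText
        b = ht.getElementsByClassName("desktop-title-subcontent")(0).innerText

        ThisWorkbook.Sheets("Sheet1").Range("B" & introw).Value = a & ", " & b
    Next introw

    IE.Quit ' close the browser instance when done
End Sub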
mdfst13
asked Aug 22, 2021 at 7:50
\$\endgroup\$
  • Since you don't need to interact with the web page (clicking, scrolling, etc.), you don't need to load a full web browser just to grab the HTML content. A much faster method is to use "MSXML2.ServerXMLHTTP" to query the page, e.g. as I have done in this answer. Note that for multiple requests it's better not to create a new ServerXMLHTTP object each time (as I do there); just reuse the same one, as you do with IE in your question. Commented Aug 22, 2021 at 8:13
  • @Greedo Why don't you make the comment into a full answer? Commented Aug 22, 2021 at 13:13
  • Remember that even if your code loops through 25,000 rows and executes in milliseconds, you're still at the mercy of the internet and Google's servers if you're executing a search there. I'd be willing to wager that if you put some timers in your code, the vast majority of your execution time will be in the Do Until...Loop line of code. Unfortunately, you cannot parallelize the operation either, since VBA doesn't support threading. Commented Aug 22, 2021 at 14:39
  • Your best bet is to make asynchronous MSXML2.ServerXMLHTTP requests. Check out Retrieve data from eBird API and create multi-level hierarchy of locations. In my answer I create a list of 50 server objects. The code loops over the list, checking the ready state of each request. When a request is ready, the result is processed and the object is assigned a new request. This approach was 48 times faster than using a single synchronous request. Commented Aug 25, 2021 at 11:02
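
The suggestions in the comments above (reuse ServerXMLHTTP objects and poll several asynchronous requests at once) could be sketched roughly as follows. This is a minimal sketch under assumptions, not the commenters' actual code: the pool size of 10, the names `PullAddressesAsync` and `ExtractAddress`, and the use of `WorksheetFunction.EncodeURL` (Excel 2013+) are all illustrative choices. It also assumes a reference to "Microsoft HTML Object Library" and that the HTML returned to a non-browser client still contains the two class names from the question, which Google may not guarantee.

```vba
Option Explicit

' Assumed pool size; tune it, and expect throttling if it is set too high.
Private Const POOL_SIZE As Long = 10

Sub PullAddressesAsync()
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Sheets("Sheet1")

    Dim lastRow As Long, nextRow As Long, pending As Long, i As Long
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    nextRow = 2

    Dim reqs(1 To POOL_SIZE) As Object   ' request objects, created once and reused
    Dim rowOf(1 To POOL_SIZE) As Long    ' row each slot is working on; 0 = idle
    For i = 1 To POOL_SIZE
        Set reqs(i) = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    Next i

    Do
        For i = 1 To POOL_SIZE
            ' Harvest a finished request (readyState 4 = complete).
            If rowOf(i) <> 0 Then
                If reqs(i).readyState = 4 Then
                    If reqs(i).Status = 200 Then
                        ws.Cells(rowOf(i), "B").Value = ExtractAddress(reqs(i).responseText)
                    Else
                        ws.Cells(rowOf(i), "B").Value = "HTTP " & reqs(i).Status
                    End If
                    rowOf(i) = 0
                    pending = pending - 1
                End If
            End If
            ' Hand an idle slot the next search string.
            If rowOf(i) = 0 And nextRow <= lastRow Then
                reqs(i).Open "GET", "https://www.google.com/search?q=" & _
                    WorksheetFunction.EncodeURL(CStr(ws.Cells(nextRow, "A").Value)), True
                reqs(i).send
                rowOf(i) = nextRow
                nextRow = nextRow + 1
                pending = pending + 1
            End If
        Next i
        DoEvents   ' keep Excel responsive while requests are in flight
    Loop While pending > 0
End Sub

' Hypothetical helper: pulls the two elements the question reads from IE.
' Returns an empty string if either class name is missing from the response.
Private Function ExtractAddress(ByVal html As String) As String
    Dim doc As New MSHTML.HTMLDocument
    Dim titles As Object, subs As Object
    doc.body.innerHTML = html
    Set titles = doc.getElementsByClassName("desktop-title-content")
    Set subs = doc.getElementsByClassName("desktop-title-subcontent")
    If titles.Length > 0 And subs.Length > 0 Then
        ExtractAddress = titles(0).innerText & ", " & subs(0).innerText
    End If
End Function
```

VBA stays single-threaded here; the speed-up comes from having several requests waiting on the network at the same time. The sketch has no error handling, so a request that fails at the network level would raise an error when its `Status` is read.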
