I'm using Python requests to search https://www.investing.com/ for the terms "Durable Goods Orders US".
I checked the "Network" tab of the browser's inspect panel, and the search seems to be done simply by submitting the following form data: 'quotes_search_text': 'Durable Goods Orders US'
So I tried this in Python:
import requests
URL = 'https://www.investing.com/'
data = {'quotes_search_text': 'Durable Goods Orders US'}
resp = requests.post(
    URL,
    data=data,
    headers={'User-Agent': 'Mozilla/5.0', 'X-Requested-With': 'XMLHttpRequest'},
)
However, this doesn't return the results that I can see when doing the search manually. All the search results should have "gs-title" as a class attribute (as per the page inspection), but when I do:
from bs4 import BeautifulSoup
soup = BeautifulSoup(resp.text, 'html.parser')
soup.select(".gs-title")
I see no results. Is there some aspect of POST requests that I am not taking into account? (I'm a complete noob here.)
1 Answer
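After going over this in detail in chat, the approach changed considerably. The search results are not in the page that investing.com returns; they are loaded afterwards by JavaScript from a Google Custom Search endpoint. To retrieve the information you're looking for, you need to make the same request that their JS makes. You can change the query variable to whatever you want.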
import json
from urllib.parse import quote_plus

import requests

URL = 'https://www.googleapis.com/customsearch/v1element'
query = 'Durable Goods Orders US'
# Encoded by hand only for the Referer header; requests encodes params itself.
query_formatted = quote_plus(query)
data = {
    'key': 'AIzaSyCVAXiUzRYsML1Pv6RwSG1gunmMikTzQqY',
    'num': 10,
    'hl': 'en',
    'prettyPrint': 'true',
    'source': 'gcsc',
    'gss': '.com',
    'cx': '015447872197439536574:fy9sb1kxnp8',
    'q': query,  # raw query, so it is not URL-encoded twice
    'googlehost': 'www.google.com',
}
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.investing.com/search?q=' + query_formatted,
}
resp = requests.get(URL, params=data, headers=headers)
j = json.loads(resp.text)
# print(resp.text)  # uncomment to inspect the raw JSON
for r in j['results']:
    print(r['title'], r['url'])
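A note on encoding, since it is easy to get wrong here: requests URL-encodes the `params` dict itself, so a value that has already been through `quote_plus` ends up encoded twice. The sketch below uses only the standard library (`urlencode`, which as far as I know is what requests relies on for dict params) to show the difference. It also includes a small hypothetical helper, `extract_results`, that reads the `results` list without raising if the key is missing. The sample payloads are made up for illustration; only the `results`, `title` and `url` keys come from the code above.

```python
from urllib.parse import quote_plus, urlencode

query = 'Durable Goods Orders US'

# Raw value: encoded exactly once when the query string is built.
once = urlencode({'q': query})

# Pre-encoded value: the '+' signs are themselves escaped as %2B.
twice = urlencode({'q': quote_plus(query)})

print(once)   # q=Durable+Goods+Orders+US
print(twice)  # q=Durable%2BGoods%2BOrders%2BUS


def extract_results(payload):
    """Return (title, url) pairs from a decoded JSON payload.

    Hypothetical helper: returns an empty list when 'results' is absent,
    for example when the API answers with an error object instead.
    """
    return [(r.get('title'), r.get('url')) for r in payload.get('results', [])]


# Made-up payloads, shaped like what the loop above expects.
sample = {'results': [{'title': 'Example title', 'url': 'https://example.com/a'}]}
print(extract_results(sample))
print(extract_results({'error': 'made-up error object'}))
```

In the real script you would call `extract_results(resp.json())` instead of indexing `j['results']` directly, so a failed request gives you an empty list rather than a `KeyError`.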
Your find_all selector is looking for a class attribute when it's expecting an HTML tag. The result links look like this:
<a class="gs-title" href="https://www.investing.com/economic-calendar/durable-goods-orders-86" target="_blank" dir="ltr" data-cturl="https://www.google.com/url?q=https://www.investing.com/economic-calendar/durable-goods-orders-86&sa=U&ved=0ahUKEwi28NG5tK7TAhWOa1AKHVhUBncQFggEMAA&client=internal-uds-cse&usg=AFQjCNEuRaJ1WI-VxrmeJ5VISPuraZ_Sug" data-ctorig="https://www.investing.com/economic-calendar/durable-goods-orders-86">United States <b>Durable Goods Orders</b> MoM</a>
so try soup.find_all('a', {'class':'gs-title'}).
I am using the select method. I also went through resp.text to search it manually with Ctrl-F, and I could see that the correct page is not returned. So it's really the request that I need help with.
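To make the point from the comments concrete: the selector itself appears to be fine; the element is just not in the HTML that requests receives, because the results are inserted by JavaScript afterwards. Below is a minimal sketch of that check using only the standard library, so it runs without extra installs (with BeautifulSoup the equivalent lookup is `soup.select('.gs-title')`). `ClassFinder` and `find_class` are hypothetical names, and the `raw` string is a made-up stand-in for the unrendered response, not the real investing.com page.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect (tag, href) for every element carrying a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attrs.get('class') or '').split()
        if self.wanted_class in classes:
            self.matches.append((tag, attrs.get('href')))


def find_class(html, wanted_class):
    """Return all (tag, href) pairs whose element has wanted_class."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.matches


# Markup as seen in the browser inspector, after the page's JavaScript ran.
rendered = (
    '<a class="gs-title" '
    'href="https://www.investing.com/economic-calendar/durable-goods-orders-86">'
    'United States <b>Durable Goods Orders</b> MoM</a>'
)

# Made-up stand-in for the raw response: an empty container plus a script tag.
raw = '<div id="results"></div><script src="search.js"></script>'

print(find_class(rendered, 'gs-title'))  # one match: the <a> element
print(find_class(raw, 'gs-title'))       # no matches
```

If the lookup finds nothing in `resp.text` but the inspector shows the element, the content is being added client-side, and the thing to replicate is the request the script makes rather than the page load.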