I am trying to write a scraper in Python that uses requests with proxies to scrape an https page. I found lists of free proxies on the internet and manually validated a bunch of them in an online proxy checker. I also made sure to use only proxies that support https according to that website. But in Python nearly all of them fail for http pages, and ALL of them fail for my desired https page. I did everything according to the tutorials I found, and I am running out of ideas about what the issue could be. I plan to look into the actual error messages without the try/except today, but I hoped someone could tell me whether the code is valid in the first place.

    def proxy_json_test_saved_proxies(self):
        test_count = 1
        timeout_seconds = 10
        working_http = 0
        working_https = 0
        for proxy_dict in self.all_proxies:
            print("#######")
            print("Testing http proxy " + str(test_count) + "/" + str(len(self.all_proxies)))
            test_count += 1
            proxy = {'http': 'http://' + proxy_dict["address"],
                     'https': 'https://' + proxy_dict["address"]
                     }
            print(proxy)
            print("Try http connection:")
            try:
                requests.get("http://example.com", proxies=proxy, timeout=timeout_seconds)
            except IOError:
                print("Fail")
            else:
                print("Success")
                working_http += 1
            print("Try https connection:")
            try:
                requests.get("https://example.com", proxies=proxy, timeout=timeout_seconds)
            except IOError:
                print("Fail")
            else:
                print("Success")
                working_https += 1
        print("Working http: ", working_http)
        print("Working https: ", working_https)

proxy_dict["address"] contains ip:port values like "185.247.177.27:80". self.all_proxies is a list of about 100 of those proxy_dicts.

I also know that these free proxies are often already in use. So I repeated the routine multiple times, but NONE of them ever worked for https, and the http count did not really improve either.

asked Dec 2, 2022 at 8:06

1 Answer

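Me again. I solved the issue and wanted to post the answer. In the end it was just a typo in the proxy definition. The keys of the proxies dict refer to the scheme of the target URL, while the values are the URL of the proxy itself. These proxies are reached via plain http, no matter whether the target URL uses http or https.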

I changed this:

    proxy = {'http': 'http://' + proxy_dict["address"],
             'https': 'https://' + proxy_dict["address"]
             }

To this (deleted the "s" in the 'https://' value, not in the 'https' key):

    proxy = {'http': 'http://' + proxy_dict["address"],
             'https': 'http://' + proxy_dict["address"]
             }

And now it works.
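For anyone who wants to reuse this, here is a minimal sketch of how the fix could be wrapped up, assuming the same "ip:port" address strings as in the question. The names build_proxies and check_proxy are my own and not part of requests. The check also returns the exception text instead of just printing "Fail", which would probably have shown me the problem much sooner.

```python
def build_proxies(address, proxy_scheme="http"):
    """Return a requests-style proxies mapping for one "ip:port" address.

    The keys name the scheme of the *target* URL. The values say how to
    reach the proxy itself, which for the proxies in question is plain http.
    """
    proxy_url = f"{proxy_scheme}://{address}"
    return {"http": proxy_url, "https": proxy_url}


def check_proxy(address, url, timeout=10):
    """Return None on success, otherwise a short description of the error."""
    # Imported locally so build_proxies stays usable without requests installed.
    import requests

    try:
        response = requests.get(url, proxies=build_proxies(address), timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # RequestException covers ProxyError, SSLError, timeouts and HTTP errors.
        return f"{type(exc).__name__}: {exc}"
    return None
```

Calling check_proxy(proxy_dict["address"], "https://example.com") inside the loop then gives either None or a message that names the actual failure. The proxy_scheme parameter is only there for the rare proxy that really is reached over TLS.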

answered Dec 23, 2022 at 14:12