Description
Hi, I have created a package named botasaurus-proxy-authentication, which enables SSL support for proxies requiring authentication.
For instance, when using an authenticated proxy with a tool like seleniumwire to scrape a Cloudflare-protected website such as G2.com, a non-SSL connection typically results in being blocked.
To illustrate, run this code:
First, install the required packages:
python -m pip install selenium_wire chromedriver_autoinstaller
Then, execute this Python script:
    from seleniumwire import webdriver
    from chromedriver_autoinstaller import install

    # Define the proxy
    proxy_options = {
        'proxy': {
            'http': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
            'https': 'http://username:password@proxy-provider-domain:port',  # Replace with your proxy
        }
    }

    # Install chromedriver; install() also adds its directory to PATH,
    # so the driver path does not need to be passed to webdriver.Chrome
    # (recent Selenium releases no longer accept it as a positional argument)
    install()
    driver = webdriver.Chrome(seleniumwire_options=proxy_options)

    # Navigate to the desired URL
    link = 'https://www.g2.com/products/github/reviews'
    driver.get("https://www.google.com/")
    driver.execute_script(f'window.location.href = "{link}"')

    # Wait for user input
    input("Press Enter to exit...")

    # Clean up
    driver.quit()
You'll likely be blocked by Cloudflare.
However, using botasaurus_proxy_authentication with proxies circumvents this problem.
First, install the required package:
python -m pip install botasaurus-proxy-authentication
Then notice the difference by running the following code:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from chromedriver_autoinstaller import install
    from botasaurus_proxy_authentication import add_proxy_options

    # Define the proxy settings
    proxy = 'http://username:password@proxy-provider-domain:port'  # Replace with your proxy

    # Set Chrome options
    chrome_options = Options()
    add_proxy_options(chrome_options, proxy)

    # Install chromedriver; install() also adds its directory to PATH,
    # so the driver path does not need to be passed to webdriver.Chrome
    install()
    driver = webdriver.Chrome(options=chrome_options)

    # Navigate to the desired URL
    link = 'https://www.g2.com/products/github/reviews'
    driver.get("https://www.google.com/")
    driver.execute_script(f'window.location.href = "{link}"')

    # Wait for user input
    input("Press Enter to exit...")

    # Clean up
    driver.quit()
Result: not blocked.
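Both scripts pass the proxy as a single URL of the form scheme://username:password@host:port. As a rough sketch (not part of either package), the string can be checked with the standard library before it is handed to the driver, so that a placeholder left unreplaced fails early. The helper name split_proxy_url is hypothetical, and proxy.example.com:8080 is a made-up placeholder.

```python
from urllib.parse import urlsplit


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into its parts, or raise ValueError.

    Hypothetical helper for illustration; not provided by selenium-wire
    or botasaurus-proxy-authentication.
    """
    parts = urlsplit(proxy_url)
    # Accessing .port raises ValueError if the port is not a valid number
    port = parts.port
    if not parts.hostname or port is None:
        raise ValueError("proxy URL needs a host and a numeric port")
    if parts.username is None or parts.password is None:
        raise ValueError("proxy URL needs username:password credentials")
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": port,
    }


if __name__ == "__main__":
    # Placeholder values, not a real proxy
    print(split_proxy_url("http://username:password@proxy.example.com:8080"))
```

Calling it on the unedited template string 'http://username:password@proxy-provider-domain:port' raises ValueError, because "port" is not a number.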
I suggest using botasaurus_proxy_authentication for its SSL support for authenticated proxies, improving the success rate of scraping Cloudflare-protected websites and thus increasing revenue for Oxylabs.
Also, thanks to Oxylabs for your great work on proxies.
Good luck to the team.