rootVIII/proxy_web_crawler

Search for a website with a different proxy each time

This script automates searching for a website by keyword on the DuckDuckGo search engine, paging through the results until the site is found.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
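
Internally, the options above could be wired up with argparse roughly as follows. This is only an illustrative sketch; the actual option handling in proxy_crawler.py may differ:

    # Hypothetical sketch of the command-line interface; flag names match the usage above.
    from argparse import ArgumentParser

    parser = ArgumentParser(description="Search DuckDuckGo for a site through rotating proxies")
    parser.add_argument("-u", "--url", required=True, help="complete URL to look for in the results")
    parser.add_argument("-k", "--keyword", required=True, help="keyword(s) to search for")
    parser.add_argument("-x", "--headless", action="store_true", help="run without a GUI")
    args = parser.parse_args()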

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched until the desired website is found (a rough sketch of this loop follows the list)
  • The website is then visited and one random link within it is clicked
  • The bot is slowed down on purpose, and it will also run fairly slowly because of the proxy connections
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is found
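
The loop described above might look roughly like the sketch below. It is illustrative only: the helper names, the regex over the sslproxies.org markup, and the single-page DuckDuckGo search are assumptions, not the repository's actual code.

    # Illustrative sketch of the crawl loop; not the repo's actual implementation.
    import random
    import re
    import urllib.parse
    import urllib.request

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options


    def scrape_proxies():
        # Pull (ip, port) pairs out of the sslproxies.org listing (markup may change).
        html = urllib.request.urlopen("https://www.sslproxies.org").read().decode()
        return re.findall(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>", html)


    def make_driver(ip, port, headless=False):
        # Route a fresh Firefox session through the given proxy.
        opts = Options()
        if headless:
            opts.add_argument("-headless")
        opts.set_preference("network.proxy.type", 1)  # manual proxy configuration
        opts.set_preference("network.proxy.http", ip)
        opts.set_preference("network.proxy.http_port", int(port))
        opts.set_preference("network.proxy.ssl", ip)
        opts.set_preference("network.proxy.ssl_port", int(port))
        return webdriver.Firefox(options=opts)


    def crawl(url, keyword, headless=False):
        for ip, port in scrape_proxies():  # new proxy for every attempt
            driver = None
            try:
                driver = make_driver(ip, port, headless)
                driver.get("https://duckduckgo.com/html/?q=" + urllib.parse.quote(keyword))
                anchors = driver.find_elements(By.CSS_SELECTOR, "a[href]")
                target = next(a for a in anchors if url in (a.get_attribute("href") or ""))
                target.click()  # the desired website appeared in the results
                links = driver.find_elements(By.TAG_NAME, "a")
                if links:
                    random.choice(links).click()  # click one random link on the site
                return
            except Exception:
                continue  # dead proxy or missing result: move on to the next proxy
            finally:
                if driver:
                    driver.quit()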

  • Requirements:
    • python3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver from Mozilla
  • Unzip the file and place geckodriver in your PATH
  • Ensure selenium is installed: pip install -r requirements.txt (a quick smoke test of the setup is sketched below)
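
Once geckodriver is in your PATH, a short smoke test like the following (not part of the repository) should open and close a headless Firefox without errors:

    # Assumes Firefox is installed and geckodriver is on PATH.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    opts.add_argument("-headless")            # no window needed for the test
    driver = webdriver.Firefox(options=opts)  # raises here if geckodriver/Firefox is missing
    driver.get("https://duckduckgo.com")
    print(driver.title)
    driver.quit()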

[Screenshots: screenshot1, screenshot2, screenshot3]

Author: rootVIII 2018-2023
