rootVIII/proxy_web_crawler

Search for a website with a different proxy each time

This script automates searching for a website by keyword on the DuckDuckGo search engine, paging through the results until the site is found.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
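
Internally, the options above could be wired up with argparse roughly as follows. This is only an illustrative sketch; the actual option handling in proxy_crawler.py may differ:

    # Hypothetical sketch of the command-line interface; flag names match the usage above.
    from argparse import ArgumentParser

    parser = ArgumentParser(description="Search DuckDuckGo for a site through rotating proxies")
    parser.add_argument("-u", "--url", required=True, help="complete URL to look for in the results")
    parser.add_argument("-k", "--keyword", required=True, help="keyword(s) to search for")
    parser.add_argument("-x", "--headless", action="store_true", help="run without a GUI")
    args = parser.parse_args()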

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched until the desired website is found (a rough sketch of this loop follows the list)
  • The website is then visited and one random link within it is clicked
  • The bot is slowed down on purpose, and it will also run fairly slowly because of the proxy connections
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is found
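
The loop described above might look roughly like the sketch below. It is illustrative only: the helper names, the regex over the sslproxies.org markup, and the single-page DuckDuckGo search are assumptions, not the repository's actual code.

    # Illustrative sketch of the crawl loop; not the repo's actual implementation.
    import random
    import re
    import urllib.parse
    import urllib.request

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options


    def scrape_proxies():
        # Pull (ip, port) pairs out of the sslproxies.org listing (markup may change).
        html = urllib.request.urlopen("https://www.sslproxies.org").read().decode()
        return re.findall(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>", html)


    def make_driver(ip, port, headless=False):
        # Route a fresh Firefox session through the given proxy.
        opts = Options()
        if headless:
            opts.add_argument("-headless")
        opts.set_preference("network.proxy.type", 1)  # manual proxy configuration
        opts.set_preference("network.proxy.http", ip)
        opts.set_preference("network.proxy.http_port", int(port))
        opts.set_preference("network.proxy.ssl", ip)
        opts.set_preference("network.proxy.ssl_port", int(port))
        return webdriver.Firefox(options=opts)


    def crawl(url, keyword, headless=False):
        for ip, port in scrape_proxies():  # new proxy for every attempt
            driver = None
            try:
                driver = make_driver(ip, port, headless)
                driver.get("https://duckduckgo.com/html/?q=" + urllib.parse.quote(keyword))
                anchors = driver.find_elements(By.CSS_SELECTOR, "a[href]")
                target = next(a for a in anchors if url in (a.get_attribute("href") or ""))
                target.click()  # the desired website appeared in the results
                links = driver.find_elements(By.TAG_NAME, "a")
                if links:
                    random.choice(links).click()  # click one random link on the site
                return
            except Exception:
                continue  # dead proxy or missing result: move on to the next proxy
            finally:
                if driver:
                    driver.quit()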

  • Requirements:
    • python3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver from Mozilla
  • Unzip the file and place geckodriver in your PATH
  • Ensure selenium is installed: pip install -r requirements.txt (a quick smoke test of the setup is sketched below)
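
Once geckodriver is in your PATH, a short smoke test like the following (not part of the repository) should open and close a headless Firefox without errors:

    # Assumes Firefox is installed and geckodriver is on PATH.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    opts.add_argument("-headless")            # no window needed for the test
    driver = webdriver.Firefox(options=opts)  # raises here if geckodriver/Firefox is missing
    driver.get("https://duckduckgo.com")
    print(driver.title)
    driver.quit()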

[Screenshots: screenshot1, screenshot2, screenshot3]

Author: rootVIII 2018-2023
