Automating screenshot capture of phishing websites #3694

DanielCEvans started this conversation in General

Hi Michael,
I work at a CERT in Australia and am currently trying to automate one of the services we offer our members using SeleniumBase. Our members submit phishing takedown requests, which we analyse. If we determine that the site is a phishing site, we submit a takedown request to the relevant registrar.
As part of the analysis and takedown submission, we gather screenshots of the phishing website as evidence. This is the step we were hoping to automate with SeleniumBase :)

We receive numerous phishing takedown submissions daily, and each site may or may not be behind a captcha (the captchas themselves can vary).

My question is: do you think this screenshot capture process can be automated using SeleniumBase, given that we are not accessing the same site each time (so the presence and type of captcha differ from one site to the next)? If we were to host something on AWS, would we need some sort of rotating proxy to avoid bot detection?

Any suggestions would be greatly appreciated :)

Thanks,
Dan

Replies: 1 comment

Hi Dan,
The screenshot part is fairly easy, e.g.:

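from seleniumbase import SB

# uc=True enables undetected-chromedriver (UC) mode
with SB(uc=True, test=True) as sb:
    url = "https://www.selenium.dev/ecosystem/"
    # Open the page in CDP Mode
    sb.activate_cdp_mode(url)
    # Give the page a moment to render
    sb.sleep(1)
    # Save a screenshot of the <body> element
    sb.save_screenshot("image.png", selector="body")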
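
For a daily queue of submitted URLs, the same calls could be wrapped in a loop. The following is only a sketch under some assumptions: evidence_filename and capture_all are hypothetical helper names (not part of SeleniumBase), the "evidence" folder and the 2-second wait are arbitrary choices, and a fresh browser is opened per URL so that one broken site does not affect the next.

```python
import hashlib
import re
from datetime import datetime, timezone
from urllib.parse import urlparse


def evidence_filename(url, when=None):
    """Build a filesystem-safe screenshot name from a timestamp, host, and URL hash."""
    when = when or datetime.now(timezone.utc)
    host = urlparse(url).hostname or "unknown-host"
    host = re.sub(r"[^A-Za-z0-9.-]", "_", host)  # e.g. colons in IPv6 hosts
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{when:%Y%m%dT%H%M%SZ}_{host}_{digest}.png"


def capture_all(urls, folder="evidence"):
    """Screenshot each URL in its own browser; return {url: filename or None}."""
    # Imported here so evidence_filename() can be used without a browser installed.
    from seleniumbase import SB

    results = {}
    for url in urls:
        name = evidence_filename(url)
        try:
            with SB(uc=True, test=True) as sb:
                sb.activate_cdp_mode(url)
                sb.sleep(2)  # arbitrary wait; slow pages may need longer
                sb.save_screenshot(name, folder=folder, selector="body")
            results[url] = name
        except Exception as exc:  # a dead or blocking site should not stop the batch
            print(f"capture failed for {url}: {exc}")
            results[url] = None
    return results
```

Calling capture_all(["https://www.selenium.dev/ecosystem/"]) would return a dict mapping that URL to the saved file name, or to None if the capture failed. Sites behind a CAPTCHA would still need extra handling before the screenshot step.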

As for the anti-bot bypass and CAPTCHA-solving, that depends on the CAPTCHA / anti-bot system, as those are not uniformly made. The examples in SeleniumBase/examples/cdp_mode demonstrate various ways of bypassing the anti-bot tech. Cloudflare Turnstiles (found in several examples) are among the easiest CAPTCHAs to bypass. The CDP Mode ReadMe can help you get started with creating stealthy scripts.

You'll likely need to use a residential proxy, since sites tend to block traffic coming from a non-residential IP address range such as AWS. There's a proxy arg that you can set to change your proxy settings. If you have your own data center (or local machines to use), then you might not need to use residential proxies. GitHub Actions works well too.
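
The proxy setting mentioned above can be sketched as follows. This is a minimal example, not a recommended setup: proxy_arg, proxy_from_env, and screenshot_via_proxy are hypothetical helpers, the PROXY_* environment variable names are made up for illustration, and the proxy string is assumed to use the SERVER:PORT or USER:PASS@SERVER:PORT form.

```python
import os


def proxy_arg(host, port, user=None, password=None):
    """Return SERVER:PORT, or USER:PASS@SERVER:PORT when credentials are given."""
    server = f"{host}:{int(port)}"
    if user and password:
        return f"{user}:{password}@{server}"
    return server


def proxy_from_env(env=os.environ):
    """Read hypothetical PROXY_* variables so credentials stay out of the script."""
    return proxy_arg(
        env["PROXY_HOST"],
        env["PROXY_PORT"],
        env.get("PROXY_USER"),
        env.get("PROXY_PASS"),
    )


def screenshot_via_proxy(url, proxy, name="image.png"):
    """Open the URL through the given proxy and screenshot the page body."""
    # Imported here so the string helpers above work without a browser installed.
    from seleniumbase import SB

    with SB(uc=True, test=True, proxy=proxy) as sb:
        sb.activate_cdp_mode(url)
        sb.sleep(1)
        sb.save_screenshot(name, selector="body")
```

With the environment variables set, screenshot_via_proxy(url, proxy_from_env()) would route that one capture through the configured proxy. Rotating between several proxies would then be a matter of picking a different proxy string per URL.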
