Issues with open many browsers with uc_gui_handle_cf and ThreadPoolExecutor · seleniumbase/SeleniumBase · Discussion #3324

armorsmith-tech
Dec 8, 2024

Here's the code:
from concurrent.futures import ThreadPoolExecutor
from seleniumbase import Driver


def parse_top_traders_in_threads(top_tokens):
    # Each worker thread gets its own UC Mode browser
    driver = Driver(undetected=True)
    parser = GmgnParser(driver)
    top_token = None
    try:
        for top_token in top_tokens:
            parser.parse_top_traders(top_token)
    except Exception as ex:
        logger.error(f"Error while processing token {top_token}: {ex}")
    finally:
        driver.quit()


def main(threads_qty):
    top_tokens = parting(parse_tokens_from_excel(), threads_qty)
    with ThreadPoolExecutor(max_workers=threads_qty) as executor:
        # Call map() once: looping over it would resubmit every chunk threads_qty times
        executor.map(parse_top_traders_in_threads, top_tokens)


if __name__ == '__main__':
    main(10)

This is the parser I use to fetch the JSON (it bypasses the CF challenge):

import json
from bs4 import BeautifulSoup


class GmgnParser:  # class name taken from the GmgnParser(driver) call in the first snippet
    def __init__(self, driver):
        self.driver = driver
        self.driver.maximize_window()

    def get_json(self, url):
        self.driver.uc_open_with_reconnect(url)
        self.driver.uc_gui_click_cf()

        # Get the page's HTML source
        html_content = self.driver.get_page_source()
        soup = BeautifulSoup(html_content, 'html.parser')
        # Look for the <pre> element that holds the JSON
        pre_element = soup.find('pre')
        if pre_element:
            try:
                return json.loads(pre_element.text)
            except json.JSONDecodeError as e:
                logger.error(f"JSON decoding error: {e}. Response: {pre_element.text}")
                return None
        else:
            logger.error("Could not find a <pre> element on the page.")
            return None

After 1-2 hours of running, I always get errors like:
Target window is already closed
Can't open DEV Tools and more

I think it is because of the CF bypass, since it uses the GUI. What is the best way to run the program?
I need to get data from about 30k pages.

Can I use several processes instead of a single one?

I ran my program on a Windows server that supports 42 threads.

Answered by mdmintz

Dec 8, 2024

For multithreading, you have two options:

1. pytest via pytest-xdist (e.g. pytest -n3, as seen in SeleniumBase/examples/presenter/multi_uc.py).
2. ThreadPoolExecutor, as shown in SeleniumBase/help_docs/uc_mode.md.

Note that now there's CDP Mode, which is an upgrade over regular UC Mode. Add sb.uc_gui_click_captcha() as needed.
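
As a rough illustration of what switching the question's get_json() to CDP Mode might look like, here is a minimal sketch. It assumes the page shows the JSON inside a <pre> element, as in the question. The helper names extract_pre_json and get_json_cdp are hypothetical, and the standard-library regex stands in for BeautifulSoup only to keep the example self-contained.

```python
import json
import re
from html import unescape


def extract_pre_json(html):
    """Return the JSON parsed from the first <pre> element, or None if absent/invalid."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", html, re.DOTALL | re.IGNORECASE)
    if not match:
        return None
    try:
        return json.loads(unescape(match.group(1)))
    except json.JSONDecodeError:
        return None


def get_json_cdp(url):
    """Hypothetical helper: open url in CDP Mode, click the CAPTCHA, parse the JSON."""
    # Imported lazily so extract_pre_json() can be used without SeleniumBase installed
    from seleniumbase import SB

    with SB(uc=True) as sb:
        sb.activate_cdp_mode(url)
        sb.uc_gui_click_captcha()  # "as needed": only useful if a CAPTCHA is shown
        return extract_pre_json(sb.get_page_source())
```

This sketch opens a new browser per call, which is likely too slow for 30k pages; in practice one would probably keep a single SB context open per worker and loop over the URLs inside it.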

The number of threads should not exceed the number of CPUs on a system.

Also note that the CAPTCHA-clicking methods use PyAutoGUI: Only one thread is allowed to move the mouse at one time.
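
One way to respect both constraints (no more threads than CPUs, and only one thread moving the mouse at a time) is sketched below. It assumes a shared threading.Lock is enough to serialize the GUI clicks within a single process. The names gui_lock, worker_count, click_captcha_serialized and run_chunks are hypothetical and not part of SeleniumBase.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by every worker thread: whoever holds it owns the mouse
gui_lock = threading.Lock()


def worker_count(requested):
    """Clamp the requested thread count to the range 1..number of CPUs."""
    return max(1, min(requested, os.cpu_count() or 1))


def click_captcha_serialized(click):
    """Run a GUI-based click callable while holding the shared lock."""
    with gui_lock:
        return click()


def run_chunks(handle_chunk, chunks, requested_threads):
    """Process each chunk once on a pool capped at the CPU count."""
    with ThreadPoolExecutor(max_workers=worker_count(requested_threads)) as executor:
        return list(executor.map(handle_chunk, chunks))
```

Inside a worker, the click from the question would then be wrapped as click_captcha_serialized(driver.uc_gui_click_cf). The lock only coordinates threads in one process; separate processes sharing the same desktop would need a cross-process lock instead.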


armorsmith-tech Dec 9, 2024
Author

Is there another way to bypass CF in SeleniumBase?
I need to parse about 30k pages.

Answer selected by mdmintz
