Issues with open many browsers with uc_gui_handle_cf and ThreadPoolExecutor #3324

Answered by mdmintz
armorsmith-tech asked this question in Q&A

Here's the code:
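from concurrent.futures import ThreadPoolExecutor
from seleniumbase import Driver

def parse_top_traders_in_threads(top_tokens):
    # Each worker thread owns one browser for its whole chunk of tokens.
    driver = Driver(undetected=True)
    parser = GmgnParser(driver)
    top_token = None
    try:
        for top_token in top_tokens:
            parser.parse_top_traders(top_token)
    except Exception as ex:
        logger.error(f"Error processing token {top_token}: {ex}")
    finally:
        driver.quit()

def main(threads_qty):
    # parting() splits the token list into threads_qty chunks.
    top_tokens = parting(parse_tokens_from_excel(), threads_qty)
    with ThreadPoolExecutor(max_workers=threads_qty) as executor:
        # Submit each chunk once. Calling map() inside a
        # range(threads_qty) loop would process every chunk threads_qty times.
        executor.map(parse_top_traders_in_threads, top_tokens)

if __name__ == "__main__":
    main(10)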

The parser I use to fetch the JSON (bypassing the CF challenge):

import json
from bs4 import BeautifulSoup

class GmgnParser:
    def __init__(self, driver):
        self.driver = driver
        self.driver.maximize_window()

    def get_json(self, url):
        self.driver.uc_open_with_reconnect(url)
        self.driver.uc_gui_click_cf()

        # Get the page's HTML source
        html_content = self.driver.get_page_source()
        soup = BeautifulSoup(html_content, 'html.parser')
        # Find the <pre> element that holds the JSON
        pre_element = soup.find('pre')
        if pre_element:
            try:
                return json.loads(pre_element.text)
            except json.JSONDecodeError as e:
                logger.error(f"JSON decode error: {e}. Response: {pre_element.text}")
                return None
        else:
            logger.error("Could not find a <pre> element on the page.")
            return None

After 1-2 hours of running, I always get errors such as:
Target window is already closed
Can't open DEV Tools
and more.

I think this is caused by the CF bypass, because it relies on the GUI.
What is the best way to run the program?
I need to get data from about 30k pages.

Can I use several processes instead of a single one?

I ran my program on a Windows server that supports 42 threads.


For multithreading, you have two options:

  1. pytest via pytest-xdist (e.g. pytest -n3, as seen in SeleniumBase/examples/presenter/multi_uc.py).
  2. ThreadPoolExecutor as shown in SeleniumBase/help_docs/uc_mode.md.

Note that now there's CDP Mode, which is an upgrade over regular UC Mode. Add sb.uc_gui_click_captcha() as needed.

The number of threads should not exceed the number of CPUs on a system.

Also note that the CAPTCHA-clicking methods use PyAutoGUI: Only one thread is allowed to move the mouse at one time.
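
As a rough illustration of the last two points, here is a minimal, library-agnostic sketch: it caps the worker count at the CPU count and puts a single lock around the GUI step so that only one thread moves the mouse at a time. The names worker_count, process_chunk, run, load_page and click_captcha are hypothetical and not part of SeleniumBase. In real use, load_page and click_captcha would presumably wrap driver calls such as uc_open_with_reconnect() and uc_gui_click_captcha(); that wiring is an assumption and is left out here.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock: PyAutoGUI drives a single shared mouse pointer.
gui_lock = threading.Lock()


def worker_count(requested):
    """Cap the requested worker count at the number of CPUs (at least 1)."""
    return max(1, min(requested, os.cpu_count() or 1))


def process_chunk(urls, load_page, click_captcha):
    """Handle one chunk of URLs; only the GUI click is serialized."""
    done = []
    for url in urls:
        load_page(url)        # hypothetical callable; may overlap across threads
        with gui_lock:        # only one thread performs the GUI click at a time
            click_captcha(url)  # hypothetical callable wrapping the CAPTCHA click
        done.append(url)
    return done


def run(chunks, load_page, click_captcha, requested_workers):
    """Submit each chunk exactly once and return results in chunk order."""
    workers = worker_count(requested_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_chunk, chunk, load_page, click_captcha)
            for chunk in chunks
        ]
        return [future.result() for future in futures]
```

With this shape, page loads can still overlap across threads, while the mouse-driven step queues up behind the lock. The more pages need a CAPTCHA click, the more the lock limits the benefit of extra threads.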


Is there another way to bypass CF in SeleniumBase?
I need to parse around 30k pages.

Answer selected by mdmintz