Issues with opening many browsers with uc_gui_handle_cf and ThreadPoolExecutor #3324
Here's the code:
from concurrent.futures import ThreadPoolExecutor

from seleniumbase import Driver


def parse_top_traders_in_threads(top_tokens):
    driver = Driver(undetected=True)
    parser = GmgnParser(driver)
    try:
        for top_token in top_tokens:
            parser.parse_top_traders(top_token)
    except Exception as ex:
        logger.error(f"Error while processing token {top_token}: {ex}")
    finally:
        driver.quit()


def main(threads_qty):
    top_tokens = parting(parse_tokens_from_excel(), threads_qty)
    with ThreadPoolExecutor(max_workers=threads_qty) as executor:
        for _ in range(threads_qty):
            executor.map(parse_top_traders_in_threads, top_tokens)


if __name__ == '__main__':
    main(10)
I use this to parse the JSON (it bypasses the CF challenge):
import json

from bs4 import BeautifulSoup


class GmgnParser:
    def __init__(self, driver):
        self.driver = driver
        self.driver.maximize_window()

    def get_json(self, url):
        self.driver.uc_open_with_reconnect(url)
        self.driver.uc_gui_click_cf()
        # Get the page's HTML source
        html_content = self.driver.get_page_source()
        soup = BeautifulSoup(html_content, 'html.parser')
        # Look for the <pre> element that holds the JSON
        pre_element = soup.find('pre')
        if pre_element:
            try:
                return json.loads(pre_element.text)
            except json.JSONDecodeError as e:
                logger.error(f"JSON decoding error: {e}. Response: {pre_element.text}")
                return None
        else:
            logger.error("Could not find a <pre> element on the page.")
            return None
After 1-2 hours of running, I always get errors like:
Target window is already closed
Can't open DEV Tools
and more.
I think it is because of the CF bypass, since it uses the GUI.
What is the best way to run the program?
I need to get data from about 30k pages.
Can I use several processes instead of a single one?
I run my program on a Windows server that supports 42 threads.
For multithreading, you have two options:
- pytest via pytest-xdist (e.g. pytest -n3, as seen in SeleniumBase/examples/presenter/multi_uc.py)
- ThreadPoolExecutor, as shown in SeleniumBase/help_docs/uc_mode.md.
Note that there's now CDP Mode, which is an upgrade over regular UC Mode. Add sb.uc_gui_click_captcha() as needed.
The number of threads should not exceed the number of CPUs on a system.
Also note that the CAPTCHA-clicking methods use PyAutoGUI: only one thread is allowed to move the mouse at one time.
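To illustrate the last two points, here is a minimal sketch (not the official SeleniumBase example) of capping the worker count at the CPU count and serializing the mouse-driven step with a lock. The names chunk, process_chunk, run and gui_lock are hypothetical, and the browser calls are left as comments so the sketch runs without a browser. Note also that a single executor.map() call already hands each chunk to a worker, so wrapping it in a loop, as in the question, submits every chunk several times.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# Serializes the step that moves the real mouse (PyAutoGUI-based CAPTCHA clicks).
gui_lock = threading.Lock()


def chunk(items, n):
    """Split items into n roughly equal lists (hypothetical stand-in for parting())."""
    return [items[i::n] for i in range(n)]


def process_chunk(urls):
    """Hypothetical worker: one browser per chunk, GUI step under the lock."""
    results = []
    # Assumption: the real worker would create its own driver here,
    # e.g. driver = Driver(uc=True), and quit it in a finally block.
    for url in urls:
        # Assumption: driver.uc_open_with_reconnect(url) would go here.
        with gui_lock:
            # Assumption: driver.uc_gui_click_captcha() would go here,
            # so that only one thread moves the mouse at a time.
            pass
        results.append(url)  # placeholder for the parsed page data
    return results


def run(urls, requested_threads):
    """Process all urls with at most one thread per CPU."""
    workers = max(1, min(requested_threads, os.cpu_count() or 1))
    chunks = chunk(urls, workers)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # One map() call: each chunk is submitted exactly once.
        parts = executor.map(process_chunk, chunks)
        return [item for part in parts for item in part]
```

Calling run(list_of_urls, 10) on a 4-CPU machine would use 4 workers. Because the lock only guards the click, page loads and parsing can still overlap across threads.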
Is there another way to bypass CF in SeleniumBase?
I need to parse about 30k pages.