Unable to Fetch XHR Response Body with CDP · seleniumbase/SeleniumBase · Discussion #2731

AnirbanPatragithub
Apr 30, 2024

I'm attempting to combine Wire Mode and UC Mode, but they're incompatible. So I'm exploring using CDP to retrieve XHR responses in a standard Selenium WebDriver setup, with plans to apply the same approach in SeleniumBase later. I'm also generating a log.txt file with comprehensive information. My goal is to find 'Learn More' in the output, but it's not present in logs_raw. Here's the code. @mdmintz

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
import json
import time

options = webdriver.ChromeOptions()
service = webdriver.ChromeService(service_args=["--verbose", "--log-path=log.txt"])
url = 'https://www.facebook.com/ads/library/?id=2567767530063004'
# url = 'https://weatherstack.com/'  # <-- this URL works as expected
# Enable Chrome's performance log so CDP Network events can be read back later
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options, service=service)
driver.implicitly_wait(15)
time.sleep(5)
driver.get(url)
time.sleep(30)

# Extract requests from the logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and JSON
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    try:
        print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
    except WebDriverException as e:
        # The browser may no longer hold the body for this request ID
        print(f"No body available for {resp_url}: {e}")
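
The script above prints the raw result of Network.getResponseBody. In the CDP protocol that result is a dict with a "body" string and a "base64Encoded" flag, so a search for 'Learn More' can miss text that sits inside a base64 payload. The following is a minimal sketch of a helper that normalizes such a result to text before searching it. The helper names and the sample dicts are hypothetical and made up for illustration; they are not part of Selenium or SeleniumBase.

```python
import base64


def response_body_to_text(result, encoding="utf-8"):
    """Turn a Network.getResponseBody result dict into text.

    `result` is assumed to look like {"body": str, "base64Encoded": bool}.
    """
    body = result.get("body", "")
    if result.get("base64Encoded"):
        # Binary payloads are delivered base64-encoded; decode them first
        return base64.b64decode(body).decode(encoding, errors="replace")
    return body


def body_contains(result, needle):
    """Case-insensitive search for `needle` in the decoded body."""
    return needle.lower() in response_body_to_text(result).lower()


# Hypothetical sample results, shaped like CDP output
sample_plain = {"body": '{"cta": "Learn More"}', "base64Encoded": False}
sample_b64 = {
    "body": base64.b64encode(b'{"cta": "Learn More"}').decode("ascii"),
    "base64Encoded": True,
}

print(body_contains(sample_plain, "learn more"))
print(response_body_to_text(sample_b64))
```

In the loop above, the dict returned by driver.execute_cdp_cmd(...) could be passed to body_contains(result, "Learn More") instead of being printed directly.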

Replies: 4 comments 11 replies

mdmintz
Apr 30, 2024
Maintainer

Examples of fetching responses via CDP:

from rich.pretty import pprint
from seleniumbase import Driver

driver = Driver(uc=True, log_cdp=True)
try:
    url = "seleniumbase.io/apps/turnstile"
    driver.uc_open_with_reconnect(url, 2)
    driver.switch_to_frame("iframe")
    driver.uc_click("span.mark")
    driver.sleep(3)
    pprint(driver.get_log("performance"))
finally:
    driver.quit()

from rich.pretty import pprint
from seleniumbase import BaseCase
BaseCase.main(__name__, __file__, "--uc", "--uc-cdp", "-s")


class CDPTests(BaseCase):
    def add_cdp_listener(self):
        # (To print everything, use "*". Otherwise select specific headers.)
        # self.driver.add_cdp_listener("*", lambda data: pprint(data))
        self.driver.add_cdp_listener(
            "Network.requestWillBeSentExtraInfo",
            lambda data: pprint(data)
        )

    def click_turnstile_and_verify(sb):
        sb.switch_to_frame("iframe")
        sb.driver.uc_click("span.mark")
        sb.assert_element("img#captcha-success", timeout=3)
        sb.highlight("img#captcha-success", loops=8)

    def test_display_cdp_events(self):
        if not (self.undetectable and self.uc_cdp_events):
            self.get_new_driver(undetectable=True, uc_cdp_events=True)
        url = "seleniumbase.io/apps/turnstile"
        self.driver.uc_open_with_reconnect(url, 2)
        self.add_cdp_listener()
        self.click_turnstile_and_verify()
        self.sleep(1)
        self.refresh()
        self.sleep(0.5)

If you don't need UC Mode, you can use Wire Mode: #2145

0 replies

Answer selected by mdmintz

AnirbanPatragithub
Apr 30, 2024
Author

I have already tried both approaches with SeleniumBase:
[Image: IMG_20240430_111119_370]

I am not getting any output.
Thanks a lot for the help.

1 reply

mdmintz Apr 30, 2024
Maintainer

Follow the examples. You may need to use driver.refresh() to get the logs from driver.get_log("performance"), especially if you just called a method that disconnects the driver, such as driver.uc_open_with_reconnect().

This generates lots of logs from the WeatherStack website:

from rich.pretty import pprint
from seleniumbase import Driver

driver = Driver(uc=True, log_cdp=True)
try:
    url = "weatherstack.com"
    driver.uc_open_with_reconnect(url, 2)
    driver.refresh()
    pprint(driver.get_log("performance"))
finally:
    driver.quit()
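
Since that output is large, it can help to narrow it down to response events before looking for a specific request. The following is a rough sketch that assumes each performance-log entry is a dict whose "message" field is a JSON string wrapping a CDP event, as in the original question's script. The function name and the sample entries are hypothetical and fabricated for illustration.

```python
import json


def iter_responses(entries, url_contains=""):
    """Yield (request_id, url, mime_type) for Network.responseReceived events."""
    for entry in entries:
        # Each entry wraps the CDP event as a JSON string under "message"
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        if url_contains in response["url"]:
            yield (
                message["params"]["requestId"],
                response["url"],
                response.get("mimeType", ""),
            )


# Fabricated entries shaped like driver.get_log("performance") output
sample_entries = [
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"requestId": "1"},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {
            "requestId": "1",
            "response": {
                "url": "https://example.com/api/data",
                "mimeType": "application/json",
            },
        },
    }})},
]

found = list(iter_responses(sample_entries, url_contains="/api/"))
print(found)
```

With a live driver, the call would be iter_responses(driver.get_log("performance"), url_contains="...") with a substring of the request URL of interest.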

AnirbanPatragithub
Apr 30, 2024
Author

url = "https://www.facebook.com/ads/library/?id=2567767530063004"
from rich.pretty import pprint
from seleniumbase import Driver
import time

driver = Driver(uc=True, log_cdp=True)
try:
    # url = "weatherstack.com"
    driver.uc_open_with_reconnect(url, 2)
    driver.refresh()
    time.sleep(10)
    log = driver.get_log("performance")
    pprint(log)
    with open('Adlog.txt', 'w') as f:
        f.write(str(log))
finally:
    driver.quit()

I am trying to scrape the ad data from this URL, but the string 'Learn more' is missing from Adlog.txt. Most probably the data is logged as bytes: when I intercepted it with plain selenium-wire, the body arrived as bytes and had to be converted to a string. The 'content-encoding' used is either 'br' or, most likely, 'zstd'.

from seleniumwire.utils import decode
body = decode(byte_data, 'zstd')

I can't decode the byte data to a string. Any help is appreciated.
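
For reference, decoding raw response bytes by their 'content-encoding' value can be sketched independently of selenium-wire and SeleniumBase. The following is a minimal sketch, not a drop-in fix: decode_body is a hypothetical helper, the 'br' and 'zstd' branches assume the third-party brotli and zstandard packages are installed, and the sample payload is made up. Only the gzip path is exercised here.

```python
import gzip
import zlib


def decode_body(data, content_encoding):
    """Decompress raw response bytes according to a Content-Encoding value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return data
    if encoding == "gzip":
        return gzip.decompress(data)
    if encoding == "deflate":
        # Assumes a zlib-wrapped stream; some servers send raw deflate instead
        return zlib.decompress(data)
    if encoding == "br":
        import brotli  # third-party package (assumption: installed)
        return brotli.decompress(data)
    if encoding == "zstd":
        import zstandard  # third-party package (assumption: installed)
        # decompressobj() also handles frames without an embedded content size
        return zstandard.ZstdDecompressor().decompressobj().decompress(data)
    raise ValueError(f"Unsupported content-encoding: {content_encoding!r}")


# Made-up payload, compressed with gzip so the example runs with the stdlib only
raw = gzip.compress(b'{"cta": "Learn more"}')
text = decode_body(raw, "gzip").decode("utf-8")
print(text)
```

The result is still bytes until .decode("utf-8") is called, so a string search such as 'Learn more' should be run on the decoded text.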

5 replies

mdmintz Apr 30, 2024
Maintainer

It looks like you're trying to ask a selenium-wire question, but this is the SeleniumBase repo.
selenium-wire questions should be asked in their repo: https://github.com/wkeeling/selenium-wire

AnirbanPatragithub May 1, 2024
Author

driver = Driver(uc=True, log_cdp=True)
try:
    url = "https://www.facebook.com/ads/library/?id=2567767530063004"
    driver.uc_open_with_reconnect(url, 2)
    driver.refresh()
    time.sleep(10)
    log = driver.get_log("performance")
except:
    print('Error')

Is there a way to decode bytes data received in XHR response in SeleniumBase?

mdmintz May 1, 2024
Maintainer

Paste fully-coded scripts when showing examples, like this:

from rich.pretty import pprint
from seleniumbase import Driver

driver = Driver(uc=True, log_cdp=True)
try:
    url = "https://www.facebook.com/ads/library/?id=2567767530063004"
    driver.uc_open_with_reconnect(url, 2)
    driver.refresh()
    driver.sleep(3)
    pprint(driver.get_log("performance"))
finally:
    driver.quit()

"Is there a way to decode bytes data received in XHR response in SeleniumBase?"

Check StackOverflow. Something like that definitely falls outside of SeleniumBase's scope.

AnirbanPatragithub May 1, 2024
Author

I will paste fully-coded scripts in the future.
Thanks for the help.
Are there any plans to include functionality in SeleniumBase that decodes the bytes data received in an XHR response?

mdmintz May 1, 2024
Maintainer

SeleniumBase/examples/cdp_mode/raw_xhr_sb.py

irux
Dec 22, 2024

@AnirbanPatragithub did you find a way to get the response body? I can't figure out how to do it.

5 replies

AnirbanPatragithub Dec 23, 2024
Author

SeleniumBase won't be able to do it. You can try selenium-wire or nodriver.

mdmintz Dec 23, 2024
Maintainer

There's a SeleniumBase example that does it: SeleniumBase/examples/cdp_mode/raw_xhr_sb.py

AnirbanPatragithub Dec 23, 2024
Author

"There's a SeleniumBase example that does it: SeleniumBase/examples/cdp_mode/raw_xhr_sb.py"

I checked it, and it's working for me. Unfortunately, I don’t fully understand the code. It doesn’t seem like undetected Selenium is enabled, though. Kudos to you! I wonder if the implementation could be simplified by using nodriver instead.

Kamran-ov Dec 23, 2024

@mdmintz, are you planning to build these listen-and-receive XHR functions into SeleniumBase in the future, or will they always remain separate functions to add when listening in UC Mode?

mdmintz Dec 23, 2024
Maintainer

That uses the CDP async API, so modifications must be done in an async way.
