Description
Version
1.54.0
Steps to reproduce
Dependencies
```
pip install flask playwright
playwright install chromium
```
server.py
```python
from flask import Flask, Response
import time

app = Flask(__name__)


@app.route("/")
def index():
    return """
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>SSE Test</title></head>
<body>
  <h1>SSE Test Page</h1>
  <button id="btn">Start SSE</button>
  <script>
    document.getElementById('btn').addEventListener('click', function() {
      const evtSource = new EventSource('/sse');
      evtSource.onmessage = function(event) { console.log(event.data); };
      evtSource.onerror = function() { evtSource.close(); };
    });
  </script>
</body>
</html>
"""


@app.route("/sse")
def sse():
    def generate():
        messages = ["你好，这是第一条消息", "测试中文：😀🎉"]
        for msg in messages:
            yield f"data: {msg}\n\n".encode('utf-8')
            time.sleep(0.3)

    return Response(generate(), headers={
        "Content-Type": "text/event-stream; charset=utf-8",
        "Cache-Control": "no-cache",
    })


if __name__ == "__main__":
    app.run(port=5000)
```
client.py
```python
from playwright.sync_api import sync_playwright


def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        # Method 1: route.fetch() - WORKS CORRECTLY
        def handle_route(route):
            response = route.fetch()
            body = response.body()
            print("\n[route.fetch()] - CORRECT")
            print(f"  Raw bytes: {body!r}")
            print(f"  Decoded: {body.decode('utf-8')!r}")
            route.fulfill(response=response)

        page.route("**/sse", handle_route)

        # Method 2: response event - BUG
        def on_response(response):
            if "/sse" in response.url:
                body = response.body()
                print("\n[response.body()] - BUG")
                print(f"  Raw bytes: {body!r}")

        page.on("response", on_response)

        page.goto("http://localhost:5000")
        page.click("#btn")
        page.wait_for_timeout(3000)
        browser.close()


if __name__ == "__main__":
    main()
```
Run
- Start the server: python server.py
- Run the client: python client.py
Expected behavior
response.body() should return the raw UTF-8 bytes as sent by the server:
```
[route.fetch()] - CORRECT
  Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
  Decoded: 'data: 你好，这是第一条消息\n\ndata: 测试中文：😀🎉\n\n'

[response.body()] - CORRECT
  Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
```
Actual behavior
response.body() returns double-encoded (mojibake) bytes:
```
[route.fetch()] - CORRECT
  Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
  Decoded: 'data: 你好，这是第一条消息\n\ndata: 测试中文：😀🎉\n\n'

[response.body()] - BUG
  Raw bytes: b'data: \xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd...'
```
This is the classic pattern of UTF-8 → Latin-1 decode → UTF-8 encode (mojibake).
The double-encoding can be verified:
```python
correct = "你好".encode('utf-8')                        # b'\xe4\xbd\xa0\xe5\xa5\xbd'
mojibake = correct.decode('latin-1').encode('utf-8')  # b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd'
```
The response.body() output matches the mojibake pattern exactly.
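The round trip can also be simulated and inverted in a few lines. This is a sketch that assumes a pure Latin-1 decode step; the helper names are hypothetical. Because Latin-1 maps every byte to exactly one code point, the garbling is lossless, so reversing it recovers the server's original bytes, which is consistent with the diagnosis above.

```python
def simulate_mojibake(raw: bytes) -> bytes:
    """Decode UTF-8 bytes as Latin-1, then re-encode the result as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def undo_mojibake(garbled: bytes) -> bytes:
    """Invert simulate_mojibake: UTF-8 decode, then map each code point back to one byte."""
    return garbled.decode("utf-8").encode("latin-1")


original = "data: 你好\n\n".encode("utf-8")
garbled = simulate_mojibake(original)
print(garbled[:18])  # b'data: \xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd'
assert undo_mojibake(garbled) == original  # the double encoding loses no information
```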
Additional context
- The browser DevTools Network tab shows the correct response
- curl also returns the correct bytes
- Only response.body() and CDP Network.getResponseBody have this issue
- Tested with both Python and JavaScript bindings; the same bug occurs
Root cause analysis: Likely a CDP (Chrome DevTools Protocol) issue
I tested calling CDP Network.getResponseBody directly:
| Method | Returns | Result |
|---|---|---|
| route.fetch() | bytes | ✅ Correct (\xe4\xbd\xa0) |
| CDP Network.getResponseBody | str | ❌ Mojibake (already decoded incorrectly) |
| response.body() | bytes | ❌ Mojibake (derived from CDP) |
Key finding: CDP Network.getResponseBody returns a string (not bytes), and the string is already mojibake - meaning the incorrect decoding happens at the CDP layer, not in Playwright.
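How the last table row follows from the CDP row can be sketched as below. This assumes Playwright's Chromium backend converts the getResponseBody result roughly this way (base64-decode binary bodies, UTF-8-encode text bodies); cdp_body_to_bytes is a hypothetical helper, not Playwright's API. Under that assumption, an already-garbled string becomes exactly the c3 a4 c2 bd ... byte pattern.

```python
import base64


def cdp_body_to_bytes(result: dict) -> bytes:
    """Convert a Network.getResponseBody result into raw bytes (assumed conversion)."""
    body = result["body"]
    if result.get("base64Encoded"):
        # Binary bodies are base64 text, so decoding restores the exact bytes.
        return base64.b64decode(body)
    # Text bodies arrive already decoded; re-encoding as UTF-8 only reproduces
    # the server's bytes if the browser decoded them with the right charset.
    return body.encode("utf-8")


# A text result like the one reported (base64Encoded: False):
print(cdp_body_to_bytes({"body": "data: ä½\xa0", "base64Encoded": False}))
# b'data: \xc3\xa4\xc2\xbd\xc2\xa0'
```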
CDP test code (test_cdp.py):
```python
from playwright.sync_api import sync_playwright


def test_cdp():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        client = page.context.new_cdp_session(page)
        client.send("Network.enable")
        responses = {}

        def on_response_received(params):
            if "/sse" in params.get("response", {}).get("url", ""):
                responses[params["requestId"]] = params["response"]["url"]

        def on_loading_finished(params):
            if params["requestId"] in responses:
                result = client.send("Network.getResponseBody", {"requestId": params["requestId"]})
                print("CDP Network.getResponseBody:")
                print(f"  base64Encoded: {result.get('base64Encoded')}")
                print(f"  body type: {type(result.get('body'))}")  # <class 'str'> !!!
                print(f"  body: {result.get('body')[:50]!r}...")   # Already mojibake

        client.on("Network.responseReceived", on_response_received)
        client.on("Network.loadingFinished", on_loading_finished)

        page.goto("http://localhost:5000")
        page.click("#btn")
        page.wait_for_timeout(3000)
        browser.close()


test_cdp()
```
CDP test output:
```
CDP Network.getResponseBody:
  base64Encoded: False
  body type: <class 'str'>
  body: 'data: ä½\xa0å¥½ï¼Œè¿™æ˜¯ç¬¬ä¸€æ\x9d¡æ¶ˆæ\x81¯...'  # Already mojibake!
```
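The Œ (byte 0x8C) and € (byte 0x80) in this output suggest that the decoder is Windows-1252 rather than strict ISO-8859-1. The WHATWG Encoding Standard, which browsers follow, maps the "latin1" and "iso-8859-1" labels to windows-1252. The sketch below assumes such a decoder and reproduces the CDP string from the server's bytes. decode_windows_1252 and encode_windows_1252 are hypothetical helpers. Python's own cp1252 codec rejects five bytes that WHATWG maps to C1 control characters (0x9D from 条 is one, hence the \x9d above), so those bytes are special-cased.

```python
# Bytes that Python's cp1252 codec leaves undefined; the WHATWG windows-1252
# decoder maps each one to the C1 control character with the same value.
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_windows_1252(raw: bytes) -> str:
    """Decode bytes the way a WHATWG-conformant windows-1252 decoder would."""
    return "".join(
        chr(b) if b in UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252") for b in raw
    )


def encode_windows_1252(text: str) -> bytes:
    """Inverse of decode_windows_1252: one byte per character."""
    return b"".join(
        bytes([ord(ch)]) if ord(ch) in UNDEFINED_IN_CP1252 else ch.encode("cp1252")
        for ch in text
    )


server_bytes = "data: 你好，这是第一条消息".encode("utf-8")
cdp_text = decode_windows_1252(server_bytes)
print(repr(cdp_text))  # 'data: ä½\xa0å¥½ï¼Œè¿™æ˜¯ç¬¬ä¸€æ\x9d¡æ¶ˆæ\x81¯'
# Re-encoding that string as UTF-8 gives the bytes response.body() reports:
print(cdp_text.encode("utf-8")[:12])  # b'data: \xc3\xa4\xc2\xbd\xc2\xa0'
assert encode_windows_1252(cdp_text) == server_bytes
```

Because every byte maps to exactly one character, the CDP text can be mapped back to the original UTF-8 bytes under this assumption.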
JavaScript test (client.js):
```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();

  await page.route('**/sse', async route => {
    const res = await route.fetch();
    console.log('[route.fetch()]', await res.body());
    await route.fulfill({ response: res });
  });

  page.on('response', async res => {
    if (res.url().includes('/sse'))
      console.log('[response.body()]', await res.body());
  });

  await page.goto('http://localhost:5000');
  await page.click('#btn');
  await page.waitForTimeout(3000);
  await browser.close();
})();
```
JavaScript output:
```
[route.fetch()] <Buffer 64 61 74 61 3a 20 e4 bd a0 e5 a5 bd ...>    ✅ Correct
[response.body()] <Buffer 64 61 74 61 3a 20 c3 a4 c2 bd c2 a0 ...>  ❌ Mojibake
```
Why this is a blocking issue (no viable workaround)
While route.fetch() returns correct bytes, it cannot be used as a workaround for real-world SSE streams:
- SSE streams can last for minutes (e.g., LLM streaming responses, real-time data feeds)
- route.fetch() blocks until the entire response is complete
- route.fulfill() can only be called after route.fetch() returns
- This means the browser receives no data until the stream ends (minutes later)
My use case: I'm testing an AI chat application where SSE responses stream for 2-5 minutes. I need to capture the response content for automated testing, but:
- response.body() gives mojibake
- route.fetch() blocks for minutes, making the test useless
The only remaining option is to use page.expose_function() and capture data via JavaScript in the browser, which is a hacky workaround that shouldn't be necessary.
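For reference, the expose_function() workaround can be sketched as follows. This is one possible shape, not a recommended fix. It assumes the page opens its stream through window.EventSource, and the names INIT_SCRIPT, capture_sse and __onSseMessage are hypothetical. EventSource always decodes the stream as UTF-8, so event.data is already correct text on the JavaScript side. Only unnamed ("message") events are forwarded.

```python
# Wraps window.EventSource so every unnamed SSE message is forwarded to Python.
INIT_SCRIPT = """
(() => {
  const NativeEventSource = window.EventSource;
  window.EventSource = class extends NativeEventSource {
    constructor(url, config) {
      super(url, config);
      this.addEventListener('message', (event) => {
        // window.__onSseMessage is bound by page.expose_function() in Python.
        window.__onSseMessage(String(url), event.data);
      });
    }
  };
})();
"""


def capture_sse(url: str, trigger_selector: str, wait_ms: int = 3000) -> list[tuple[str, str]]:
    """Open the page, trigger the stream and collect (source URL, data) pairs."""
    # Imported lazily so INIT_SCRIPT can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    messages: list[tuple[str, str]] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Functions exposed this way survive navigations, so the binding is
        # available by the time the init script's listener calls it.
        page.expose_function("__onSseMessage", lambda src, data: messages.append((src, data)))
        page.add_init_script(INIT_SCRIPT)
        page.goto(url)
        page.click(trigger_selector)
        page.wait_for_timeout(wait_ms)
        browser.close()
    return messages
```

Against the test server, capture_sse("http://localhost:5000", "#btn") would be expected to return one ('/sse', ...) pair per message, with each message delivered as soon as the browser receives it rather than at the end of the stream.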
Environment
- Operating System: Windows 10 Pro (10.0.19045)
- CPU: Intel Core i5-10500 @ 3.10GHz
- Browser: Chrome 143.0.7499.170
- Python Version: 3.10.16
- Node.js Version: 18.13.0
- Other info: Tested with both Python and JavaScript bindings