I'm working on parsing a series of Uint8Arrays that contain Transfer-Encoding: chunked data sent from the client using WHATWG Fetch over a single connection. For example, writing 1 MB from the client can result in 14 separate Uint8Arrays arriving at the server. The first in the series could look like
Uint8Array(41040) [49, 48, 48, 48, 13, 10, 0, 0, 0, 0, ..., 13, 10]
where, filtering that first Uint8Array for 13 followed by 10 (\r\n), the indexes can look like
[4, 4102, 4108, 8206, 8212, 12310, 12316, 16414, 16420, 20518,
20524, 24622, 24628, 28726, 28732, 32830, 32836, 36934, 36940, 41038]
How would you implement the algorithm to parse the length of the data and extract the data that follows from the Uint8Arrays?
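For reference, a minimal sketch of the filter that produces those indexes, assuming o holds one received Uint8Array:
// Collect indexes where 13 (\r) is immediately followed by 10 (\n)
const crlfIndexes = [];
for (let i = 0; i < o.length - 1; i++) {
  if (o[i] === 13 && o[i + 1] === 10) {
    crlfIndexes.push(i);
  }
}
console.log(crlfIndexes); // e.g. [4, 4102, 4108, ...]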
Can you edit your question to clarify things?
I'm using WICG Direct Sockets TCPServerSocket as a TCP server in a Chromium browser; see direct-sockets-http-ws-server. I've already implemented a runtime-agnostic WebSocket server, where the same server code can be run using chrome, node, deno, bun, or tjs; see JavaScript runtime agnostic WebSocket server.
Now, Chromium-based browsers implement upload streaming, or streaming requests, in their WHATWG Fetch implementation; see Streaming requests with the fetch API. However, only HTTP/2 is supported in general, the exception being between a ServiceWorker and a WindowClient or Client, which internally in chrome uses Mojo, not the network (type some lower case letters in the input field here: Half duplex stream). HTTP/2 requires TLS. That means I would need to implement TLS in the browser for the server, to avoid TLS handshake errors when using the TCPServerSocket to process the fetch() request. That's on my TODO list, but that's a separate question.
There's another exception for uploading data over a single connection; see Re: Issue 434292497: Allow streaming requests for HTTP/1.x, Status: Won't Fix (Intended Behavior), from +yoichio@ who worked on the feature:
"There seems the FetchUploadStreaming feature flag (https://crrev.com/c/2174099) for that. I don't think we want to enable it by default though. Resolving as working intended."
chrome --enable-features=FetchUploadStreaming
Now, that only provides the upload capability over HTTP/1.1:
var abortable = new AbortController();
var { readable, writable } = new TransformStream({
  // Re-chunk whatever is written into 1024-byte pieces, with a small
  // delay between each so the pieces arrive separately at the server
  async transform(v, c) {
    for (let i = 0; i < v.length; i += 1024) {
      c.enqueue(v.subarray(i, i + 1024));
      await scheduler.postTask(() => {}, { delay: 10 });
    }
  },
  flush() {
    console.log("flush");
    // Abort the request so the TCP connection closes
    abortable.abort("");
  }
});
var writer = writable.getWriter();
var response = fetch("http://localhost:44818", {
  method: "post",
  duplex: "half",
  body: readable,
  signal: abortable.signal,
  // Non-standard; requires chrome --enable-features=FetchUploadStreaming
  allowHTTP1ForStreamingUpload: true
});
response.then((r) => {
  console.log(...r.headers);
  return r.body.pipeTo(
    new WritableStream({
      write(v) {
        console.log(v);
      },
      close() {
        console.log("close");
      }
    })
  );
})
.then(() => {
  console.log("Done streaming");
})
.catch(console.log);
writer.write(new Uint8Array(1024 ** 2));
Keep on writing data until close() is called, and remember to abort the request to close the TCP connection.
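For example, using writer and the TransformStream from the snippet above:
// Keep writing; each write is re-chunked by the TransformStream above
await writer.write(new Uint8Array(1024 ** 2));
await writer.write(new Uint8Array(1024 ** 2));
// Closing the writable side runs flush(), which aborts the request
// and closes the TCP connection
await writer.close();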
Chromium internally encodes the written data to something like this, if you need a visual representation of what's going on. See those Uint8Arrays that have almost the same length, because I piped the data through a TransformStream()?
I think Chromium is internally using a buffer, probably around 65536 bytes, because that 1031 length does not remain consistent. We can't just skip the first 6 bytes and remove (really, take a subarray() without) the trailing \r\n at the end of each Uint8Array, because right in the middle of Chromium sending those chunks to the server there will be something like this:
See the Uint8Arrays having lengths 37 and 994?
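So the parser has to carry unconsumed bytes across reads. A minimal helper for that (the name concat is mine, just a sketch):
// Join leftover bytes from the previous read with the next Uint8Array
function concat(a, b) {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}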
If you want to see my scratch pad of attempts to solve this, here you go (the fetch() setup is the same as the snippet above):
// Same AbortController / TransformStream / fetch() setup as above, then:
writer.write(new Uint8Array(1024));
// Parse: walk one received Uint8Array (o) and extract the chunks.
// Note: this still assumes each array contains whole chunk frames;
// it fails when a frame is split across arrays (see above)
var res = [];
var decoder = new TextDecoder();
let i = 0;
let k = 0; // count of size-line bytes (hex digits) seen so far
for (; i < o.length; i++, k++) {
  if (o[i] === 13 && o[i + 1] === 10) {
    // o.subarray(i - k, i) is the hex-encoded chunk size
    let hex = decoder.decode(o.subarray(i - k, i));
    let len = parseInt(hex, 16);
    // Data starts after the \r\n that ends the size line
    res.push({ [len]: o.subarray(i + 2, i + 2 + len) });
    console.log(hex, len);
    // Land on the trailing \n; the loop's i++ moves to the next size line
    i = i + 2 + len + 1;
    console.log(i);
    k = -1; // the loop's k++ resets this to 0
  }
}
var decoder = new TextDecoder();
var encoder = new TextEncoder();
function encode(text) {
  return encoder.encode(text);
}
// var body = "x";
var chunk = Uint8Array.from({ length: 255 }, (_, i) => i);
var size = chunk.buffer.byteLength.toString(16);
// Hand-build the chunked framing: size line, data, trailing CRLF,
// then the terminating zero-length chunk
var blob = new Blob([
  encode(`${size}\r\n`),
  chunk.buffer,
  encode("\r\n"),
  encode("0\r\n"),
  encode("\r\n"),
], { type: "application/octet-stream" });
// Note: Response does not apply chunked decoding to a Blob body here;
// this just prints the raw framing as text
new Response(blob, { headers: { "transfer-encoding": "chunked" } })
  .body
  .pipeThrough(new TextDecoderStream())
  .pipeTo(new WritableStream({
    write(v) {
      console.log(v);
    }
  }));
So, I'm asking about the general algorithms you would employ to implement Transfer-Encoding: chunked parsing, ideally without using strings at all, in JavaScript or, if you want, any other programming language; and primarily in prose, so I can read how you would go about this task.
2 Answers
OK, so my understanding of the question is this:
- You are processing an incoming stream of uint8-encoded characters.
- After decoding from uint8, there is another encoding, the HTTP transfer encoding type "chunked", which also needs decoding.
- However, the buffer on the incoming stream, and hence the data you get on read, doesn't match the "chunks" of the wrapped encoding.
I'm going to give you a prose answer, because it's late and it's going to be hard to write and test something with all the libs you are using and stuff. But I think the idea should be fairly simple. (Also because haters are gonna close the question before it can be answered!!)
Implement a second buffer, and keep track of whether you are at the end of a chunk or not. When you receive the first write, you are guaranteed to be at the start of a chunk, so start decoding as normal:
- Convert each character to Unicode (or the request body encoding from the header?).
- When you hit 13 (\r) followed by 10 (\n), you have the length. SAVE THIS. E.g. 49, 48, 48, 48 decoded as UTF-8 is "1000"; read as hex digits, 0x1000 = 4096.
- Read the next length number of bytes and put them in your buffer.
- Put the buffer in your output.
- Wipe the buffer data.
- Loop.
All good so far, but OH NO! You hit the end of the array before you get to the end of a chunk!
- Skip moving the buffer and wiping it.
- The next write comes in.
- Is the buffer empty? No.
- Continue reading and adding to the buffer until the buffer length matches the length you stored earlier.
- Put the buffer in your output.
- Wipe the buffer data.
- Loop.
(A rough sketch of this in JavaScript follows below.)
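Here's that idea as a minimal, untested sketch. The names push and onChunk are mine; each incoming Uint8Array is fed to push(), and onChunk() receives the pieces of one decoded chunk. Chunk extensions and HTTP trailers are not handled.
const CR = 13, LF = 10;
let state = "size";   // "size" | "data" | "data-crlf"
let sizeHex = "";     // accumulated hex digits of the current size line
let remaining = 0;    // bytes of chunk data still expected
let parts = [];       // pieces of the current chunk (the "second buffer")

function push(arr, onChunk) {
  let i = 0;
  while (i < arr.length) {
    if (state === "size") {
      if (arr[i] === CR) { i++; continue; }
      if (arr[i] === LF) {
        if (sizeHex.length) {
          remaining = parseInt(sizeHex, 16); // e.g. "1000" -> 4096
          sizeHex = "";
          // a zero-length chunk marks the end of the body
          state = remaining > 0 ? "data" : "size";
        }
        i++;
        continue;
      }
      sizeHex += String.fromCharCode(arr[i++]); // SAVE the length digits
      continue;
    }
    if (state === "data") {
      // take as much of the chunk as this array holds
      const take = Math.min(remaining, arr.length - i);
      parts.push(arr.subarray(i, i + take));
      remaining -= take;
      i += take;
      if (remaining === 0) {
        onChunk(parts); // put the buffer in your output
        parts = [];     // wipe the buffer
        state = "data-crlf";
      }
      continue;
    }
    // "data-crlf": skip the \r\n that terminates the chunk data
    if (arr[i++] === LF) state = "size";
  }
}
Because all of the parse state lives outside push(), a chunk frame split across two (or more) incoming arrays is picked up where the previous call left off, e.g. push(incoming, (parts) => { /* concatenate or forward parts */ }).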
With the HTTP protocol there is no guarantee that the route between the client and server is persistent; chunks sent over the same HTTP connection could be delayed differently by the network, so each chunk sent from the client should include information to help the server reassemble the chunks in the same sequence in which the message was deconstructed. Then, if the chunked data has to pass the test of a zero-trust architecture, encrypt the data and send a CRC with each chunk; before that, consider using HTTPS. All of those are add-ons that are never considered if, as an ethical developer, you never suspect there are actors interested just in proving you wrong.
Starting simple: (1) chunk the data the way that is described, (2) add to each chunk enough information for the server to reconstruct the chunked message (e.g. chunk length, sequence identifier, message length); a sketch of such a header follows below.
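A minimal sketch of the kind of per-chunk header this suggests (the field layout and the name frameChunk are illustrative, not from any standard):
// Prefix each chunk with metadata the server can use to reassemble
// the message: sequence identifier, chunk length, total message length
function frameChunk(seq, totalLength, data) {
  const header = new Uint8Array(12);
  const view = new DataView(header.buffer);
  view.setUint32(0, seq);         // sequence identifier
  view.setUint32(4, data.length); // chunk length
  view.setUint32(8, totalLength); // total message length
  const out = new Uint8Array(header.length + data.length);
  out.set(header, 0);
  out.set(data, header.length);
  return out;
}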
- Your answer doesn't really describe an algorithm for parsing the data as-is, though thanks for your observations, anyway. I'm using HTTP/1.1 without encryption on purpose. – guest271314, Aug 11 at 6:01