I'm working on parsing a series of Uint8Arrays that contain Transfer-Encoding: chunked data sent from the client using WHATWG Fetch over a single connection. For example, writing 1 MB from the client can result in 14 separate Uint8Arrays arriving at the server. The first in the series could look like
Uint8Array(41040) [49, 48, 48, 48, 13, 10, 0, 0, 0, 0, ..., 13, 10]
where, filtering that first Uint8Array for 13 followed by 10 (\r\n), the indexes can look like
[4, 4102, 4108, 8206, 8212, 12310, 12316, 16414, 16420, 20518,
20524, 24622, 24628, 28726, 28732, 32830, 32836, 36934, 36940, 41038]
How would you implement the algorithm to parse the length of the data and extract the data that follows from the Uint8Arrays?
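For reference, a minimal sketch of the filter that produces those indexes, assuming o holds one received Uint8Array:
// Collect indexes where 13 (\r) is immediately followed by 10 (\n)
const crlfIndexes = [];
for (let i = 0; i < o.length - 1; i++) {
  if (o[i] === 13 && o[i + 1] === 10) {
    crlfIndexes.push(i);
  }
}
console.log(crlfIndexes); // e.g. [4, 4102, 4108, ...]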
Can you edit your question to clarify things?
I'm using WICG Direct Sockets TCPServerSocket as a TCP server in a Chromium browser; see direct-sockets-http-ws-server. I've already implemented a runtime-agnostic WebSocket server, where the same server code can be run using chrome, node, deno, bun, or tjs; see JavaScript runtime agnostic WebSocket server.
Now, Chromium-based browsers implement upload streaming, or streaming requests, in their WHATWG Fetch implementation; see Streaming requests with the fetch API. However, only HTTP/2 is supported in general, the exception being between a ServiceWorker and a WindowClient or Client, which internally in chrome uses Mojo, not the network (type some lower case letters in the input field here: Half duplex stream). HTTP/2 requires TLS. That means I would need to implement TLS in the browser for the server, to avoid TLS handshake errors when using the TCPServerSocket to process the fetch() request. That's on my TODO list, but that's a separate question.
There's another exception for uploading data over a single connection; see Re: Issue 434292497: Allow streaming requests for HTTP/1.x, Status: Won't Fix (Intended Behavior), from +yoichio@ who worked on the feature:
"There seems the FetchUploadStreaming feature flag (https://crrev.com/c/2174099) for that. I don't think we want to enable it by default though. Resolving as working intended."
chrome --enable-features=FetchUploadStreaming
Now, that only provides the upload capability over HTTP/1.1:
var abortable = new AbortController();
var { readable, writable } = new TransformStream({
  // Re-chunk whatever is written into 1024-byte pieces, with a small
  // delay between each so the pieces arrive separately at the server
  async transform(v, c) {
    for (let i = 0; i < v.length; i += 1024) {
      c.enqueue(v.subarray(i, i + 1024));
      await scheduler.postTask(() => {}, { delay: 10 });
    }
  },
  flush() {
    console.log("flush");
    // Abort the request so the TCP connection closes
    abortable.abort("");
  }
});
var writer = writable.getWriter();
var response = fetch("http://localhost:44818", {
  method: "post",
  duplex: "half",
  body: readable,
  signal: abortable.signal,
  // Non-standard; requires chrome --enable-features=FetchUploadStreaming
  allowHTTP1ForStreamingUpload: true
});
response.then((r) => {
  console.log(...r.headers);
  return r.body.pipeTo(
    new WritableStream({
      write(v) {
        console.log(v);
      },
      close() {
        console.log("close");
      }
    })
  );
})
.then(() => {
  console.log("Done streaming");
})
.catch(console.log);
writer.write(new Uint8Array(1024 ** 2));
Keep on writing data until close() is called, and remember to abort the request to close the TCP connection.
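For example, using writer and the TransformStream from the snippet above:
// Keep writing; each write is re-chunked by the TransformStream above
await writer.write(new Uint8Array(1024 ** 2));
await writer.write(new Uint8Array(1024 ** 2));
// Closing the writable side runs flush(), which aborts the request
// and closes the TCP connection
await writer.close();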
Chromium internally encodes the written data to something like this, if you need a visual representation of what's going on. See those Uint8Arrays that have almost the same length, because I piped the data through a TransformStream()?
I think Chromium is internally using a buffer, probably around 65536 bytes, because that 1031 length does not remain consistent. We can't just skip the first 6 bytes and remove (really, take a subarray() without) the trailing \r\n at the end of each Uint8Array, because right in the middle of Chromium sending those chunks to the server there will be something like this:
See the Uint8Arrays having lengths 37 and 994?
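So the parser has to carry unconsumed bytes across reads. A minimal helper for that (the name concat is mine, just a sketch):
// Join leftover bytes from the previous read with the next Uint8Array
function concat(a, b) {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}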
If you want to see my scratch pad of attempts to solve this, here you go (the fetch() setup is the same as the snippet above):
// Same AbortController / TransformStream / fetch() setup as above, then:
writer.write(new Uint8Array(1024));
// Parse: walk one received Uint8Array (o) and extract the chunks.
// Note: this still assumes each array contains whole chunk frames;
// it fails when a frame is split across arrays (see above)
var res = [];
var decoder = new TextDecoder();
let i = 0;
let k = 0; // count of size-line bytes (hex digits) seen so far
for (; i < o.length; i++, k++) {
  if (o[i] === 13 && o[i + 1] === 10) {
    // o.subarray(i - k, i) is the hex-encoded chunk size
    let hex = decoder.decode(o.subarray(i - k, i));
    let len = parseInt(hex, 16);
    // Data starts after the \r\n that ends the size line
    res.push({ [len]: o.subarray(i + 2, i + 2 + len) });
    console.log(hex, len);
    // Land on the trailing \n; the loop's i++ moves to the next size line
    i = i + 2 + len + 1;
    console.log(i);
    k = -1; // the loop's k++ resets this to 0
  }
}
var decoder = new TextDecoder();
var encoder = new TextEncoder();
function encode(text) {
  return encoder.encode(text);
}
// var body = "x";
var chunk = Uint8Array.from({ length: 255 }, (_, i) => i);
var size = chunk.buffer.byteLength.toString(16);
// Hand-build the chunked framing: size line, data, trailing CRLF,
// then the terminating zero-length chunk
var blob = new Blob([
  encode(`${size}\r\n`),
  chunk.buffer,
  encode("\r\n"),
  encode("0\r\n"),
  encode("\r\n"),
], { type: "application/octet-stream" });
// Note: Response does not apply chunked decoding to a Blob body here;
// this just prints the raw framing as text
new Response(blob, { headers: { "transfer-encoding": "chunked" } })
  .body
  .pipeThrough(new TextDecoderStream())
  .pipeTo(new WritableStream({
    write(v) {
      console.log(v);
    }
  }));
So, I'm asking about the general algorithms you would employ to implement Transfer-Encoding: chunked parsing, ideally without using strings at all, in JavaScript or, if you want, any other programming language; and primarily in prose, so I can read how you would go about this task.
2 Answers
OK, so my understanding of the question is this:
- You are processing an incoming stream of uint8-encoded characters.
- After decoding from uint8, there is another encoding, the HTTP transfer encoding type "chunked", which also needs decoding.
- However, the buffer on the incoming stream, and hence the data you get on read, doesn't match the "chunks" of the wrapped encoding.
I'm going to give you a prose answer, because it's late and it's going to be hard to write and test something with all the libs you are using and stuff. But I think the idea should be fairly simple. (Also because haters are gonna close the question before it can be answered!!)
Implement a second buffer, and keep track of whether you are at the end of a chunk or not. When you receive the first write, you are guaranteed to be at the start of a chunk, so start decoding as normal:
- Convert each character to Unicode (or the request body encoding from the header?).
- When you hit 13 (\r) followed by 10 (\n), you have the length. SAVE THIS. E.g. 49, 48, 48, 48 decoded as UTF-8 is "1000"; read as hex digits, 0x1000 = 4096.
- Read the next length number of bytes and put them in your buffer.
- Put the buffer in your output.
- Wipe the buffer data.
- Loop.
All good so far, but OH NO! You hit the end of the array before you get to the end of a chunk!
- Skip moving the buffer and wiping it.
- The next write comes in.
- Is the buffer empty? No.
- Continue reading and adding to the buffer until the buffer length matches the length you stored earlier.
- Put the buffer in your output.
- Wipe the buffer data.
- Loop.
(A rough sketch of this in JavaScript follows below.)
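Here's that idea as a minimal, untested sketch. The names push and onChunk are mine; each incoming Uint8Array is fed to push(), and onChunk() receives the pieces of one decoded chunk. Chunk extensions and HTTP trailers are not handled.
const CR = 13, LF = 10;
let state = "size";   // "size" | "data" | "data-crlf"
let sizeHex = "";     // accumulated hex digits of the current size line
let remaining = 0;    // bytes of chunk data still expected
let parts = [];       // pieces of the current chunk (the "second buffer")

function push(arr, onChunk) {
  let i = 0;
  while (i < arr.length) {
    if (state === "size") {
      if (arr[i] === CR) { i++; continue; }
      if (arr[i] === LF) {
        if (sizeHex.length) {
          remaining = parseInt(sizeHex, 16); // e.g. "1000" -> 4096
          sizeHex = "";
          // a zero-length chunk marks the end of the body
          state = remaining > 0 ? "data" : "size";
        }
        i++;
        continue;
      }
      sizeHex += String.fromCharCode(arr[i++]); // SAVE the length digits
      continue;
    }
    if (state === "data") {
      // take as much of the chunk as this array holds
      const take = Math.min(remaining, arr.length - i);
      parts.push(arr.subarray(i, i + take));
      remaining -= take;
      i += take;
      if (remaining === 0) {
        onChunk(parts); // put the buffer in your output
        parts = [];     // wipe the buffer
        state = "data-crlf";
      }
      continue;
    }
    // "data-crlf": skip the \r\n that terminates the chunk data
    if (arr[i++] === LF) state = "size";
  }
}
Because all of the parse state lives outside push(), a chunk frame split across two (or more) incoming arrays is picked up where the previous call left off, e.g. push(incoming, (parts) => { /* concatenate or forward parts */ }).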
With the HTTP protocol there is no guarantee that the route between the client and server is persistent; chunks sent over the same HTTP connection could be delayed differently by the network, so each chunk sent from the client should include information to help the server reassemble the chunks in the same sequence in which the message was deconstructed. Then, if the chunked data has to pass the test of a zero-trust architecture, encrypt the data and send a CRC with each chunk; before that, consider using HTTPS. All of those are add-ons that are never considered if, as an ethical developer, you never suspect there are actors interested just in proving you wrong.
Starting simple: (1) chunk the data the way that is described, (2) add to each chunk enough information for the server to reconstruct the chunked message (e.g. chunk length, sequence identifier, message length); a sketch of such a header follows below.
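A minimal sketch of the kind of per-chunk header this suggests (the field layout and the name frameChunk are illustrative, not from any standard):
// Prefix each chunk with metadata the server can use to reassemble
// the message: sequence identifier, chunk length, total message length
function frameChunk(seq, totalLength, data) {
  const header = new Uint8Array(12);
  const view = new DataView(header.buffer);
  view.setUint32(0, seq);         // sequence identifier
  view.setUint32(4, data.length); // chunk length
  view.setUint32(8, totalLength); // total message length
  const out = new Uint8Array(header.length + data.length);
  out.set(header, 0);
  out.set(data, header.length);
  return out;
}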
- Your answer doesn't really describe an algorithm for parsing the data as-is, though thanks for your observations, anyway. I'm using HTTP/1.1 without encryption on purpose. – guest271314, Aug 11 at 6:01