Avoid per-value stack expansion when decoding RowBinary Tuple #369

Open

ruslandoga opened on May 19, 2026

Problem

Tuple decoding currently expands the tuple schema onto the decode stack for every tuple value:

{:tuple, tuple_types} ->
  decode_rows(tuple_types ++ [{:tuple_over, row} | types_rest], bin, [], rows, types)

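That allocates memory proportional to the tuple width for every decoded tuple value, because ++ copies its left-hand list (tuple_types) each time it runs. This is the same general shape as the array/map stack expansion work, except that the tuple width is fixed and known from the type.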
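To make the cost concrete, here is a minimal toy model of that clause. It is only a sketch under stated assumptions: the module name StackExpansionSketch and the one-byte :u8 type are hypothetical, the decoder handles nothing but :u8 and {:tuple, types}, and it does not reproduce Ch.RowBinary's real code or its handling of incomplete input.

```elixir
defmodule StackExpansionSketch do
  # Hypothetical toy decoder: supports only :u8 and {:tuple, types}.
  # It mirrors the shape of the clause quoted above, not Ch.RowBinary itself.

  def decode(<<>>, _types), do: []
  def decode(bin, types), do: decode(types, bin, [], [], types)

  # Scalar: consume one byte and prepend it to the current row.
  defp decode([:u8 | rest], <<v, bin::binary>>, row, rows, types),
    do: decode(rest, bin, [v | row], rows, types)

  # Tuple: expand the tuple schema onto the type stack.
  # `++` copies tuple_types, so this allocates one cons cell per tuple
  # element every time a tuple value is decoded.
  defp decode([{:tuple, tuple_types} | rest], bin, row, rows, types),
    do: decode(tuple_types ++ [{:tuple_over, row} | rest], bin, [], rows, types)

  # Marker reached: the current row holds the tuple elements in reverse order.
  defp decode([{:tuple_over, outer_row} | rest], bin, row, rows, types) do
    tuple = row |> Enum.reverse() |> List.to_tuple()
    decode(rest, bin, [tuple | outer_row], rows, types)
  end

  # Row finished and no input left: return all rows in original order.
  defp decode([], <<>>, row, rows, _types),
    do: Enum.reverse([Enum.reverse(row) | rows])

  # Row finished, more input: start the next row from the full type list.
  defp decode([], bin, row, rows, types),
    do: decode(types, bin, [], [Enum.reverse(row) | rows], types)
end

# Two rows of (u8, Tuple(u8, u8)):
IO.inspect(StackExpansionSketch.decode(<<1, 2, 3, 4, 5, 6>>, [:u8, {:tuple, [:u8, :u8]}]))
# prints [[1, {2, 3}], [4, {5, 6}]]
```

In this model the two-element schema [:u8, :u8] is copied once per row, so 100k rows mean 100k copies of a list whose contents never change.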

Benchmark

Environment:

commit: 5c9244a
macOS, Apple M2, 8 GB RAM
Elixir 1.19.5, Erlang/OTP 28.3, JIT enabled

Benchmark code:

alias Ch.RowBinary
base = DateTime.from_naive!(~N[2026-01-01 00:00:00.000], "Etc/UTC")

# 100k flat rows of [UInt64, String, DateTime]
flat_rows =
  Enum.map(1..100_000, fn i ->
    [i, "event", DateTime.add(base, i, :second)]
  end)

# The same values, with each row wrapped into a single Tuple column
tuple_rows =
  Enum.map(flat_rows, fn row -> [List.to_tuple(row)] end)

flat_bin =
  flat_rows
  |> RowBinary.encode_rows(["UInt64", "String", "DateTime"])
  |> IO.iodata_to_binary()

tuple_bin =
  tuple_rows
  |> RowBinary.encode_rows(["Tuple(UInt64, String, DateTime)"])
  |> IO.iodata_to_binary()

Benchee.run(
  %{
    "decode flat UInt64/String/DateTime" => fn ->
      RowBinary.decode_rows(flat_bin, ["UInt64", "String", "DateTime"])
    end,
    "decode Tuple(UInt64,String,DateTime)" => fn ->
      RowBinary.decode_rows(tuple_bin, ["Tuple(UInt64, String, DateTime)"])
    end
  },
  warmup: 1,
  time: 2
)

Results:

Name                                    ips     average
decode flat UInt64/String/DateTime      6.99    143.10 ms
decode Tuple(UInt64,String,DateTime)    4.24    235.69 ms

Tuple decoding: 1.65x slower (+92.59 ms for 100k rows)

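The comparison does not claim that tuples should be as cheap as flat rows, but it gives a baseline for the per-value tuple stack allocation: the gap works out to at most about 0.93 µs of extra work per decoded tuple (92.59 ms / 100k rows), and some of that is the cost of building the tuple itself.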

Suggested direction

Avoid rebuilding tuple_types ++ marker for each tuple value. Possible approaches:

represent tuple decoding as a small state frame with current tuple types and accumulated tuple row;
precompute a reusable tuple decode frame during decoding_type/1;
align this with the array/map non-stack improvement so nested containers use one consistent mechanism.
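As an illustration of the first approach, the sketch below keeps a separate stack of small frames instead of concatenating lists. It is a toy under stated assumptions, not a proposed patch: the module name TupleFrameSketch, the one-byte :u8 type, and the {outer_row, remaining_types} frame layout are all hypothetical, and the sketch does not model incomplete input or continuation.

```elixir
defmodule TupleFrameSketch do
  # Hypothetical toy decoder: supports only :u8 and {:tuple, types}.
  # State: decode(types_left, bin, row, frames, rows, root_types)

  def decode(<<>>, _types), do: []
  def decode(bin, types), do: decode(types, bin, [], [], [], types)

  # Scalar: consume one byte and prepend it to the current row.
  defp decode([:u8 | rest], <<v, bin::binary>>, row, frames, rows, types),
    do: decode(rest, bin, [v | row], frames, rows, types)

  # Entering a tuple: save the outer row and the remaining outer types in one
  # frame, then walk tuple_types directly. The schema list is shared, not
  # copied, so the allocation here does not grow with tuple width.
  defp decode([{:tuple, tuple_types} | rest], bin, row, frames, rows, types),
    do: decode(tuple_types, bin, [], [{row, rest} | frames], rows, types)

  # Tuple types exhausted with a frame pending: build the tuple and resume
  # the outer row where it left off.
  defp decode([], bin, row, [{outer_row, rest} | frames], rows, types) do
    tuple = row |> Enum.reverse() |> List.to_tuple()
    decode(rest, bin, [tuple | outer_row], frames, rows, types)
  end

  # Top-level row finished and no input left: return rows in original order.
  defp decode([], <<>>, row, [], rows, _types),
    do: Enum.reverse([Enum.reverse(row) | rows])

  # Top-level row finished, more input: start the next row.
  defp decode([], bin, row, [], rows, types),
    do: decode(types, bin, [], [], [Enum.reverse(row) | rows], types)
end

# Nested tuples go through the same frame mechanism:
IO.inspect(TupleFrameSketch.decode(<<1, 2>>, [{:tuple, [:u8, {:tuple, [:u8]}]}]))
# prints [[{1, {2}}]]
```

Entering a tuple here costs one two-element frame and one cons cell regardless of how wide the tuple is. Whether the same frame shape would also suit arrays and maps, as the third approach suggests, is left open.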

Tests to add:

tuple decode round trips for scalar and nested types;
incomplete tuple payload still returns the same continuation/error behavior;
benchmark coverage for tuple-heavy rows.
