Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: throw on schema mismatch in RecordBatchStreamWriter.write()#389

Open
rustyconover wants to merge 3 commits into
apache:main from
Query-farm:fix/schema-mismatch-throw
Open

fix: throw on schema mismatch in RecordBatchStreamWriter.write() #389
rustyconover wants to merge 3 commits into
apache:main from
Query-farm:fix/schema-mismatch-throw

Conversation

@rustyconover

@rustyconover rustyconover commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • RecordBatchStreamWriter.write() previously silently closed the writer and dropped the batch when a schema mismatch was detected with autoDestroy: true (the default), causing data loss with no error
  • Now throws an Error with both expected and received schemas, making mismatches immediately visible
  • Added JSDoc for the autoDestroy option explaining its behavior

Test plan

  • Added schema-mismatch-tests.ts with 4 tests covering:
    • autoDestroy=true: throws on schema mismatch
    • autoDestroy=false: resets schema and writes multi-stream output (existing behavior preserved)
    • Writing to a closed writer throws
    • Error message includes both schemas
  • All tests pass (npx jest --testPathPattern='schema-mismatch')

Closes #388

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor Author

@raulcd could you please trigger the CI on this too? Thanks.

raulcd commented Mar 4, 2026

Copy link
Copy Markdown
Member

sure thing!

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a data-loss bug in RecordBatchStreamWriter.write() where schema mismatches with autoDestroy: true (the default) previously caused silent data loss — the writer would close and discard the batch without any error. The fix throws a descriptive Error instead, and also adds a JSDoc comment for the autoDestroy option.

Changes:

  • Modified write() in RecordBatchWriter to throw a descriptive Error on schema mismatch when autoDestroy: true, instead of silently dropping the batch
  • Added a JSDoc description for the autoDestroy option in RecordBatchStreamWriterOptions
  • Added a new test file with 4 tests covering the schema mismatch behaviors

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/ipc/writer.ts Core fix: throws Error on schema mismatch when autoDestroy=true; added autoDestroy JSDoc
test/unit/ipc/writer/schema-mismatch-tests.ts New test file covering schema mismatch throw, multi-stream output, closed writer behavior, and error message content

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread src/ipc/writer.ts
Comment thread test/unit/ipc/writer/schema-mismatch-tests.ts Outdated
Comment thread test/unit/ipc/writer/schema-mismatch-tests.ts Outdated

@trxcllnt trxcllnt left a comment
edited
Loading

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right solution. The call to this.close() closes the underlying sink. Without it, the underlying sink is left open, and data loss may occur for the already-written batches.

This seems like something that needs to be handled by the user checking whether the writer is closed before writing the next batch:

for (const batch in batches) {
 writer.write(batch);
 if writer.closed {
 break;
 }
}

Generally users shouldn't be in a scenario where they're writing batches with heterogeneous schemas without knowing it, but if they are, they should validate that before writing (like above).

Previously, writing a RecordBatch with a mismatched schema to a writer
with autoDestroy=true (the default) would silently close the writer and
drop the batch, causing data loss with no error or warning.
Now throws an Error with both the expected and received schemas, making
the mismatch immediately visible to the caller.
Closes apache#388
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copy link
Copy Markdown
Contributor Author

Hi @trxcllnt and @kou,

I think my original scope was too big so I'm going to scope this down to just the core issue that took a few hours of debugging.

So based on your feedback I:

  • Removed redundant tests — dropped the autoDestroy=false multi-stream test (already covered by stream-writer-tests.ts, streams-dom-tests.ts, and streams-node-tests.ts) and the closed-writer test (pre-existing behavior, not introduced by this PR)
  • Added JSDoc for the new throwing behavior on autoDestroy
  • Switched to static imports for consistency with other test files

The PR is now scoped to just the core fix: when autoDestroy=true (the default), writing a batch with a mismatched schema throws an error instead of silently closing the writer and dropping the batch.

Why throw instead of silently closing: The previous behavior called return this.close() when it detected a schema mismatch with autoDestroy=true. From the caller's perspective, writer.write(batch) returned successfully — there was no indication that the batch was dropped and the writer was closed. This made schema mismatches extremely difficult to debug, since data silently disappeared with no error, no warning, and no way to distinguish a successful write from a dropped one. Throwing makes the problem immediately visible at the point where it occurs.

It may be that the user should check if they are trying to write mismatched schemas, but if you use PyArrow or the other Arrow implementations they raise an exception when you try to write a record batch that doesn't match the schema. I'm trying to promote consistent behavior.

Copy link
Copy Markdown
Contributor Author

@raulcd can you trigger CI on this one again too? Thank you!

Comment thread src/ipc/writer.ts Outdated
Comment thread src/ipc/writer.ts
rustyconover and others added 2 commits March 5, 2026 12:30
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>
Co-authored-by: Paul Taylor <178183+trxcllnt@users.noreply.github.com>

trxcllnt commented Mar 5, 2026

Copy link
Copy Markdown
Contributor

This PR should also update the node Transform stream's _write() implementation to try/catch the error and pass it to the callback:

 _write(x: any, _: string, cb: CB) {
 const writer = this._writer;
 let err = [];
 try {
 writer?.write(x);
 } catch (e) {
 err.push(e);
 }
 cb && cb.apply(undefined, err);
 return true;
 }

I can't recall at the moment what DOM streams do if write() throws, but it's also worth considering the impact of this change to this line.

Trying to think through other implications of throwing here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@trxcllnt trxcllnt trxcllnt requested changes

Copilot code review Copilot Copilot left review comments

@domoritz domoritz Awaiting requested review from domoritz

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

RecordBatchStreamWriter.write() silently drops batches on schema mismatch

AltStyle によって変換されたページ (->オリジナル) /