fix: preserve overwrite schema changes #5161

Open
LuciferYang wants to merge 3 commits into
lance-format:main from
LuciferYang:fix/write-lance-schema-evolution

@LuciferYang LuciferYang commented Jun 4, 2026

Summary

  • pass the write mode through fragment writes so overwrite/create fragments use the new schema instead of append validation
  • propagate mode through LanceFragmentWriter and streaming writes, with later streaming batches switching back to append
  • add regressions for URI, streaming, and namespace overwrite schema changes while preserving append behavior
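The streaming behaviour in the second bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: `modes_for_batches` is a hypothetical helper, and the only thing taken from the summary is that the first streaming batch uses the requested mode while later batches switch back to append.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

# Assumption: these are the three write modes discussed in this PR.
VALID_MODES = {"create", "overwrite", "append"}


def modes_for_batches(batches: Iterable[T], mode: str) -> Iterator[Tuple[T, str]]:
    """Pair each streaming batch with the mode it should be written under.

    Hypothetical helper: the first batch carries the caller's mode (so a
    create/overwrite can install the new schema), and every later batch
    is written as an append onto the dataset the first batch produced.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown write mode: {mode!r}")
    for index, batch in enumerate(batches):
        yield batch, (mode if index == 0 else "append")


if __name__ == "__main__":
    for batch, batch_mode in modes_for_batches(["b0", "b1", "b2"], "overwrite"):
        print(batch, batch_mode)
```

Without the switch, every batch of an overwrite stream would replace the previous one, so only the last batch would survive.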

Testing

  • python -m ruff check .
  • python -m ruff format --check lance_ray/fragment.py lance_ray/datasink.py lance_ray/io.py tests/test_basic_read_write.py
  • python -m pytest tests/test_basic_read_write.py tests/test_fragment.py tests/test_blob.py::test_stream_copy_basic_local tests/test_blob.py::test_stream_copy_resume_local tests/test_blob_v2.py::test_blob_v2_append_with_target_bases_stream -q
  • local namespace overwrite smoke test for #95 ("write_lance failed after add a new column to existed dataset.")

Closes #95

github-actions bot added the bug label Jun 4, 2026
Distributed create/overwrite writes assign Lance field ids positionally, so
blocks whose columns arrive in a different order (e.g. a union of differently
ordered sources) were committed under a single schema and read back transposed.
Add a guard in LanceDatasink.on_write_complete that raises on inconsistent
column order, and align Arrow blocks by name in pd_to_arrow so an explicit
schema writes correctly.
Also harden the schema-evolution mode plumbing: document per-mode schema
behaviour (append validates/drops; create and overwrite evolve the schema),
fix the LanceDatasink default-mode docstring, document the write_fragment mode
parameter and the writer/committer mode-pairing requirement, and add regression
tests for multi-fragment overwrite, schema drop/type-change, streaming overwrite
resume, streaming-append rejection, and the column-order guard.
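
The column-order guard and the by-name alignment described above can be illustrated with a small sketch. It is written in plain Python over column-name lists and dict-of-lists blocks so that it runs without Lance, Ray, or Arrow installed; `check_column_order` and `align_columns` are hypothetical names, not the functions changed in this PR.

```python
from typing import Dict, List, Sequence


def check_column_order(blocks: Sequence[Sequence[str]]) -> List[str]:
    """Return the column order shared by all blocks, or raise if any differs.

    Hypothetical guard: because field ids are assigned positionally,
    a block whose columns arrive in a different order must be rejected
    rather than committed under the first block's schema.
    """
    if not blocks:
        return []
    expected = list(blocks[0])
    for index, names in enumerate(blocks[1:], start=1):
        if list(names) != expected:
            raise ValueError(
                f"block {index} has column order {list(names)}, expected {expected}"
            )
    return expected


def align_columns(block: Dict[str, list], schema_names: Sequence[str]) -> Dict[str, list]:
    """Reorder a block's columns by name to match an explicit schema."""
    missing = [name for name in schema_names if name not in block]
    if missing:
        raise KeyError(f"block is missing columns: {missing}")
    # Dicts preserve insertion order, so this fixes the positional layout.
    return {name: block[name] for name in schema_names}


if __name__ == "__main__":
    aligned = align_columns({"name": ["a"], "id": [1]}, ["id", "name"])
    print(list(aligned))
    check_column_order([["id", "name"], list(aligned)])
```

Aligning by name first and guarding afterwards means an explicit schema resolves the ordering, while blocks written without one still fail loudly instead of reading back transposed.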