Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-116738: Use PyMutex for lzma module #140711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
yoney wants to merge 1 commit into python:main
base: main
Choose a base branch
Loading
from yoney:ft_lzma
Open

Conversation

@yoney
Copy link
Contributor

@yoney yoney commented Oct 28, 2025
edited
Loading

Similar to #140555, the main goal was to review the lzma module for free-threading. The methods already use a lock, which makes them thread-safe in a free-threaded build. I replaced PyThread_acquire_lock with PyMutex. PyMutex releases the GIL when the thread is parked. This change removes some macros and allocation handling code.

cc: @mpage @colesbury @emmatyping

def worker():
# it should return empty bytes as it buffers data internally
data = lzc.compress(INPUT)
self.assertEqual(data, b"")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion self.assertEqual(data, b"") is flaky. In free-threaded mode, compress() may return data chunks non-deterministically due to race conditions in internal buffering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashm-dev Thanks for your comment. I’m trying to verify/test the mutex is protecting the internal state and buffering, so there shouldn’t be a race condition. Could you please explain which race condition you mean? That would help me understand your point better.

Copy link
Member

@ZeroIntensity ZeroIntensity Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashm-dev Are you using ChatGPT or another LLM to review for you? If so, please don't -- it's not helpful. If not, please try to be clearer in your responses.

yoney reacted with thumbs up emoji
def worker():
data = lzd.decompress(compressed, chunk_size)
self.assertEqual(len(data), chunk_size)
output.append(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output.append(data) without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output.append(data) without synchronization causes race conditions in free-threaded mode, potentially losing data or corrupting the list.

@ashm-dev list is thread safe in free-threaded build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the free-threaded build, list operations use internal locks to avoid crashes, but thread safety isn’t guaranteed for concurrent mutations — see Python free-threading HOWTO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be some misunderstanding, could you please check the list test code below?

def test_racing_iter_append(self):
l = []
barrier = Barrier(NTHREAD + 1)
def writer_func(l):
barrier.wait()
for i in range(OBJECT_COUNT):
l.append(C(i + OBJECT_COUNT))

@ashm-dev I’ve tried to address all your comments, but some are still unclear to me. Could you please clarify or resolve them? Thank you!


def worker():
data = lzd.decompress(compressed, chunk_size)
self.assertEqual(len(data), chunk_size)
Copy link
Contributor

@ashm-dev ashm-dev Oct 28, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.assertEqual(len(data), chunk_size) is wrong. decompress() may return less than max_length bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashm-dev I agree that decompress() can return less than max_length if there isn’t enough input. In this test, I’m providing input that should produce at least max_length bytes. Is there anything else I might be missing? If I give enough valid input, is there any reason why lzma wouldn’t return max_length?

There are other tests making similar assumptions.

# Feed first half the input
len_ = len(COMPRESSED_XZ) // 2
out.append(lzd.decompress(COMPRESSED_XZ[:len_],
max_length=max_length))
self.assertFalse(lzd.needs_input)
self.assertEqual(len(out[-1]), max_length)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@ZeroIntensity ZeroIntensity ZeroIntensity left review comments

@colesbury colesbury Awaiting requested review from colesbury

@emmatyping emmatyping Awaiting requested review from emmatyping

@mpage mpage Awaiting requested review from mpage

+1 more reviewer

@ashm-dev ashm-dev ashm-dev left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /