fix: engine cannot give response to the second user request with stre... #701


Open
lihaiyin88 wants to merge 1 commit into mlc-ai:main from lihaiyin88:main

Conversation

@lihaiyin88 commented Jun 24, 2025 (edited)

see title

Member

thanks a lot for the contribution. Would it be possible for you to provide a script for reproducing the issue / elaborate on the issue? Thank you!


@Copilot (Copilot AI) left a comment


Pull Request Overview

This PR restructures the streaming generation method in MLCEngine to ensure the model lock is always released once, fixing an issue where the engine could not handle a second streaming request.

  • Flattened multiple nested try/catch blocks into a single try + finally around the main logic.
  • Converted inner helper functions (_countTrailingReplacementChar, _getChunk) to arrow function expressions.
  • Removed redundant lock.release() calls spread across catches; now released once in finally.
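The restructured control flow can be sketched as follows. This is a simplified stand-in, not the actual MLCEngine source: the `Lock` class here is a hypothetical minimal mutex, and `streamChunks` stands in for the engine's streaming generator. The point it illustrates is the one the PR makes: with a single `try`/`finally` around the generator body, the lock is released exactly once whether the stream completes or throws, so a second streaming request can acquire it.

```typescript
// Hypothetical minimal async mutex (the real engine has its own lock type).
class Lock {
  private locked = false;
  private waiters: Array<() => void> = [];

  async acquire(): Promise<void> {
    if (!this.locked) {
      this.locked = true;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the lock to the next waiter
    else this.locked = false;
  }
}

const lock = new Lock();

// Async generator guarded by the lock. `finally` runs when the generator
// finishes, throws, or is closed early, so the lock is released exactly once.
async function* streamChunks(chunks: string[]): AsyncGenerator<string> {
  await lock.acquire();
  try {
    for (const c of chunks) {
      yield c;
    }
  } finally {
    lock.release();
  }
}

// Two consecutive streaming requests: the second can only proceed
// because the first released the lock in its `finally` block.
async function demo(): Promise<[string, string]> {
  const first: string[] = [];
  for await (const c of streamChunks(["a", "b"])) first.push(c);
  const second: string[] = [];
  for await (const c of streamChunks(["c", "d"])) second.push(c);
  return [first.join(""), second.join("")];
}

demo().then(([a, b]) => console.log(a, b));
```

With the previous structure of scattered `release()` calls inside nested catches, an error path could miss the release (or release twice), leaving the second request waiting on `acquire()` forever.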
Comments suppressed due to low confidence (2)

src/engine.ts:706

  • This TODO is still open—either implement the usage support for non-chat completions or file a tracking issue to ensure it isn't overlooked.
 // TODO(Charlie): support usage for completion

src/engine.ts:483

  • Add a test case that performs two consecutive streaming requests (with and without errors) to verify the lock is always released and reacquired correctly.
 genConfig: GenerationConfig,

// Since it is an async generator, we need to do fine-grained try-catch to ensure lock is
// released only when errors occur. Then release at the very end when no error occurs.
// TODO: This makes code less readable, is there a better way to do this?
const lock = this.loadedModelIdToLock.get(model)!;
Copilot AI Jun 24, 2025


Consider adding a brief comment above this try/finally to clarify its scope is for lock acquisition and release, improving future readability.

Suggested change:
- const lock = this.loadedModelIdToLock.get(model)!;
+ const lock = this.loadedModelIdToLock.get(model)!;
+ // Acquire the lock and ensure its release in the `finally` block.


Author

> thanks a lot for the contribution. Would it be possible for you to provide a script for reproducing the issue / elaborate on the issue? Thank you!

Hey there! So sorry for the super late reply! 😊
I've just pushed all the updates to the forked project right here: https://github.com/lihaiyin88/web-llm/tree/demo_for_issue
If you have any questions at all, don't hesitate to let me know! I'm all ears! 😎
