Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fixed VAD to work when using whisper_full_with_state #3423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mdestagnol wants to merge 2 commits into ggml-org:master
base: master
Choose a base branch
Loading
from mdestagnol:vad-with-state-fix

Conversation

@mdestagnol
Copy link
Contributor

@mdestagnol mdestagnol commented Sep 17, 2025
edited
Loading

Currently whisper_full_with_state doesn't work with VAD.

The main inference method is whisper_full_with_state. This method takes a whisper_context and a whisper_state. On top of this method are built 2 convenience methods, whisper_full and whisper_full_parallel.

The VAD logic is currently only implemented in those convenience methods, but not in whisper_full_with_state itself. This is an issue whenever we need to deal with more than one state (when processing multiple real time streams for example).

This pull request moves the logic into whisper_full_with_state and fix a few incorrect calls to ctx->state instead of state directly.

HoKim98 reacted with eyes emoji
Copy link
Member

danbev commented Sep 21, 2025

@mdestagnol Thanks for the pull request.

I've been thinking about this and if I recall correctly, the original pr for VAD actually did something like this. I recall getting feedback about the state should be passed in and not be updated inside of this function (I've not been able to find the comment though I'm afraid).

So I think exposing whisper_vad might be a better option here.

Copy link
Contributor Author

@mdestagnol Thanks for the pull request.

I've been thinking about this and if I recall correctly, the original pr for VAD actually did something like this. I recall getting feedback about the state should be passed in and not be updated inside of this function (I've not been able to find the comment though I'm afraid).

So I think exposing whisper_vad might be a better option here.

Isn't state's purpose to persist the context of whisper_full? State is how the whisper_full_with_state method maintains its own context between different runs. It is actively modified by whisper_full_with_state at each run (even outside of VAD). What do you think?

Copy link
Contributor

Whatever route is taken, this should be documented anyways since it was extremely confusing wondering why VAD wasn't working despite having the switch on.

danbev reacted with thumbs up emoji

Copy link
Member

danbev commented Sep 27, 2025

What do you think?

Sorry, I make a mistake (recalled incorrectly) about the reason for moving the VAD processing out of whisper_full_with_state. Looking at it closer I found this was moved in 7497754 and related to when parallel processing in use (-p/--processors). We would have to take the parallel processing into account before moving this back into whisper_full_with_state or that parallel processing will not work correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@danbev danbev danbev left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /