
feat(realtime): Add audio conversations #6245


Draft
richiejp wants to merge 3 commits into mudler:master from richiejp:feat/realtime-audio-conv

Conversation

richiejp (Collaborator) commented Sep 10, 2025 (edited)

Description

Add enough realtime API features to allow talking with an LLM using only audio.

Presently the realtime API only supports transcription, which is a minor use case for it. This PR should allow it to be used with a basic voice assistant.

This PR ignores many of the options and edge cases. Instead it will, for example, rely on server-side VAD to commit conversation items.
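
For illustration, here is a minimal sketch of what a client session against this endpoint could look like, assuming OpenAI-style realtime events (`session.update`, `input_audio_buffer.append`, `response.audio.delta`) and server-side turn detection; the URL, model name and `capturePCM` helper are placeholders, not part of this PR.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
)

// capturePCM is a stand-in for a real microphone capture loop.
func capturePCM() <-chan []byte {
	ch := make(chan []byte)
	close(ch)
	return ch
}

func main() {
	// Endpoint and model name are placeholders for a local deployment.
	conn, _, err := websocket.DefaultDialer.Dial(
		"ws://localhost:8080/v1/realtime?model=voice-assistant", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Ask the server to detect turns itself; the PR relies on server-side VAD
	// to commit conversation items instead of explicit client commits.
	conn.WriteJSON(map[string]any{
		"type": "session.update",
		"session": map[string]any{
			"modalities":     []string{"audio", "text"},
			"turn_detection": map[string]any{"type": "server_vad"},
		},
	})

	// Stream captured audio as base64-encoded append events.
	go func() {
		for chunk := range capturePCM() {
			conn.WriteJSON(map[string]any{
				"type":  "input_audio_buffer.append",
				"audio": base64.StdEncoding.EncodeToString(chunk),
			})
		}
	}()

	// Log server events; audio replies should arrive as response.audio.delta.
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatal(err)
		}
		var ev struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal(msg, &ev); err != nil {
			continue
		}
		log.Println("server event:", ev.Type)
	}
}
```

With `turn_detection` set to `server_vad`, the server is expected to commit the buffered audio and start a response on its own, which matches the approach described above.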

Notes for Reviewers

  • Configure a model pipeline or use a multi-modal model.
  • Commit client audio to the conversation
  • Generate a text response (optional)
  • Generate an audio response
  • Interrupt generation on voice detection? (a rough client-side sketch of this follows the list)
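
Regarding the last item, here is a hedged sketch of how a client could react to voice detection, again assuming OpenAI-style event names (`input_audio_buffer.speech_started`, `response.cancel`) that this PR may or may not emit:

```go
package client

import (
	"encoding/json"

	"github.com/gorilla/websocket"
)

// handleEvent extends the client sketch above: when the server signals that
// the user started speaking, cancel the in-flight response so generation is
// interrupted instead of talking over the user.
func handleEvent(conn *websocket.Conn, raw []byte) error {
	var ev struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	switch ev.Type {
	case "input_audio_buffer.speech_started":
		// User speech detected: ask the server to stop the current response.
		return conn.WriteJSON(map[string]any{"type": "response.cancel"})
	case "response.audio.delta":
		// Base64 audio chunks would be decoded and queued for playback here.
	}
	return nil
}
```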

Fixes: #3714 (but we'll need follow-up issues)

Signed commits

  • Yes, I signed my commits.


netlify bot commented Sep 10, 2025 (edited)

Deploy Preview for localai ready!

🔨 Latest commit: c1b9f23
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/68d03d019818140008d2dae5
😎 Deploy Preview: https://deploy-preview-6245--localai.netlify.app

richiejp (Collaborator, Author) commented:
It's not clear to me if we have audio support in llama.cpp: ggml-org/llama.cpp#15194

mudler (Owner) commented Sep 21, 2025

My initial thought on this was to use the whisper backend to transcribe the audio from VAD and give the text to a text-to-text backend; that way we can always go back to this approach. There was also an interface created exactly for this, so a pipeline can be seen as a kind of "drag and drop" until omni models are really capable.

However, yes, audio input is actually supported by llama.cpp and our backends: try qwen2-omni and you will be able to give it audio as input, but it isn't super accurate (transcription is better for now).
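
As a rough illustration of that pipeline idea (the interface and stage names below are hypothetical, not LocalAI's actual API), each stage could be swappable so a whisper + text-to-text + TTS chain can later be replaced by a single omni model:

```go
package pipeline

import "context"

// Transcriber turns a committed audio buffer (e.g. after server-side VAD)
// into text for the text-to-text backend.
type Transcriber interface {
	Transcribe(ctx context.Context, pcm []byte) (string, error)
}

// TextGenerator produces the assistant's text reply.
type TextGenerator interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// Speaker synthesizes the audio reply streamed back to the client.
type Speaker interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// VoicePipeline chains the three stages; an omni model could eventually
// implement all of them behind the same interfaces.
type VoicePipeline struct {
	STT Transcriber
	LLM TextGenerator
	TTS Speaker
}

func (p *VoicePipeline) Respond(ctx context.Context, pcm []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(ctx, pcm)
	if err != nil {
		return nil, err
	}
	reply, err := p.LLM.Generate(ctx, text)
	if err != nil {
		return nil, err
	}
	return p.TTS.Synthesize(ctx, reply)
}
```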


richiejp (Collaborator, Author) commented:

OK, I tried Qwen 2 omni and had issues with accuracy and context length, which aren't a problem for a pipeline.



