
feat(realtime): Add audio conversations #6245


Draft
richiejp wants to merge 3 commits into mudler:master from richiejp:feat/realtime-audio-conv

Conversation

richiejp (Collaborator) commented Sep 10, 2025 (edited)

Description

Add enough realtime API features to allow talking with an LLM using only audio.

Presently the realtime API only supports transcription, which is a minor use case for it. This PR should allow it to be used with a basic voice assistant.

This PR ignores many of the options and edge cases. Instead it will, for example, rely on server-side VAD to commit conversation items.
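
For illustration, here is a minimal sketch of what a client session against this endpoint could look like, assuming OpenAI-style realtime events (`session.update`, `input_audio_buffer.append`, `response.audio.delta`) and server-side turn detection; the URL, model name and `capturePCM` helper are placeholders, not part of this PR.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
)

// capturePCM is a stand-in for a real microphone capture loop.
func capturePCM() <-chan []byte {
	ch := make(chan []byte)
	close(ch)
	return ch
}

func main() {
	// Endpoint and model name are placeholders for a local deployment.
	conn, _, err := websocket.DefaultDialer.Dial(
		"ws://localhost:8080/v1/realtime?model=voice-assistant", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Ask the server to detect turns itself; the PR relies on server-side VAD
	// to commit conversation items instead of explicit client commits.
	conn.WriteJSON(map[string]any{
		"type": "session.update",
		"session": map[string]any{
			"modalities":     []string{"audio", "text"},
			"turn_detection": map[string]any{"type": "server_vad"},
		},
	})

	// Stream captured audio as base64-encoded append events.
	go func() {
		for chunk := range capturePCM() {
			conn.WriteJSON(map[string]any{
				"type":  "input_audio_buffer.append",
				"audio": base64.StdEncoding.EncodeToString(chunk),
			})
		}
	}()

	// Log server events; audio replies should arrive as response.audio.delta.
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatal(err)
		}
		var ev struct {
			Type string `json:"type"`
		}
		if err := json.Unmarshal(msg, &ev); err != nil {
			continue
		}
		log.Println("server event:", ev.Type)
	}
}
```

With `turn_detection` set to `server_vad`, the server is expected to commit the buffered audio and start a response on its own, which matches the approach described above.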

Notes for Reviewers

  • Configure a model pipeline or use a multi-modal model.
  • Commit client audio to the conversation
  • Generate a text response (optional)
  • Generate an audio response
  • Interrupt generation on voice detection? (a rough client-side sketch of this follows the list)
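
Regarding the last item, here is a hedged sketch of how a client could react to voice detection, again assuming OpenAI-style event names (`input_audio_buffer.speech_started`, `response.cancel`) that this PR may or may not emit:

```go
package client

import (
	"encoding/json"

	"github.com/gorilla/websocket"
)

// handleEvent extends the client sketch above: when the server signals that
// the user started speaking, cancel the in-flight response so generation is
// interrupted instead of talking over the user.
func handleEvent(conn *websocket.Conn, raw []byte) error {
	var ev struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	switch ev.Type {
	case "input_audio_buffer.speech_started":
		// User speech detected: ask the server to stop the current response.
		return conn.WriteJSON(map[string]any{"type": "response.cancel"})
	case "response.audio.delta":
		// Base64 audio chunks would be decoded and queued for playback here.
	}
	return nil
}
```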

Fixes: #3714 (but we'll need follow-up issues)

Signed commits

  • Yes, I signed my commits.


netlify bot commented Sep 10, 2025 (edited)

Deploy Preview for localai ready!

🔨 Latest commit: c1b9f23
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/68d03d019818140008d2dae5
😎 Deploy Preview: https://deploy-preview-6245--localai.netlify.app

richiejp (Collaborator, Author) commented:
It's not clear to me if we have audio support in llama.cpp: ggml-org/llama.cpp#15194

mudler (Owner) commented Sep 21, 2025

My initial thought on this was to use the whisper backend to transcribe the audio from VAD and give the text to a text-to-text backend; that way we can always go back to this approach. There was also an interface created exactly for this, so a pipeline can be seen as a kind of "drag and drop" until omni models are really capable.

However, yes, audio input is actually supported by llama.cpp and our backends: try qwen2-omni and you will be able to give it audio as input, but it isn't super accurate (transcription is better for now).
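
As a rough illustration of that pipeline idea (the interface and stage names below are hypothetical, not LocalAI's actual API), each stage could be swappable so a whisper + text-to-text + TTS chain can later be replaced by a single omni model:

```go
package pipeline

import "context"

// Transcriber turns a committed audio buffer (e.g. after server-side VAD)
// into text for the text-to-text backend.
type Transcriber interface {
	Transcribe(ctx context.Context, pcm []byte) (string, error)
}

// TextGenerator produces the assistant's text reply.
type TextGenerator interface {
	Generate(ctx context.Context, prompt string) (string, error)
}

// Speaker synthesizes the audio reply streamed back to the client.
type Speaker interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// VoicePipeline chains the three stages; an omni model could eventually
// implement all of them behind the same interfaces.
type VoicePipeline struct {
	STT Transcriber
	LLM TextGenerator
	TTS Speaker
}

func (p *VoicePipeline) Respond(ctx context.Context, pcm []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(ctx, pcm)
	if err != nil {
		return nil, err
	}
	reply, err := p.LLM.Generate(ctx, text)
	if err != nil {
		return nil, err
	}
	return p.TTS.Synthesize(ctx, reply)
}
```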


richiejp (Collaborator, Author) commented:

OK, I tried Qwen 2 omni and had issues with accuracy and context length, which aren't a problem for a pipeline.



