streaming http server · ggml-org/whisper.cpp · Discussion #3306

glaszig
Jul 5, 2025

hi folks. for a week i've been dabbling with cpp for the first time since school because i wanted to make the server in the examples capable of streaming transcribed segments as they are produced, instead of having to wait for the entire transcription to finish. i wanted to leverage the new_segment_callback and got quite far, but i cannot figure out some pointer and/or memory-management-related stuff.

i wrote a transcriber class which launches in a detached thread. it gets passed as the new_segment_callback_user_data so the callback can send each segment back to the transcriber, which adds it to its std::queue, then takes it off the queue and writes it to httplib's DataSink instance, which you get by using httplib::Response::set_chunked_content_provider().
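for illustration, the queue hand-off described above can be sketched with only the standard library. this is a minimal sketch under assumptions, not the code from the linked branch: `SegmentQueue` and `run_demo` are hypothetical names, the worker thread stands in for the whisper callback, and the draining loop stands in for the chunked content provider. it assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical thread-safe queue; not part of whisper.cpp or cpp-httplib.
class SegmentQueue {
public:
    // Called by the producer (e.g. from the new-segment callback).
    void push(std::string text) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(text));
        }
        cv_.notify_one();
    }

    // Called once by the producer when transcription has finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a segment is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        std::string text = std::move(items_.front());
        items_.pop();
        return text;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Stand-in for the server flow: a worker thread produces segments while the
// caller drains them, roughly as a chunked content provider would.
std::vector<std::string> run_demo(const std::vector<std::string>& segments) {
    auto queue = std::make_shared<SegmentQueue>();
    // The worker holds its own shared_ptr copy, so the queue (and its mutex)
    // stays alive for as long as the worker runs.
    std::thread worker([queue, segments] {
        for (const auto& s : segments) queue->push(s);
        queue->close();
    });
    std::vector<std::string> received;
    while (auto s = queue->pop()) received.push_back(*s);
    worker.join();
    assert(received.size() == segments.size());
    return received;
}
```

the worker is joined here rather than detached, which keeps the lifetime of the queue easy to reason about. with a detached thread, shared ownership of the queue is what keeps the mutex valid.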

it works for the first segment but crashes during the second:

# build
cmake --build build --target whisper-server
# run
lldb build/bin/whisper-server -- --model ggml-medium-q5_0.bin -l auto -pr -pc
# upload mp3
curl 127.0.0.1:8080/inference -F file="@call.mp3" -F response_format=json -v
[00:00:00.000 --> 00:00:15.000] [Ringtone]

crash:

whisper server listening at http://127.0.0.1:8080
Received request: call.mp3
Successfully loaded call.mp3
system_info: n_threads = 4 / 8 | WHISPER : COREML = 0 | OPENVINO = 0 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | ACCELERATE = 1 | REPACK = 1 |
operator(): processing 'call.mp3' (2476800 samples, 154.8 sec), 4 threads, 1 processors, lang = auto, task = transcribe, timestamps = 1 ...
Running whisper.cpp inference on call.mp3
set_chunked_content_provider
Transcriber::stream
Transcriber::getNextData
whisper_full_with_state: auto-detected language: de (p = 0.989747)
Transcriber::handleSegment
stream_new_segment 1
stream_new_segment 2
stream: [00:00:00.000 --> 00:00:15.000] [Ringtone]
Transcriber::publishSegment
Transcriber::handleSegment
stream_new_segment 1
stream_new_segment 2
stream: [00:00:15.000 --> 00:00:17.000] Hello?
Transcriber::publishSegment
Process 43124 stopped
* thread #13, stop reason = EXC_BAD_ACCESS (code=1, address=0x2e6a0001aee12bcc)
 frame #0: 0x00000001aeeba7cc libsystem_pthread.dylib`pthread_mutex_lock + 12
libsystem_pthread.dylib`pthread_mutex_lock:
-> 0x1aeeba7cc <+12>: ldr x8, [x0]
 0x1aeeba7d0 <+16>: mov w9, #0x545a
 0x1aeeba7d4 <+20>: movk w9, #0x4d55, lsl #16
 0x1aeeba7d8 <+24>: cmp x8, x9
Target 0: (whisper-server) stopped.
(lldb)

i can't figure out what's wrong with the mutex. must be some threading/scoping issue. maybe somebody with actual experience wants to help me out...
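one possibility that would be consistent with a bad access inside pthread_mutex_lock (not confirmed against the linked branch) is that the object owning the mutex has already been destroyed, or the user_data pointer no longer points at it, by the time the second segment arrives. a minimal sketch of keeping such an object alive by capturing a std::shared_ptr in the closure that outlives the creating scope follows. `Transcriber`, `on_segment` and `make_provider` here are hypothetical stand-ins, and the callback signature is deliberately simplified compared to the real whisper.cpp one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the poster's transcriber class.
struct Transcriber {
    std::mutex mutex;
    std::vector<std::string> segments;

    void handle_segment(const std::string& text) {
        std::lock_guard<std::mutex> lock(mutex);
        segments.push_back(text);
    }
};

// Simplified C-style callback taking an opaque user_data pointer. If
// user_data pointed at a destroyed Transcriber, locking its mutex would be
// undefined behavior.
void on_segment(const char* text, void* user_data) {
    assert(user_data != nullptr);
    static_cast<Transcriber*>(user_data)->handle_segment(text);
}

// Returns a closure that outlives the scope which created the Transcriber.
// Capturing the shared_ptr by value keeps the object, and therefore its
// mutex, alive until the closure itself is destroyed.
std::function<std::size_t(const char*)> make_provider() {
    auto transcriber = std::make_shared<Transcriber>();
    return [transcriber](const char* text) -> std::size_t {
        on_segment(text, transcriber.get());
        std::lock_guard<std::mutex> lock(transcriber->mutex);
        return transcriber->segments.size();
    };
}
```

had `transcriber` been a local object captured by reference or by raw pointer, the second call would touch freed memory once `make_provider` returned.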

code: https://github.com/glaszig/whisper.cpp/tree/server-streaming
diff: https://github.com/ggml-org/whisper.cpp/compare/master...glaszig:server-streaming?expand=1
