v1.2.0 · rishikanthc/Scriberr · Discussion #313

rishikanthc
Dec 17, 2025
Maintainer

What's Changed

Feat/Support for running state of the art Nvidia Canary and Parakeet models by @rishikanthc in #186
adds support for video transcription - formatter run touched all files by @rishikanthc in #187
Fix/app performance by @rishikanthc in #192
fix summary settings tab not getting latest llm config status by @rishikanthc in #193
181 feature multitrack support by @rishikanthc in #197
fixes persistence of API keys by @rishikanthc in #198
adds support for drag and drop by @rishikanthc in #207
fixes pyannote diarization with the new unified arch by @rishikanthc in #210
improves logging by @rishikanthc in #212
fixes refresh causing stop job dialogue to close by @rishikanthc in #213
Adds cuda compatibility to Dockerfile.cuda via cuda runtime. by @dillydogg in #214
adds diarization parameters to info card by @rishikanthc in #215
Fix model storage persistence and Parakeet CUDA errors by @Ixian in #260
Fix YouTube downloads - Add Deno runtime for video cipher decryption by @Ixian in #259
Repair Bad File Paths For Windows by @ThePieMonster in #250
Update README.md by @Pakkieressabereso in #255
Fixed the unit tests by @Ixian in #262
Fix MP4 file upload validation in audio upload handler by @Ixian in #263
Add MPS for Apple Silicon by @a-huk in #264
Fix CUDA Error 35 on Parakeet for short audio (<300s) by @Ixian in #267
Watch and auto ingest recordings from desktop - closes #274 (Allow local folder as upload/ingest), closes #129 (Indexing a directory structure for audio files) by @rishikanthc in #275
Improve UI/UX by @rishikanthc in #276
Docs/revamp v1.2 - closes #177 ([Feature request] Ability to scroll playback using timestamps from a transcription) by @rishikanthc in #278
Implements webhooks to make automation easier via API by @rishikanthc in #281
frontend HMR dev loop by @paulirish in #280
Revert "frontend HMR dev loop" by @rishikanthc in #282
Revert "frontend HMR dev loop" by @rishikanthc in #283
adds support for openai api compatible endpoints for transcription - closes #196 ([Feature Request] Add OpenAI API as a Transcription Profile) by @rishikanthc in #284
Instead of full raw transcript json we send formated text to LLM by @EdrisT in #288
fix: downgrade cuda base image and remove conflicting LD_LIBRARY_PATH - fixes #273 (Project is not working with Nvidia GTX Series 10), fixes #104 (Application does not work with RTX 5090), fixes #246 (WhisperX not working with GPU) by @rishikanthc in #290
Use usermappings in chat instead of generic diarization names by @EdrisT in #294
Fixes for running transcription and diarization on older Nvidia cards - Resolves #273 (Project is not working with Nvidia GTX Series 10) by @rishikanthc in #295
Custom alignment model by @EdrisT in #296
Configurable OpenAI API Base URL by @EdrisT in #292
Show all models when using custom OpenAI compatible endpoint in LLM by @EdrisT in #301
chore: add the missing unzip dependency in the dockerfile (refs #285: either unzip or 7z is required to install Deno) by @cbonnissent in #302
UI refactor by @rishikanthc in #307
Doc redesign by @rishikanthc in #308
Lint debt by @rishikanthc in #309
Go debt by @rishikanthc in #312

New Contributors

@dillydogg made their first contribution in #214
@Ixian made their first contribution in #260
@ThePieMonster made their first contribution in #250
@Pakkieressabereso made their first contribution in #255
@a-huk made their first contribution in #264
@paulirish made their first contribution in #280
@EdrisT made their first contribution in #288
@cbonnissent made their first contribution in #302

Full Changelog: v1.1.0...v1.2.0

This discussion was created from the release v1.2.0.

Replies: 2 comments 10 replies

patienttruth
Dec 19, 2025

Great upgrade @rishikanthc! I've yet to fully learn all the updates, but I'm especially excited to try the OpenAI Transcription option. I had a couple of questions, if that's ok.

Is the limitation on timestamping with OpenAI a limitation on their end?
Is there still a way to assign a name to a Speaker?
Would it be possible to save the Hugging Face token for diarisation, or is there a reason it's not so?

Thanks again for the awesome upgrade!

10 replies

rishikanthc Dec 19, 2025
Maintainer Author

@patienttruth interesting. It's possible that the speaker and timestamp information is inflating the context, because every individual speaker segment carries extra timing and speaker information. For a 99-minute audio file this could quickly explode. One way I can address this is to give the user an option to choose whether timestamp and speaker information is passed to chat and summary. Do you think that would work?
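
A minimal sketch of that option, to show how much the per-segment metadata adds. The `Segment` shape, its field names, and `format_for_llm` are assumptions made for illustration; they are not taken from Scriberr's code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarized segment; the field names are assumptions."""
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def format_for_llm(segments, include_timestamps=True, include_speakers=True):
    """Render segments as plain text, optionally dropping per-segment metadata."""
    lines = []
    for seg in segments:
        prefix = ""
        if include_timestamps:
            prefix += f"[{seg.start:.1f}-{seg.end:.1f}] "
        if include_speakers:
            prefix += f"{seg.speaker}: "
        lines.append(prefix + seg.text)
    return "\n".join(lines)


segments = [
    Segment(0.0, 4.2, "SPEAKER_00", "Welcome everyone."),
    Segment(4.2, 9.8, "SPEAKER_01", "Thanks, let's get started."),
]

full = format_for_llm(segments)
bare = format_for_llm(segments, include_timestamps=False, include_speakers=False)
# Compare the character counts as a rough proxy for context size.
print(len(full), len(bare))
```

With short segments the prefix can be as long as the spoken text itself, which is why the overhead grows quickly on long recordings.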

I think I also need to do more testing with long audio. I'll probably grab something from YouTube and test it out.

Regarding the whisper-1 error, this is an OpenAI limitation. Their transcription endpoint has a file size limit, I believe it's 25MB if I remember correctly, and the error message agrees with it. So the only workaround is to downsize or compress the audio using ffmpeg and then upload it. You won't lose much accuracy. Higher bitrate for speech doesn't make a ton of difference as far as transcription is concerned.

EDIT: in fact, all transcription models internally convert the files to 16-bit WAV mono audio before transcribing, so try again with a compressed file.
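
As a rough sketch of the compression step, the snippet below estimates the highest audio bitrate that keeps a recording under the limit and builds the matching ffmpeg command. The 25 MiB figure is the one recalled above and should be checked against OpenAI's current documentation. The file names, the 5% headroom for container overhead, and the helper names are assumptions.

```python
import math
import shlex

# Limit as recalled in the comment above; verify before relying on it.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

# Common constant bitrates (kbit/s) to snap down to.
STANDARD_KBPS = [16, 24, 32, 40, 48, 56, 64]


def max_bitrate_kbps(duration_seconds, limit_bytes=MAX_UPLOAD_BYTES, headroom=0.95):
    """Largest constant bitrate (kbit/s) that keeps the audio under the limit."""
    usable_bits = limit_bytes * 8 * headroom
    return math.floor(usable_bits / duration_seconds / 1000)


def pick_bitrate(duration_seconds):
    """Highest standard bitrate that fits, or None if even the lowest is too big."""
    ceiling = max_bitrate_kbps(duration_seconds)
    fitting = [rate for rate in STANDARD_KBPS if rate <= ceiling]
    return max(fitting) if fitting else None


def ffmpeg_command(src, dst, bitrate_kbps):
    """Build an ffmpeg call that downmixes to mono 16 kHz at the given bitrate."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000",
            "-b:a", f"{bitrate_kbps}k", dst]


duration = 99 * 60  # the 99-minute recording discussed above
kbps = pick_bitrate(duration)
print(shlex.join(ffmpeg_command("meeting.wav", "meeting.mp3", kbps)))
```

If `pick_bitrate` returns `None`, the recording is too long to fit in one upload at these bitrates and would need to be split first.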

rishikanthc Dec 19, 2025
Maintainer Author

@patienttruth I just realized that the option to rename speakers is missing xD. Looks like I introduced a regression. I'll push out a patch soon. The backend already supports it; I just need to expose it in the frontend. Apologies for the stupidity 😂

patienttruth Dec 19, 2025

Hey, no worries!

Optional passing of the info would be nice.

I'm looking into ideas for how to handle processing a long transcript with the names and timestamps. From what I'm reading, it seems like I may have to chunk the transcript. In my use case I can probably use voice markers like "next item" as a tag for something that would do the chunking. Then I would have the summarization done at the chunk level, and then bring it all together.

I don't know if you see functionality at this level as a part of Scriberr or not. I haven't really looked into the CLI or API options from Scriberr, but currently I'm hoping I can somehow create a pipeline that handles these processes. Being a competent tinkerer but not very competent at scripting, I've found Scriberr awesome for making a lot of my workflow available via GUI.

ETA - I'll play around with the audio to see if I can get a long recording into whisper. I may just have to buy a GPU as I prefer local anyway.
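
The chunk-then-combine idea described above could look roughly like this. It is a sketch only: the marker phrase comes from the comment, while `split_on_marker`, `summarize_in_chunks`, and the placeholder `first_sentence` summarizer are hypothetical names. In practice `summarize` would call an LLM of your choice.

```python
import re


def split_on_marker(transcript, marker="next item"):
    """Split a transcript into chunks wherever the spoken marker appears."""
    parts = re.split(re.escape(marker), transcript, flags=re.IGNORECASE)
    # Trim whitespace and stray punctuation left around the marker.
    chunks = (part.strip(" \n.,") for part in parts)
    return [chunk for chunk in chunks if chunk]


def summarize_in_chunks(transcript, summarize, combine, marker="next item"):
    """Summarize each chunk separately, then merge the partial summaries."""
    partial = [summarize(chunk) for chunk in split_on_marker(transcript, marker)]
    return combine(partial)


def first_sentence(text):
    """Placeholder summarizer: keeps the first sentence. Swap in an LLM call."""
    return text.split(".")[0].strip() + "."


transcript = (
    "Budget review. We are over by ten percent. Next item. "
    "Hiring update. Two offers went out. Next item. "
    "Office move. Lease starts in March."
)
result = summarize_in_chunks(transcript, first_sentence, "\n".join)
print(result)
```

Note that a plain substring match also splits on phrases such as "next items", so a real pipeline may want a stricter pattern or a less common marker.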

rishikanthc Dec 19, 2025
Maintainer Author

So I do plan on introducing semantic search based on text embeddings (basically a RAG). It would not be too hard to implement something where, when you type a query in chat, only the chunks of transcript relevant to that query are loaded into context. The con is that your query has to be quite descriptive, or else it might not find the most relevant chunks.
But if you go down the API route, you can separate the concerns: use Scriberr to get the transcript, then handle chat outside of Scriberr. There's no need to pass the entire transcript. Once you have the transcript (you can get this using the Scriberr API), you can manipulate it (chunking, for example) however you see fit and then feed it to a chat endpoint of your choice.
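
To illustrate the retrieval step, here is a minimal sketch that ranks transcript chunks against a query. It deliberately swaps real text embeddings for plain word-count vectors with cosine similarity so that it runs without any model. All function names are hypothetical and nothing here reflects how Scriberr will implement it.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    "The budget is over by ten percent because of cloud costs.",
    "Two hiring offers went out for the backend team.",
    "The office lease starts in March.",
]
# Only the selected chunks would be passed to the chat endpoint as context.
context = top_chunks("Why is the budget over?", chunks, k=1)
print(context)
```

The sketch also shows the caveat mentioned above: a vague query shares few words with any chunk, so the ranking becomes close to arbitrary. Real embeddings soften this but do not remove it.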

rishikanthc Dec 19, 2025
Maintainer Author

Also, depending on your transcription accuracy needs and the capacity of your CPU, it might not be too bad to use the CPU for transcription (especially with the smaller models) if latency isn't an issue for you. Meaning, if you don't need a transcript instantly and are okay with waiting for some time, you can simply use CPU transcription and let it sit and hash it out for a while. The more cores/threads you have, the faster it will be. Macs are amazing for this. Transcription on any of the M series Macs is quite fast from my testing. Of course a dedicated GPU would be way faster, but it's definitely usable.

ebuchmann521
Feb 19, 2026

Loving this so far for class. One thing I've noticed is that when YouTube updated their systems, the download YouTube option stopped working. Even yt-dlp was down for a bit before there was a workaround. Will Scriberr be updated to fix the YouTube download option? I use it quite frequently because our professor posts all lectures on YouTube, and I can't use Scriberr to pull the audio anymore. I have to manually download the videos and then upload them to Scriberr, which is just slightly annoying. If it can't be fixed or isn't in the plans, no worries. I can make do with downloading myself. Just wondering. Thanks in advance, and thanks for such an awesome overall app.

0 replies
