v1.2.0 #313

Dec 17, 2025 · 2 comments · 10 replies

What's Changed

New Contributors

Full Changelog: v1.1.0...v1.2.0


This discussion was created from the release v1.2.0.

Great upgrade @rishikanthc ! I've yet to fully learn all the updates, but I'm especially excited to try the OpenAI Transcription option. I had a couple questions if that's ok.

Is the limitation on timestamping with OpenAI a limitation on their end?
Is there still a way to assign a name to a Speaker?
Would it be possible to save the Hugging Face token for diarisation, or is there a reason it's not so?

Thanks again for the awesome upgrade!


rishikanthc Dec 19, 2025
Maintainer Author

@patienttruth interesting. It is possible that the speaker and timestamp information is inflating the context: with these, every individual speaker segment carries extra timing and speaker information, so for a 99 min audio this could quickly explode. I think one way I can address this is to provide an option for the user to select whether to pass timestamp and speaker information to chat and summary. Do you think that would work?

I think I also need to do more testing with a long audio. I'll probably grab something from YouTube and test it out.
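To illustrate the option described above, here is a minimal sketch of rendering a transcript with or without the per-segment timing and speaker labels. The segment fields (`start`, `end`, `speaker`, `text`) and the function name are hypothetical, chosen for illustration; they are not Scriberr's actual data model.

```python
from typing import Dict, List


def render_transcript(segments: List[Dict], include_metadata: bool = True) -> str:
    """Join segments into prompt text, optionally dropping timing/speaker labels.

    The segment keys used here are assumptions for this sketch.
    """
    lines = []
    for seg in segments:
        if include_metadata:
            # Every segment gets a prefix, which is what grows the context.
            lines.append(
                f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['speaker']}: {seg['text']}"
            )
        else:
            lines.append(seg["text"])
    return "\n".join(lines)


segments = [
    {"start": 0.0, "end": 1.5, "speaker": "SPEAKER_00", "text": "Hello there."},
    {"start": 1.5, "end": 2.0, "speaker": "SPEAKER_01", "text": "Hi."},
]
full = render_transcript(segments)
plain = render_transcript(segments, include_metadata=False)
```

For short segments the prefix can be as long as the spoken text itself, which is why a long recording with many speaker turns grows so quickly.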

Regarding the whisper-1 error, this is an OpenAI limitation. Their transcription endpoint has a file size limit, I believe it's 25MB if I remember correctly, and the error message agrees with it. So the only workaround is to downsize or compress the audio using ffmpeg and then upload it. You won't lose much accuracy. A higher bitrate for speech doesn't make a ton of difference as far as transcription is concerned.

EDIT: in fact, all transcription models internally convert the files to 16-bit WAV mono audio before transcribing, so try again with a compressed file.
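As a rough sketch of the compression step, the snippet below works out the highest audio bitrate that keeps a recording under the limit and builds a matching ffmpeg command. The 25 MiB figure is taken from the comment above and should be checked against the actual error message; the 5% safety margin, the 64 kbps cap, and the function names are assumptions for illustration.

```python
import math
from typing import List

LIMIT_BYTES = 25 * 1024 * 1024  # assumed upload limit, per the comment above


def max_bitrate_kbps(duration_s: float, limit_bytes: int = LIMIT_BYTES,
                     margin: float = 0.95) -> int:
    """Largest whole-kbps bitrate whose output stays under limit_bytes."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    usable_bits = limit_bytes * 8 * margin  # leave headroom for container overhead
    return math.floor(usable_bits / duration_s / 1000)


def ffmpeg_command(src: str, dst: str, duration_s: float) -> List[str]:
    """Build (but do not run) an ffmpeg call that downmixes and compresses src."""
    kbps = min(max_bitrate_kbps(duration_s), 64)  # speech rarely needs more
    kbps = (kbps // 8) * 8  # round down to a common encoder step
    if kbps < 8:
        raise ValueError("recording too long for one upload; split it first")
    # -ac 1: mono, -ar 16000: 16 kHz sample rate, -b:a: target audio bitrate
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000",
            "-b:a", f"{kbps}k", dst]


cmd = ffmpeg_command("meeting.wav", "meeting.mp3", duration_s=99 * 60)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. For the 99 minute recording discussed here this lands on 32 kbps mono.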


rishikanthc Dec 19, 2025
Maintainer Author

@patienttruth I just realized that the option to rename speakers is missing xD Looks like I introduced a regression. I'll push out a patch soon. The backend already supports it; I just need to expose it in the frontend. Apologies for the stupidity 😂

Comment options

Hey, no worries!

Optional passing of the info would be nice.

I'm looking into ideas for how to handle processing a long transcript with the names and timestamps. From what I'm reading, it seems like I may have to chunk the transcript. In my use case I can probably use voice markers like "next item" as a tag for something that would do the chunking. Then I would have the summarization done at the chunk level, and then bring it all together.
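A minimal sketch of that idea: split the transcript wherever the spoken marker appears, summarize each chunk, then summarize the summaries. The `summarize` callable is a placeholder for whatever chat or LLM call is used; it and the function names are assumptions, not part of Scriberr.

```python
import re
from typing import Callable, List

_TRIM = " \n\t.,"  # stray punctuation left around a spoken marker


def split_on_marker(transcript: str, marker: str = "next item") -> List[str]:
    """Split a transcript at each case-insensitive occurrence of marker."""
    parts = re.split(re.escape(marker), transcript, flags=re.IGNORECASE)
    return [p.strip(_TRIM) for p in parts if p.strip(_TRIM)]


def summarize_in_chunks(transcript: str, summarize: Callable[[str], str],
                        marker: str = "next item") -> str:
    """Summarize each chunk separately, then summarize the combined result."""
    chunks = split_on_marker(transcript, marker)
    partial = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial))


text = ("Budget review went fine. Next item. Hiring plan needs approval. "
        "next item, Office move is delayed.")
chunks = split_on_marker(text)
```

Each call to `summarize` only sees one chunk (or the short list of partial summaries), so no single request needs the whole recording in context.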

I don't know if you see functionality at this level as a part of Scriberr or not. I haven't really looked into the CLI or API options from Scriberr, but currently I'm hoping I can somehow create a pipeline that handles these processes. Since I'm a competent tinkerer but not very competent at scripting, Scriberr has been awesome in making a lot of my workflow available via a GUI.

ETA - I'll play around with the audio to see if I can get a long recording into whisper. I may just have to buy a GPU as I prefer local anyway.


rishikanthc Dec 19, 2025
Maintainer Author

So I do plan on introducing semantic search based on text embeddings (basically a RAG). So it would not be too hard to implement something where, when you type a query in chat, it loads into context only the chunks of the transcript that are relevant to your particular query. The con would be that your query has to be quite descriptive, or else it might not find the most relevant chunks.
But if you go down the API route, you can separate the concerns: use Scriberr to get the transcript, then handle chat outside of Scriberr. There's no need to pass the entire transcript. Once you have the transcript (you can get this using the Scriberr API), you can manipulate it (like chunking) however you see fit and then feed it to a chat endpoint of your choice.
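To make the retrieval idea concrete, here is a small sketch that ranks transcript chunks against a query and keeps only the best matches. It uses plain word-count cosine similarity as a stand-in for real text embeddings, so it only matches on shared words; that substitution, and all the names below, are assumptions for illustration and not how Scriberr will implement it.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Bag-of-words vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = _vector(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vector(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The budget was approved for next quarter.",
    "We discussed the office move and parking.",
    "Hiring plan: two engineers start in March.",
]
best = top_chunks("when is the office move", chunks, k=1)
```

Only the chunks in `best` would be sent to the chat endpoint. This also shows the drawback mentioned above: a vague query shares little with any chunk, so the ranking becomes unreliable.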


rishikanthc Dec 19, 2025
Maintainer Author

Also, depending on your transcription accuracy needs and the capacity of your CPU, it might not be too bad to use the CPU for transcription (especially with the smaller models) if latency isn't an issue for you. Meaning, if you don't need a transcript instantly and are okay with waiting for some time, you can simply use CPU transcription and let it sit and hash it out for a while. The more cores / threads you have, the faster it will be. Macs are amazing for this. Transcription on any of the M series Macs is quite fast from my testing. Of course a dedicated GPU would be way faster, but it's definitely usable.

Comment options

Loving this so far for class. One thing I've noticed is that when YouTube updated their systems, the download YouTube option stopped working. Even yt-dlp was down for a bit before there was a workaround. Will Scriberr be updated to fix the YouTube download option? I use it quite frequently because our professor posts all lectures on YouTube, and I can't use Scriberr to pull the audio anymore. I have to manually download the videos and then upload them to Scriberr, which is just slightly annoying. If it can't be fixed or isn't in the plans, no worries. I can make do with downloading myself. Just wondering. Thanks in advance and thanks for such an awesome overall app.
