-
-
Notifications
You must be signed in to change notification settings - Fork 213
v1.2.0 #313
-
What's Changed
- Feat/Support for running state of the art Nvidia Canary and Parakeet models by @rishikanthc in Feat/Support for running state of the art Nvidia Canary and Parakeet models #186
- adds support for video transcription - formatter run touched all files by @rishikanthc in adds support for video transcription - formatter run touched all files #187
- Fix/app performance by @rishikanthc in Fix/app performance #192
- fix summary settings tab not getting latest llm config status by @rishikanthc in fix summary settings tab not getting latest llm config status #193
- 181 feature multitrack support by @rishikanthc in 181 feature multitrack support #197
- fixes persistence of API keys by @rishikanthc in fixes persistence of API keys #198
- adds support for drag and drop by @rishikanthc in adds support for drag and drop #207
- fixes pyannote diarization with the new unified arch by @rishikanthc in fixes pyannote diarization with the new unified arch #210
- improves logging by @rishikanthc in improves logging #212
- fixes refresh causing stop job dialogue to close by @rishikanthc in fixes refresh causing stop job dialogue to close #213
- Adds cuda compatibility to Dockerfile.cuda via cuda runtime. by @dillydogg in Adds cuda compatibility to Dockerfile.cuda via cuda runtime. #214
- adds diarization parameters to info card by @rishikanthc in adds diarization parameters to info card #215
- Fix model storage persistence and Parakeet CUDA errors by @Ixian in Fix model storage persistence and Parakeet CUDA errors #260
- Fix YouTube downloads - Add Deno runtime for video cipher decryption by @Ixian in Fix YouTube downloads - Add Deno runtime for video cipher decryption #259
- Repair Bad File Paths For Windows by @ThePieMonster in Repair Bad File Paths For Windows #250
- Update README.md by @Pakkieressabereso in Update README.md #255
- Fixed the unit tests by @Ixian in Fixed the unit tests #262
- Fix MP4 file upload validation in audio upload handler by @Ixian in Fix MP4 file upload validation in audio upload handler #263
- Add MPS for Apple Silicon by @a-huk in Add MPS for Apple Silicon #264
- Fix CUDA Error 35 on Parakeet for short audio (<300s) by @Ixian in Fix CUDA Error 35 on Parakeet for short audio (<300s) #267
- Watch and auto ingest recordings from desktop - closes Allow local folder as upload/ingest #274 , closes Indexing a directory structure for audio files #129 by @rishikanthc in Watch and auto ingest recordings from desktop - closes #274, closes #129 #275
- Improve UI/UX by @rishikanthc in Improve UI/UX #276
- Docs/revamp v1.2 - closes [Feature request] Ability to scroll playback using timestamps from a transcription #177 by @rishikanthc in Docs/revamp v1.2 - closes #177 #278
- Implements webhooks to make automation easier via API by @rishikanthc in Implements webhooks to make automation easier via API #281
- frontend HMR dev loop by @paulirish in frontend HMR dev loop #280
- Revert "frontend HMR dev loop" by @rishikanthc in Revert "frontend HMR dev loop" #282
- Revert "frontend HMR dev loop" by @rishikanthc in Revert "frontend HMR dev loop" #283
- adds support for openai api compatible endpoints for transcription - closes [Feature Request] Add OpenAI API as a Transcription Profile #196 by @rishikanthc in adds support for openai api compatible endpoints for transcription - closes #196 #284
- Instead of full raw transcript json we send formated text to LLM by @EdrisT in Instead of full raw transcript json we send formated text to LLM #288
- fix: downgrade cuda base image and remove conflicting LD_LIBRARY_PATH - fixes Project is not working with Nvidia GTX Series 10 #273 , fixes Application does not work with RTX 5090 #104 , fixes WhisperX not working with GPU #246 by @rishikanthc in fix: downgrade cuda base image and remove conflicting LD_LIBRARY_PATH - fixes #273 , fixes #104 , fixes #246 #290
- Use usermappings in chat instead of generic diarization names by @EdrisT in Use usermappings in chat instead of generic diarization names #294
- Fixes for running transcription and diarization on older Nvidia cards - Resolves Project is not working with Nvidia GTX Series 10 #273 by @rishikanthc in Fixes for running transcription and diarization on older Nvidia cards - Resolves #273 #295
- Custom alignment model by @EdrisT in Custom alignment model #296
- Configurable OpenAI API Base URL by @EdrisT in Configurable OpenAI API Base URL #292
- Show all models when using custom OpenAI compatible endpoint in LLM by @EdrisT in Show all models when using custom OpenAI compatible endpoint in LLM #301
- chore: add the missing unzip dependency in the dockerfile (refs either unzip or 7z is required to install Deno #285 ) by @cbonnissent in chore: add the missing unzip dependency in the dockerfile (refs #285) #302
- UI refactor by @rishikanthc in UI refactor #307
- Doc redesign by @rishikanthc in Doc redesign #308
- Lint debt by @rishikanthc in Lint debt #309
- Go debt by @rishikanthc in Go debt #312
New Contributors
- @dillydogg made their first contribution in Adds cuda compatibility to Dockerfile.cuda via cuda runtime. #214
- @Ixian made their first contribution in Fix model storage persistence and Parakeet CUDA errors #260
- @ThePieMonster made their first contribution in Repair Bad File Paths For Windows #250
- @Pakkieressabereso made their first contribution in Update README.md #255
- @a-huk made their first contribution in Add MPS for Apple Silicon #264
- @paulirish made their first contribution in frontend HMR dev loop #280
- @EdrisT made their first contribution in Instead of full raw transcript json we send formated text to LLM #288
- @cbonnissent made their first contribution in chore: add the missing unzip dependency in the dockerfile (refs #285) #302
Full Changelog: v1.1.0...v1.2.0
This discussion was created from the release v1.2.0.
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 2 comments 10 replies
-
Great upgrade @rishikanthc ! I've yet to fully learn all the updates, but I'm especially excited to try the OpenAI Transcription option. I had a couple questions if that's ok.
Is the the limitation on timestamping with OpenAI a limitation on their end?
Is there still a way to assign a name to a Speaker?
Would it be possible to save the Hugging Face token for diarisation, or is there a reason it's not so?
Thanks again for the awesome upgrade!
Beta Was this translation helpful? Give feedback.
All reactions
-
@patienttruth interesting.. It is possible that the speaker information and timestamp information is adding to the context.. because with these, every individual speaker segment has extra timing and speaker information.. so for a 99min audio this could quickly explode.. I think one way I can address this is to provide an option for the user to select if they want to pass timestamp and speaker information to chat and summary.. do you think that would work ??
I think i also need to do more testing with a long audio. I'll probably grab something from youtube and test it out..
Regarding the whisper-1 error, this is an openai limitation.. Their transcription endpoint has a file size limit, i believe it's 25MB if i remember correctly, and the error message agrees with it.. So the only work around is to downsize or compress the audio using ffmpeg and then upload it.. You won't loose much accuracy.. Higher bitrate for speech doesn't make a tonne of difference as far as transcription is concerned..
EDIT: infact all transcription models internally convert the files to 16-bit WAV mono audio before transcribing.. so try again with a compressed file
Beta Was this translation helpful? Give feedback.
All reactions
-
@patienttruth I just realized that the option to rename speakers is missing xD Looks like I introduced a regression.. I'll push out a patch soon.. The backend already supports it. I just need to expose it in the frontend.. Apologies for the stupidity 😂
Beta Was this translation helpful? Give feedback.
All reactions
-
Hey, no worries!
Optional passing of the info would be nice.
I'm looking into ideas for how to handle processing a long transcript with the names and timestamps. From what I'm reading it seems like I may have to chunk the transcript. In my use case I can probably use voice markers like " next item" as a tag for something that would do the chunk. Then I would have to have the summarization done at a chunk level, and then bring it all together.
I don't know if you see functionality at this level as a part of Scriberr or not. I haven't really looked into the CLI or API options from Scriberr, but currently I'm hoping I can somehow create a pipe that handles these processes. Being that I'm a competent tinkerer, and not very competent at any scripting Scriberr has been awesome in making a lot of my workflow available via GUI.
ETA - I'll play around with the audio to see if I can get a long recording into whisper. I may just have to buy a GPU as I prefer local anyway.
Beta Was this translation helpful? Give feedback.
All reactions
-
So I do plan on introducing semantic search based on text embeddings (basically a RAG).. So it would not be too hard to implement something where, when you type a query in chat it will load into context only the chunks of transcript that are relevant to your particular query.. The con would be that your query has to be quite descriptive or else, it might not find the most relevant chunks..
but if you go down the API route, you can separate the concerns. Use Scriberr to get the transcript, then handle chat outside of scriberr.. There's no need to pass the entire transcript. So once you have the transcript (you can get this using Scriberr API), then you can manipulate the transcript (like chunking) however you see fit and then feed it to a chat endpoint of your choice..
Beta Was this translation helpful? Give feedback.
All reactions
-
Also depending on your transcription accuracy needs, and the capacity of your CPU, it might not be too bad to use CPU for transcription (especially the smaller models) if latency isn't an issue for you.. Meaning, if you don't need a transcript instantly and okay with waiting for some time, then you can simply use CPU transcription to let it sit and hash it out for a while.. The more cores / threads you have the faster it will be.. Macs are amazing for this.. Transcription on any of the M series Macs are quite fast from my testing. Of course a dedicated GPU would be way faster but it's definitely usable..
Beta Was this translation helpful? Give feedback.
All reactions
-
Loving this so far for class. One thing I've noticed is that when YouTube updated their systems, the download YouTube option no longer works. Even yt-dlp was down for a bit before there was a workaround. Will Scriberr be updated to fix the YouTube download option? I do use it quite frequently because our professor posts all lectures on YouTube and I can't use Scriberr to pull the audio anymore. I have to manually download the videos and then upload that to Scriberr which is just slightly annoying. If it can't be fixed or isn't in the plans, no worries. I can make due with downloading myself. Just wondering. Thanks in advance and thanks for such an awesome overall app.
Beta Was this translation helpful? Give feedback.