
In the transcribe stage, why do we subtract N_FRAMES from the mel length, and why does the for loop over mel_segment skip the last segment if it is shorter than 3000 frames? Suppose the mel is [80, 4100]: the first mel segment will be [80, 3000], leaving [80, 1100]. The model transcribes the first segment [80, 3000], but it does nothing with the remaining [80, 1100].


# Pad 30-seconds of silence to the input audio, for slicing
mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
content_frames = mel.shape[-1] - N_FRAMES  # N_FRAMES = 3000
content_duration = float(content_frames * HOP_LENGTH / SAMPLE_RATE)
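
The subtraction makes more sense once you notice that padding=N_SAMPLES has already appended 30 seconds (3000 frames) of silence to the audio, so content_frames counts only the real audio, while a slice taken from the padded tensor can always reach a full 3000-frame window. Here is a rough numeric sketch of the question's [80, 4100] example under that assumption (the real loop in transcribe.py advances seek by however much the decoder actually consumed, which is simplified away here):

# Constants from whisper/audio.py
SAMPLE_RATE = 16000
HOP_LENGTH = 160
N_SAMPLES = 480000                    # 30 s of audio
N_FRAMES = N_SAMPLES // HOP_LENGTH    # 3000 mel frames

# Hypothetical input: the real audio yields 4100 mel frames (~41 s).
real_frames = 4100
padded_frames = real_frames + N_FRAMES       # 7100 after padding=N_SAMPLES
content_frames = padded_frames - N_FRAMES    # 4100 = frames of real audio

seek = 0
while seek < content_frames:
    # The slice end never runs past the padded tensor, so every
    # segment is a full window; its tail is the appended silence.
    start, end = seek, seek + N_FRAMES
    print(f"segment covers frames {start}:{end} of {padded_frames}")
    seek += N_FRAMES   # simplified; see note above about seek
# -> 0:3000 (all real audio), then 3000:6000 (1100 real + 1900 silence)

So the [80, 1100] remainder is not dropped; it is transcribed as part of a window whose tail is silence.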
  • Maybe because the model requires a minimum number of frames, or sufficient context, to generate accurate transcriptions; this likely helps maintain transcription quality. If the shorter last segment were transcribed as-is, it might not provide enough context, leading to less reliable results. Commented Feb 8, 2024 at 21:13
  • You need to pad the last frames with silence. Commented Apr 8, 2024 at 5:40
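
That second comment describes what Whisper's own loop does: the trailing chunk is never fed to the model at its short length, but is zero-padded up to a full 3000-frame window (in the library this is done by pad_or_trim on the mel segment). A minimal sketch of that step, where pad_segment is a hypothetical stand-in for pad_or_trim and zeros stand in for silence:

import torch
import torch.nn.functional as F

N_FRAMES = 3000

def pad_segment(mel_segment: torch.Tensor, length: int = N_FRAMES) -> torch.Tensor:
    # Right-pad a (n_mels, T) segment with zeros up to `length` frames,
    # or trim it if it is longer.
    if mel_segment.shape[-1] >= length:
        return mel_segment[..., :length]
    return F.pad(mel_segment, (0, length - mel_segment.shape[-1]))

short = torch.randn(80, 1100)    # the leftover [80, 1100] chunk from the question
print(pad_segment(short).shape)  # torch.Size([80, 3000])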
