Video DB Scene Index with LLM. #26

Unanswered
dineshkumar181094 asked this question in Q&A

Hi VideoDB Team,

I was following up on the example https://docs.videodb.io/adding-ai-generated-voiceovers-with-videodb-and-lovo-70

Here are a few points that are unclear from the document.

  1. Indexing timeline, scene descriptions, and the LLM response.
  • I see that shot-based indexing created 85 scenes from 2.3 minutes of video. But when prompting the LLM, you send everything in a single prompt, and the response I got by following the doc has only 41 shots.
  • Why don't we iterate over each scene and ask the LLM to generate a description that fills just that scene's slot in the timeline?
  • How can we be sure that the response given by the LLM fills the entire timeline of the video? It would be great if you could provide an explanation of this.
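To make the coverage question concrete, here is a minimal sketch of how the gap could be detected after the LLM responds. It assumes the script comes back as a list of dicts with `start`, `end`, and `text` keys (times in seconds); that shape, the function name `find_gaps`, and the sample data are all hypothetical and are not taken from the VideoDB tutorial.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]


def find_gaps(video_length: float, segments: List[Dict], tolerance: float = 0.05) -> List[Span]:
    """Return the (start, end) spans of the video that no script segment covers."""
    gaps: List[Span] = []
    cursor = 0.0  # end of the region covered so far
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["start"] - cursor > tolerance:
            gaps.append((cursor, seg["start"]))
        cursor = max(cursor, seg["end"])
    if video_length - cursor > tolerance:
        gaps.append((cursor, video_length))
    return gaps


# Hypothetical data: three indexed scenes, but the LLM returned only two lines.
scenes = [
    {"start": 0.0, "end": 2.0, "description": "A city skyline at dawn"},
    {"start": 2.0, "end": 3.5, "description": "A train leaves the station"},
    {"start": 3.5, "end": 6.0, "description": "Commuters cross a bridge"},
]
script = [
    {"start": 0.0, "end": 2.0, "text": "Morning breaks over the city."},
    {"start": 3.5, "end": 6.0, "text": "People stream across the bridge."},
]

video_length = scenes[-1]["end"]
print(find_gaps(video_length, script))  # [(2.0, 3.5)]
```

A non-empty result would show which parts of the timeline have no narration, which is the situation I seem to be in with 41 shots returned for 85 scenes.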

Replies: 2 comments


Just pointing out one more thing: if the audio shifts even slightly relative to the scene, the change in position could give it a completely different meaning.


Hi @dineshkumar181094 great observations!

  1. Could the LLM be stopping because of a token limit? That might be one of the reasons, as nothing in the prompt instructs the LLM to restrict or stop its output. Ideally it should cover the whole input (85 scenes in your case).
  2. In shot-based indexing, where the scene duration is very short (1-2 seconds), the output might not sound coherent. For better precision, a better approach may be to club scenes together up to a certain threshold (x minutes) and generate audio for those clubbed chunks instead of the character-based chunks given in the tutorial.
  3. In our experimentation, the prompt "generate a synced script based on the description" writes a script whose sentences are roughly the same length as the timestamp range of each scene in the description.
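The clubbing idea in point 2 could look roughly like the sketch below. It assumes each scene is a dict with `start`, `end`, and `description` keys (times in seconds); the function name `merge_scenes`, the 3-second threshold, and the sample scenes are illustrative only and are not part of the VideoDB SDK or the tutorial.

```python
from typing import Dict, List


def merge_scenes(scenes: List[Dict], min_duration: float) -> List[Dict]:
    """Merge consecutive scenes until each chunk lasts at least min_duration seconds."""
    chunks: List[Dict] = []
    current = None
    for scene in scenes:
        if current is None:
            current = {
                "start": scene["start"],
                "end": scene["end"],
                "descriptions": [scene["description"]],
            }
        else:
            current["end"] = scene["end"]
            current["descriptions"].append(scene["description"])
        if current["end"] - current["start"] >= min_duration:
            chunks.append(current)
            current = None
    if current is not None:
        # Leftover scenes shorter than the threshold: fold them into the last chunk.
        if chunks:
            chunks[-1]["end"] = current["end"]
            chunks[-1]["descriptions"].extend(current["descriptions"])
        else:
            chunks.append(current)
    return chunks


# Hypothetical shot boundaries, each only 1-2 seconds long.
boundaries = [0.0, 1.5, 2.5, 4.5, 6.0, 7.0, 8.0]
scenes = [
    {"start": s, "end": e, "description": f"scene {i}"}
    for i, (s, e) in enumerate(zip(boundaries, boundaries[1:]))
]

chunks = merge_scenes(scenes, min_duration=3.0)
print([(c["start"], c["end"], len(c["descriptions"])) for c in chunks])
# [(0.0, 4.5, 3), (4.5, 8.0, 3)]
```

Each chunk keeps the descriptions of the scenes it absorbed, so the script prompt and the audio generation can both work per chunk while the chunk boundaries still line up with real shot boundaries.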

Regarding "if there is slight movement in audio with scene it could create a whole different meaning by shifting the position": can you please share an example of this if you have one handy? Good chunking should probably smooth out cases like this.
