Video Database for your AI Applications
VideoDB Python SDK provides programmatic access to VideoDB's serverless video infrastructure. Build AI applications that understand and process video as structured data with support for semantic search, scene extraction, transcript generation, and multimodal content generation.
- Installation
- Quick Start
- Working with Collections
- Advanced Features
- Configuration Options
- Error Handling
- API Reference
- Examples and Tutorials
- Contributing
- Resources
- License
pip install videodb
Requirements:
- Python 3.8 or higher
- Dependencies: requests>=2.25.1, backoff>=2.2.1, tqdm>=4.66.1
Get your API key from the VideoDB Console. The first 50 uploads are free (no credit card required).
import videodb

# Connect using API key
conn = videodb.connect(api_key="YOUR_API_KEY")

# Or set environment variable VIDEO_DB_API_KEY
# conn = videodb.connect()
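When the key comes from the environment, it can help to fail early with a clear message if it is missing. The sketch below is a hypothetical helper, not part of the SDK; it only assumes the `VIDEO_DB_API_KEY` variable name mentioned above.

```python
import os

ENV_VAR = "VIDEO_DB_API_KEY"  # variable name taken from the connection example


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper for illustration; it is not provided by videodb.
    """
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass api_key explicitly")
    return key
```

The result could then be passed as `videodb.connect(api_key=resolve_api_key())`.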
Upload videos, audio files, or images from various sources:
# Upload video from YouTube URL
video = conn.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Upload from public URL
video = conn.upload(url="https://example.com/video.mp4")

# Upload from local file
video = conn.upload(file_path="./my_video.mp4")

# Upload with metadata
video = conn.upload(
    file_path="./video.mp4",
    name="My Video",
    description="Video description"
)
The upload() method returns a Video, Audio, or Image object, depending on the media type.
# Update video name
video.update(name="New Video Title")
# Generate stream URL
stream_url = video.generate_stream()

# Play stream using VideoDB player
videodb.play_stream(stream_url)

# Play in browser/notebook
video.play()
Index and search video content semantically:
from videodb import SearchType, IndexType

# Index spoken words for semantic search
video.index_spoken_words()

# Search for content
results = video.search("morning sunlight")

# Access search results
shots = results.get_shots()
for shot in shots:
    print(f"Found at {shot.start}s - {shot.end}s: {shot.text}")

# Sort results by timestamp instead of relevance score
# (coll is a Collection, e.g. coll = conn.get_collection())
results = coll.search(query="morning sunlight", sort_docs_on="start")

# Play compiled results
results.play()
Search Types:
- SearchType.semantic - Semantic search (default)
- SearchType.keyword - Keyword-based search
- SearchType.scene - Visual scene search
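Search hits often overlap or sit a few seconds apart, so it can be useful to merge them before building a compilation. The sketch below is plain Python and does not call the SDK: `ShotSpan` is a hypothetical stand-in that models only the `start`, `end`, and `text` attributes used in the search example, and `merge_overlapping` is an illustrative helper.

```python
from dataclasses import dataclass


@dataclass
class ShotSpan:
    """Stand-in for a search hit; only start/end/text are modelled."""
    start: float
    end: float
    text: str = ""


def merge_overlapping(spans, gap=0.0):
    """Merge spans that overlap or lie within `gap` seconds of each other."""
    merged = []
    for span in sorted(spans, key=lambda s: s.start):
        if merged and span.start <= merged[-1][1] + gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], span.end)
        else:
            merged.append([span.start, span.end])
    return [(start, end) for start, end in merged]
```

With real results, the spans could be built from the objects returned by `results.get_shots()`, assuming they expose `start` and `end` in seconds as in the loop above.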
# Generate transcript
video.generate_transcript()

# Generate transcript with language hint
video.generate_transcript(language_code="en")

# Get transcript with timestamps
transcript = video.get_transcript()

# Get plain text transcript
text = video.get_transcript_text()

# Get transcript for specific time range
transcript = video.get_transcript(start=10, end=60)

# Translate transcript
translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Formal tone"
)
Segmentation Options:
- videodb.Segmenter.word - Word-level timestamps
- videodb.Segmenter.sentence - Sentence-level timestamps
- videodb.Segmenter.time - Time-based segments
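A timestamped transcript can be turned into an SRT subtitle file with a few lines of plain Python. This is a sketch under one assumption that the text above does not confirm: each transcript entry is a dict with `start` and `end` in seconds and a `text` string. Both function names are hypothetical helpers.

```python
def to_srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcript_to_srt(segments):
    """Build SRT text from dicts with 'start', 'end' and 'text' keys (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

Sentence-level segments usually make more readable subtitle cues than word-level ones.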
Extract and analyze scenes from videos:
from videodb import IndexType, SceneExtractionType, SearchType

# Extract scenes using shot detection
scene_collection = video.extract_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20, "frame_count": 1}
)

# Extract scenes at time intervals
scene_collection = video.extract_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={
        "time": 10,
        "frame_count": 1,
        "select_frames": ["first"]
    }
)

# Describe individual scenes with custom model config
scenes = video.get_scene_index(scene_collection.scene_index_id)
scene = scenes[0]
scene.describe(
    prompt="Describe this scene",
    model_config={"model_name": "pro", "temperature": 0.5}
)

# Index scenes for semantic search
scene_index_id = video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    prompt="Describe the visual content of this scene"
)

# Search within scenes
results = video.search(
    query="outdoor landscape",
    search_type=SearchType.scene,
    index_type=IndexType.scene
)

# List scene indexes
scene_indexes = video.list_scene_index()

# Get specific scene index
scenes = video.get_scene_index(scene_index_id)

# Delete scene collection
video.delete_scene_collection(scene_collection.id)
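To get a feel for what a time-based configuration such as `"time": 10` implies, the sketch below splits a duration into consecutive fixed-length windows. It is a local illustration only; how the service actually places scene boundaries is not described here, and `time_windows` is a hypothetical helper.

```python
def time_windows(duration, step):
    """Split [0, duration) into consecutive windows of `step` seconds."""
    if step <= 0:
        raise ValueError("step must be positive")
    duration = float(duration)
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + step, duration)  # the last window may be shorter
        windows.append((start, end))
        start = end
    return windows
```

Under this model a 35-second video with a 10-second step yields four windows, the last one 5 seconds long.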
from videodb import SubtitleStyle

# Add subtitles with default style
stream_url = video.add_subtitle()

# Customize subtitle appearance
style = SubtitleStyle(
    font_name="Arial",
    font_size=24,
    primary_colour="&H00FFFFFF",
    bold=True
)
stream_url = video.add_subtitle(style=style)
# Get default thumbnail
thumbnail_url = video.generate_thumbnail()

# Generate thumbnail at specific timestamp
thumbnail_image = video.generate_thumbnail(time=30.5)

# Get all thumbnails
thumbnails = video.get_thumbnails()
Organize and search across multiple videos:
# Get default collection
coll = conn.get_collection()

# Create new collection
coll = conn.create_collection(
    name="My Collection",
    description="Collection description",
    is_public=False
)

# List all collections
collections = conn.get_collections()

# Update collection
coll = conn.update_collection(
    id="collection_id",
    name="Updated Name",
    description="Updated description"
)

# Upload to collection
video = coll.upload(url="https://example.com/video.mp4")

# Get videos in collection
videos = coll.get_videos()
video = coll.get_video(video_id)

# Search across collection
results = coll.search(query="specific content")

# Search by title
results = coll.search_title("video title")

# Make collection public/private
coll.make_public()
coll.make_private()

# Delete collection
coll.delete()
# Get audio files
audios = coll.get_audios()
audio = coll.get_audio(audio_id)

# Generate audio URL
audio_url = audio.generate_url()

# Get images
images = coll.get_images()
image = coll.get_image(image_id)

# Generate image URL
image_url = image.generate_url()

# Delete media
audio.delete()
image.delete()
Build multi-track video compositions programmatically using VideoDB's 4-layer architecture: Assets (raw media), Clips (how assets appear), Tracks (timeline lanes), and Timeline (final canvas).
Example: Video with background music
from videodb import connect
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset

conn = connect(api_key="YOUR_API_KEY")
video = conn.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")
audio = conn.upload(file_path="./music.mp3")

# Create timeline
timeline = Timeline(conn)

# Video track
video_track = Track()
video_asset = VideoAsset(id=video.id, start=10)
video_clip = Clip(asset=video_asset, duration=30)
video_track.add_clip(0, video_clip)

# Audio track
audio_track = Track()
audio_asset = AudioAsset(id=audio.id, start=0, volume=0.3)
audio_clip = Clip(asset=audio_asset, duration=30)
audio_track.add_clip(0, audio_clip)

# Compose and render
timeline.add_track(video_track)
timeline.add_track(audio_track)
stream_url = timeline.generate_stream()
Asset Types:
- VideoAsset - Video clips with trim control (start, volume)
- AudioAsset - Background music, voiceovers, sound effects
- ImageAsset - Logos, watermarks, static overlays
- TextAsset - Custom text with typography (Font, Background, Alignment)
- CaptionAsset - Auto-generated subtitles synced to speech
Clip Controls:
- Position & Scale: position=Position.topRight, scale=0.5, offset=Offset(x=0.1, y=-0.2)
- Visual Effects: opacity=0.8, fit=Fit.cover, filter=Filter.greyscale
- Transitions: transition=Transition(in_="fade", out="fade", duration=1)
Track Layering:
- Clips on the same track play sequentially
- Clips on different tracks at the same time play simultaneously (overlays)
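The two layering rules can be modelled in a few lines of plain Python. The sketch below does not use the Editor SDK: `ClipModel`, `TrackModel`, and `active_at` are hypothetical names chosen for illustration, and the model only tracks clip names, start times, and durations.

```python
from dataclasses import dataclass, field


@dataclass
class ClipModel:
    name: str
    duration: float


@dataclass
class TrackModel:
    clips: list = field(default_factory=list)  # (start, ClipModel) pairs

    def end(self):
        """Time at which the last clip on this track finishes."""
        return max((s + c.duration for s, c in self.clips), default=0.0)

    def append(self, clip):
        """Place a clip right after the previous one (sequential playback)."""
        start = self.end()
        self.clips.append((start, clip))
        return start


def active_at(tracks, t):
    """Names of all clips playing at time t across tracks (simultaneous layers)."""
    return [
        c.name
        for track in tracks
        for s, c in track.clips
        if s <= t < s + c.duration
    ]
```

Appending an intro and a main clip to one track and a music clip to another shows the music overlapping both video clips while the video clips never overlap each other.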
For advanced patterns (picture-in-picture, multi-audio layers, auto-captions), see the Editor SDK documentation.
Process live video streams in real-time:
from videodb import SceneExtractionType

# Connect to real-time stream
rtstream = coll.connect_rtstream(
    url="rtsp://example.com/stream",
    name="Live Stream"
)

# Start or Stop processing
rtstream.stop()
rtstream.start()

# Index scenes from stream
scene_index = rtstream.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 2, "frame_count": 5},
    prompt="Describe the scene"
)

# Start or Stop scene indexing
scene_index.stop()
scene_index.start()

# Get scenes
scenes = scene_index.get_scenes(page=1, page_size=100)

# Create alerts for events
alert_id = scene_index.create_alert(
    event_id=event_id,
    callback_url="https://example.com/callback"
)

# Enable/disable alerts
scene_index.disable_alert(alert_id)
scene_index.enable_alert(alert_id)

# Generate stream with player metadata
stream_url = rtstream.generate_stream(
    start=1711000000,
    end=1711003600,
    player_config={
        "title": "Live Feed",
        "description": "Stream recording",
        "slug": "live-feed"
    }
)

# Export a stopped stream as a video/audio asset
rtstream.stop()
export_result = rtstream.export(name="my_recording")

# List streams
streams = coll.list_rtstreams()
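The `start` and `end` values in the stream example look like Unix timestamps in seconds; that reading is an assumption, not something stated above. Under that assumption, a small hypothetical helper can derive both values from a timezone-aware datetime and a length:

```python
from datetime import datetime, timedelta, timezone


def epoch_window(start, length):
    """Return (start, end) as integer Unix seconds for a timezone-aware start."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware to avoid local-time surprises")
    begin = int(start.timestamp())
    return begin, begin + int(length.total_seconds())


# One hour starting at 2024-03-21 05:46:40 UTC
window = epoch_window(
    datetime(2024, 3, 21, 5, 46, 40, tzinfo=timezone.utc),
    timedelta(hours=1),
)
```

Here `window` equals `(1711000000, 1711003600)`, the same pair used in the example.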
Record screen, microphone, and system audio from desktop applications using native capture binaries:
# Install capture dependencies
pip install 'videodb[capture]'
from videodb.capture import CaptureClient

# Backend: Create a capture session
cap = coll.create_capture_session(
    end_user_id="user_abc",
    callback_url="https://example.com/webhook"
)

# Generate a client token for secure desktop auth
token = conn.generate_client_token(expires_in=86400)

# Desktop client: Start capture
client = CaptureClient(session_token=token)

# Request permissions
await client.request_permission("microphone")
await client.request_permission("screen")

# Configure channels and start recording
await client.start_capture_session(
    session_id=cap.id,
    channels=[
        {"type": "mic", "name": "mic:default"},
        {"type": "system_audio", "name": "system_audio:default"},
        {"type": "display", "name": "display:1"},
    ]
)

# Stop capture
await client.stop_capture_session()

# Get session details and export
cap = coll.get_capture_session(cap.id)
export_result = cap.export()

# List all capture sessions
sessions = coll.list_capture_sessions()
Receive real-time transcript and indexing events via WebSocket:
# Connect to WebSocket
ws = conn.connect_websocket()
await ws.connect()
print(f"Connection ID: {ws.connection_id}")

# Stream events
async for event in ws.receive():
    print(event)

# Close connection
await ws.close()
Record and process virtual meetings:
# Start meeting recording
meeting = conn.record_meeting(
    meeting_url="https://meet.google.com/xxx-yyyy-zzz",
    bot_name="Recorder Bot",
    meeting_title="Team Meeting",
    callback_url="https://example.com/callback"
)

# Check meeting status
meeting.refresh()
print(meeting.status)  # initializing, processing, or done

# Wait for completion
meeting.wait_for_status("done", timeout=14400, interval=120)

# Get meeting details
if meeting.is_completed:
    video_id = meeting.video_id
    video = coll.get_video(video_id)

# Get meeting from video
meeting_info = video.get_meeting()
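Waiting for a status with a timeout and an interval is a standard polling pattern. The sketch below shows that pattern in plain Python; it is not the SDK's implementation of `wait_for_status`, and `wait_until` is a hypothetical helper whose `sleep` and `clock` parameters exist only to make it testable.

```python
import time


def wait_until(check, timeout, interval, sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it is true or `timeout` elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might pass a function that refreshes an object and compares its status to `"done"`, with the same `timeout` and `interval` values as in the example above.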
Generate images, audio, and videos using AI:
# Generate image
image = coll.generate_image(
    prompt="A beautiful sunset over mountains",
    aspect_ratio="16:9"
)

# Generate music
audio = coll.generate_music(
    prompt="Upbeat electronic music",
    duration=30
)

# Generate sound effects
audio = coll.generate_sound_effect(
    prompt="Door closing sound",
    duration=2
)

# Generate voice from text
audio = coll.generate_voice(
    text="Hello, welcome to VideoDB",
    voice_name="Default"
)

# Generate video
video = coll.generate_video(
    prompt="A cat playing with a ball",
    duration=5
)

# Generate text using LLM
response = coll.generate_text(
    prompt="Summarize this content",
    model_name="pro",  # basic, pro, or ultra
    response_type="text"  # text or json
)
# Dub video to another language
dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",
    callback_url="https://example.com/callback"
)
from videodb import TranscodeMode, VideoConfig, AudioConfig

# Start transcoding job
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/callback",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=1080, quality=23),
    audio_config=AudioConfig(mute=False)
)

# Check transcode status
status = conn.get_transcode_details(job_id)
# Search YouTube
results = conn.youtube_search(
    query="machine learning tutorial",
    result_threshold=10,
    duration="medium"
)
for result in results:
    print(result["title"], result["url"])
# Check usage
usage = conn.check_usage()

# Get invoices
invoices = conn.get_invoices()
# Download compiled stream
download_info = conn.download(
    stream_link="https://stream.videodb.io/...",
    name="my_compilation"
)
from videodb import SubtitleStyle, SubtitleAlignment, SubtitleBorderStyle

style = SubtitleStyle(
    font_name="Arial",
    font_size=18,
    primary_colour="&H00FFFFFF",  # White
    secondary_colour="&H000000FF",  # Red (ASS-style colours are &HAABBGGRR)
    outline_colour="&H00000000",  # Black
    back_colour="&H00000000",  # Black
    bold=False,
    italic=False,
    underline=False,
    strike_out=False,
    scale_x=1.0,
    scale_y=1.0,
    spacing=0,
    angle=0,
    border_style=SubtitleBorderStyle.outline,
    outline=1.0,
    shadow=0.0,
    alignment=SubtitleAlignment.bottom_center,
    margin_l=10,
    margin_r=10,
    margin_v=10
)
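The colour strings above resemble the Advanced SubStation Alpha (ASS) style format, where a value is written `&HAABBGGRR`: alpha first, then blue, green, red. Assuming that convention applies here, which the style field names suggest but this document does not state, a hypothetical helper can convert a familiar `#RRGGBB` value:

```python
def rgb_to_ass(rgb_hex, alpha=0):
    """Convert '#RRGGBB' to an ASS-style '&HAABBGGRR' string.

    alpha runs from 0 (opaque) to 255 (fully transparent), as in ASS.
    """
    value = rgb_hex.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a 6-digit hex colour such as '#FFCC00'")
    int(value, 16)  # raises ValueError if the input is not hexadecimal
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be between 0 and 255")
    red, green, blue = value[0:2], value[2:4], value[4:6]
    return f"&H{alpha:02X}{blue}{green}{red}".upper()
```

Note the reversed channel order: pure red `#FF0000` becomes `&H000000FF`, and pure blue `#0000FF` becomes `&H00FF0000`.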
from videodb import TextStyle

style = TextStyle(
    fontsize=24,
    fontcolor="black",
    font="Sans",
    box=True,
    boxcolor="white",
    boxborderw="10"
)
import videodb
from videodb.exceptions import (
    VideodbError,
    AuthenticationError,
    InvalidRequestError,
    SearchError
)

try:
    conn = videodb.connect(api_key="invalid_key")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")

try:
    video = conn.upload(url="invalid_url")
except InvalidRequestError as e:
    print(f"Invalid request: {e}")

try:
    results = video.search("query")
except SearchError as e:
    print(f"Search error: {e}")
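For failures that may be transient, a retry with exponential backoff is a common complement to the handlers above. The sketch below is a generic pattern in plain Python, not SDK behaviour. `call_with_retry` is a hypothetical helper, and which VideoDB exceptions are worth retrying is not stated here. Authentication and invalid-request errors generally are not.

```python
import time


def call_with_retry(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n, then retry.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

`retry_on` takes an exception class or a tuple of classes, so a call can be limited to the specific errors considered safe to repeat.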
- Connection: Main client for API interaction
- Collection: Container for organizing media
- Video: Video file with processing methods
- Audio: Audio file representation
- Image: Image file representation
- Timeline: Multi-track video editor
- SearchResult: Search results with shots
- Shot: Time-segmented video clip
- Scene: Visual scene with frames
- SceneCollection: Collection of extracted scenes
- Meeting: Meeting recording session
- RTStream: Real-time stream processor
- CaptureSession: Desktop capture session with export
- CaptureClient: Native binary client for screen/audio recording
- WebSocketConnection: Real-time event streaming
- IndexType: spoken_word, scene
- SearchType: semantic, keyword, scene
- SceneExtractionType: shot_based, time_based
- Segmenter: word, sentence, time
- TranscodeMode: lightning, economy
- MediaType: video, audio, image
For detailed API documentation, visit docs.videodb.io.
Explore practical examples and use cases in the VideoDB Cookbook:
- Semantic video search
- Scene-based indexing and retrieval
- Custom video compilations
- Meeting transcription and analysis
- Real-time stream processing
- Multi-language video dubbing
Contributions are welcome! To contribute:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Documentation: docs.videodb.io
- Console: console.videodb.io
- Examples: github.com/video-db/videodb-cookbook
- Community: Discord
- Issues: GitHub Issues
Apache License 2.0 - see LICENSE file for details.