VideoDB
Agentic Perception

Build AI agents with visual perception.

One SDK gives your agent eyes, ears, and memory across screen, mic, video, and live sessions. Native runtimes for Mac, Windows, Linux, and the web.

A new surface for AI

AI is moving out of the chatbox.
Your agents need eyes and ears.

Agents are creating content, running marketing, recording meetings, taking calls, and using the computer.

The world they operate in is live, continuous, and perceived through vision and voice. Not turns of text.

VideoDB gives your agents realtime real-world context and memory. One SDK across screen, mic, files, and live streams, so your agent can see what just happened, recall what it watched, and act on what it heard.

Screen · Mic · Camera · Files · Live streams
What builders ship

Build the next interface
for AI.

The next generation of software won't live in a chat window. It will watch your screen, work the web for you, and run inside containers that never sleep. Builders on VideoDB are already shipping all three.

Capabilities

One SDK.
All of media.

Files, live streams, screen captures. All enter the same system.

One command.

Run npx skills add video-db/skills to bootstrap every primitive into your agent runtime.

Files, RTSP, screen, mic.

One API across every source.

Compose understanding.

Compose custom indexes the way you compose endpoints.

Search returns a playable clip.

Not metadata. Not timestamps. A clip the agent can play.

Stream in, stream out.

Sub-second alert, act, respond.

Claude Code · OpenAI · Cursor · n8n · Zapier.

Drop into any agent that speaks tools.

Two modes for agents

Realtime by default.
Memory when you ask.

Stream in, context out. Nothing is stored unless you say so. Flip one flag when a moment is worth keeping.

Mode 1 · Ephemeral

Realtime. No storage.

Frames flow in, structured events flow out. Nothing touches disk. Best for live copilots, alerting, and anything sub-second.

Default
Mode 2 · Memory

Remember and search.

Flip one flag and the moment becomes a searchable clip. Memory and search are opt-in: on for the moments you care about, off everywhere else.

Optional
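
The two modes can be pictured with a small, self-contained sketch. This is not the VideoDB SDK: `PerceptionSession`, `Event`, the `remember` flag, and the string "frames" are hypothetical names chosen only to illustrate the idea that events always flow out, while storage and search happen only after the flag is flipped.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Event:
    """A structured event derived from one frame (hypothetical shape)."""
    frame_index: int
    label: str


@dataclass
class PerceptionSession:
    """Hypothetical session: ephemeral unless `remember` is switched on."""
    remember: bool = False
    memory: list = field(default_factory=list)

    def process(self, frames: Iterable[str]) -> Iterator[Event]:
        """Turn each incoming frame into an event; keep it only if asked."""
        for index, frame in enumerate(frames):
            event = Event(frame_index=index, label=f"saw:{frame}")
            if self.remember:
                # Memory mode: the moment is retained and becomes searchable.
                self.memory.append(event)
            # Both modes: the event is emitted to the caller right away.
            yield event

    def search(self, term: str) -> list:
        """Search only what was explicitly remembered."""
        return [event for event in self.memory if term in event.label]


session = PerceptionSession()               # ephemeral by default
live = list(session.process(["desk", "door"]))

session.remember = True                     # flip one flag
list(session.process(["door opens"]))
matches = session.search("door")            # only the remembered moment
```

In this sketch the first two frames produce events but leave nothing behind, so a later search finds only the frame processed after the flag was set.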
Perception box

Build a perception box.
Realtime, private, predictable.

A dedicated perception runtime for teams that need realtime throughput, predictable cost and load, and zero outbound calls to a model API.

Sized to your fleet. Every frame, every inference, every retrieval runs inside the box. Use the bundled models, or bring your own open-weight model.

Realtime processing · Zero outbound · One flat number · Bring your own model

Realtime, sub-second pipeline

Ingest, perception, and event-out sized to your throughput from day one.

Built-in network monitor

Verify isolation in one glance. A live view of every connection the runtime makes.
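
As a rough illustration of what an isolation check involves, the sketch below flags any remote address that leaves the local or private network. It is not the built-in monitor: the function name `outbound_connections` and the assumption that you already have a list of remote IP addresses (gathered by some other tool) are hypothetical.

```python
import ipaddress


def outbound_connections(remote_addresses):
    """Return the addresses that are not loopback, private, or link-local."""
    outbound = []
    for address in remote_addresses:
        ip = ipaddress.ip_address(address)
        if not (ip.is_loopback or ip.is_private or ip.is_link_local):
            outbound.append(address)
    return outbound


# Hypothetical sample: three internal peers and one public address.
observed = ["127.0.0.1", "10.0.0.5", "192.168.1.20", "8.8.8.8"]
leaks = outbound_connections(observed)
```

An empty result would be consistent with a zero-outbound setup; any entry in the list is a connection worth inspecting.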

Bundled perception models

Vision, speech, and embedding models pre-loaded and ready to use.

One capacity envelope

Flat monthly cost. No per-token surprises, no traffic-driven spikes.

No-code workflows

Build workflows on n8n and Zapier.
Same primitives, drag-and-drop.

Every VideoDB primitive is exposed as a node. Index a feed, search for a moment, clip and deliver. All without writing code.

Try it yourself

Open a repo.
Or visit the site.

Every agent on this page ships as an open-source project you can try.


call.md

Meetings captured as markdown with playable clips for every decision.

Open repo

Pair programmer

An agent that watches your screen and YouTube tabs and brainstorms with full context.

Open repo

Research agents

A report you can watch. An agent crawls the web and assembles a video brief.

Open repo
Try my repo

Hand it a repo. A Pi agent clones it, runs its tests, narrates the run, and ships back a demo video.

Visit site
Build desktop agents

Install the native SDK on Mac, Windows, or Linux. Start streaming screen and mic from a quickstart notebook in minutes.

Open repo

Give your agents
eyes and ears.

npx skills add video-db/skills