Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

perplexityai/bumblebee

Repository files navigation

bumblebee

Bumblebee is a read-only inventory collector for package, extension, and developer-tool metadata on macOS and Linux developer endpoints.

It answers a narrow supply-chain response question: when an advisory names a package, extension, or version, which developer machines show a match in their on-disk metadata right now?

SBOMs help answer what shipped, and EDR helps answer what ran or touched the network, but supply-chain response often needs a different view: messy local state across lockfiles, package-manager metadata, extension manifests, and supported developer-tool configs.

Bumblebee turns that scattered on-disk state into structured NDJSON component records and, when given an exposure catalog, flags exact matches for fast, read-only exposure checks when responders already know what they are looking for.

Scope

  • Single static binary, Go 1.25+, zero non-stdlib dependencies.
  • Three scan profiles (baseline, project, deep) for different populations and cadences.
  • Reads only the lockfiles, package-manager install metadata, extension manifests, and supported MCP JSON configs listed in docs/inventory-sources.md. No package manager execution (npm ls, pip show, go list, ...) and no source-file reads. MCP host configs can carry environment values and credentials in their env blocks; Bumblebee parses these configs for the server inventory it needs but does not emit those values in its records.

Coverage

Family Emitted ecosystem Sources
npm npm package-lock.json, npm-shrinkwrap.json, node_modules/.package-lock.json, node_modules/<pkg>/package.json
pnpm npm pnpm-lock.yaml, .pnpm/.../package.json
Yarn npm yarn.lock (Classic + Berry)
Bun npm bun.lock; bun.lockb presence as diagnostic
PyPI pypi *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO
Go modules go go.sum, go.mod
RubyGems rubygems Gemfile.lock, installed *.gemspec
Composer packagist composer.lock, vendor/composer/installed.json
MCP mcp JSON host configs: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, plus ~/.gemini/settings.json (Gemini CLI / Code Assist) and ~/.claude.json (Claude Code user- and project-scoped mcpServers). Non-JSON configs (Codex config.toml, Continue YAML) are not parsed in v0.1.
Agent skills agent-skill skills.sh / vercel-labs/skills lock files: global ~/.agents/.skill-lock.json (or $XDG_STATE_HOME/skills/.skill-lock.json) and project-local skills-lock.json. Loose SKILL.md directories without a lock file are not enumerated.
Editor extensions editor-extension VS Code, Cursor, Windsurf, VSCodium manifests
Browser extensions browser-extension Chromium-family (manifest.json) and Firefox (extensions.json) per profile
Homebrew homebrew Formula INSTALL_RECEIPT.json files and cask .metadata install markers

Per-ecosystem detail: docs/inventory-sources.md.

Install

Requires Go 1.25+. Zero non-stdlib dependencies.

# Install the latest tagged release into $GOBIN.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest
# Or pin a specific tag.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

To build from a checkout:

go build -o bumblebee ./cmd/bumblebee
go test ./...

Stamp an explicit version at build time:

go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee

bumblebee version prints the version plus the VCS revision, build time, and Go runtime — so a record emitted in production can be traced back to a specific build. Version precedence: -ldflags override, module version recorded by go install, then the in-tree default tracked in VERSION.

Self-test

After installing, run a built-in end-to-end check against embedded fixtures:

bumblebee selftest
# selftest OK (2 findings in 1ms)

The fixtures live inside the binary, use deliberately fake package names (bumblebee-selftest-evil@0.0.0), and make no network calls. A non-zero exit means the local install can no longer detect what it should — a fast pre-deployment smoke test for fleet rollouts.

Profiles

Bumblebee is a one-shot scanner: each invocation performs a single scan and exits. Cadence is the runner's responsibility (cron, launchd, systemd, MDM, etc.). Each record carries profile and a per-root root_kind so receivers can keep populations separate.

Profile Scans Use for
baseline Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. Recurring lightweight inventory via an external runner.
project Configured development directories, such as ~/code, ~/src, or ~/work. Recurring inventory for known project workspaces.
deep Explicit --root paths, including broad roots like $HOME. On-demand incident or campaign checks, usually with --ecosystem, --exposure-catalog, and --findings-only.

baseline and project refuse bare-home roots; only deep walks them.

Quick start

# Baseline global inventory.
bumblebee scan --profile baseline > inventory.ndjson
# Daily project sweep with explicit roots.
bumblebee scan --profile project \
 --root "$HOME/code" \
 --root "$HOME/Developer"
# Limit a run to selected emitted ecosystems.
bumblebee scan --profile baseline \
 --ecosystem npm,pypi \
 --ecosystem go
# On-demand exposure scan against a published advisory.
bumblebee scan --profile deep \
 --root "$HOME" \
 --exposure-catalog ./catalog.json \
 --max-duration 10m

Preview the resolved roots without scanning:

bumblebee roots --profile baseline
# prints "<root_kind>\t<path>" lines

--root is a filesystem path to scan; repeatable, required for deep, optional for the other profiles. --ecosystem is repeatable and comma-separated. --exposure-catalog accepts a JSON file or a directory of *.json catalogs (merged non-recursively, all files must share schema_version). --findings-only requires --exposure-catalog and suppresses package records while keeping findings. bumblebee scan --help lists every flag.

Output

Records are NDJSON, one per line. Diagnostics go to stderr as NDJSON. Each run ends with a scan_summary record; receivers use it to decide whether to promote a run to current state. See docs/transport.md for HTTPS/file output and docs/state-model.md for the receiver-side current-state model.

Package record:

Example package record
{
 "record_type": "package",
 "record_id": "package:...",
 "schema_version": "0.1.0",
 "scanner_name": "bumblebee",
 "scanner_version": "v0.1.1",
 "run_id": "9b1f0c2e4d5a6b7c8d9e0f1a2b3c4d5e",
 "scan_time": "2026年05月15日T18:22:01.482Z",
 "endpoint": {
 "hostname": "alex-mbp",
 "os": "darwin",
 "arch": "arm64",
 "username": "alex",
 "uid": "501",
 "device_id": "MDM-7F4A2B"
 },
 "profile": "project",
 "ecosystem": "npm",
 "package_name": "@tanstack/query-core",
 "normalized_name": "@tanstack/query-core",
 "version": "5.59.20",
 "project_path": "/Users/alex/code/web-app",
 "root_kind": "project_root",
 "package_manager": "pnpm",
 "source_type": "pnpm-lockfile",
 "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
 "has_lifecycle_scripts": false,
 "confidence": "high"
}

confidence:

  • high — exact identity and version came from canonical metadata.
  • medium — identity is reliable, but version or source is partial.
  • low — config/path/spec reference only; not proof of an installed exact version.

Finding record (exposure-catalog match):

Example finding record
{
 "record_type": "finding",
 "record_id": "finding:...",
 "schema_version": "0.1.0",
 "scanner_name": "bumblebee",
 "scanner_version": "v0.1.1",
 "run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
 "scan_time": "2026年05月15日T18:22:01.482Z",
 "endpoint": {
 "hostname": "alex-mbp",
 "os": "darwin",
 "arch": "arm64",
 "username": "alex",
 "uid": "501",
 "device_id": "MDM-7F4A2B"
 },
 "profile": "deep",
 "finding_type": "package_exposure",
 "severity": "critical",
 "catalog_id": "advisory-2026-0042",
 "catalog_name": "example-pkg 1.2.3 (compromised release)",
 "ecosystem": "npm",
 "package_name": "example-pkg",
 "normalized_name": "example-pkg",
 "version": "1.2.3",
 "root_kind": "deep_home_root",
 "project_path": "/Users/alex/code/web-app",
 "source_type": "pnpm-lockfile",
 "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
 "confidence": "high",
 "evidence": "exact name+version match (version=1.2.3)"
}

record_id is a content-addressed hash of a canonical identity tuple per record type, stable across runs. Per-record-type field lists and dedupe guidance: docs/state-model.md.

Exposure Catalog Format

Minimal JSON, exact (ecosystem, name, version) matching only:

{
 "schema_version": "0.1.0",
 "entries": [
 {
 "id": "advisory-2026-0042",
 "name": "example-pkg 1.2.3 (compromised release)",
 "ecosystem": "npm",
 "package": "example-pkg",
 "versions": ["1.2.3"],
 "severity": "critical"
 }
 ]
}

The catalog must be a JSON object with schema_version and entries keys. Bare top-level arrays are rejected. Unsupported future schema_version values are rejected. Multiple catalog files can be loaded together by pointing --exposure-catalog at a directory; see the flag description above.

Sample exposure catalogs

The threat_intel/ directory holds maintained exposure catalogs built from public threat-intelligence reporting on recent supply-chain campaigns, assembled with Perplexity Computer and updated via PRs as new campaigns are reported. See threat_intel/README.md for the current catalog list and review guidance.

License

Apache License 2.0. See LICENSE.

About

Read-only developer endpoint scanner for on-disk package, extension, and developer-tool metadata, built to check exposure to known software supply-chain compromises.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

Contributors

Languages

AltStyle によって変換されたページ (->オリジナル) /