Pre Tools
Pre-tools inject data into a step's prompt before the LLM runs. They execute in order and make their results available as {inject_as} variables.
OCC has 27 pre-tool types across 6 categories. All pre-tools support:

- `{variable}` interpolation in string fields
- `on_error: "inject" | "skip" | "fail"` for error handling
- `timeout_ms` for a per-pre-tool timeout (default: 30s)
- `retry: N` to retry with exponential backoff
- `cache_ttl_minutes: N` to cache the result for N minutes
- `parallel: true` to run in parallel with other parallel pre-tools

Pre-tools also support chaining: the output of pre-tool A is available as {inject_as} in pre-tool B.
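
The `{variable}` interpolation described above can be pictured with a minimal Python sketch. This is not OCC's implementation: the `interpolate` function, the placeholder grammar, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re

# Matches {name} or dotted paths such as {input.topic} (assumed grammar).
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def interpolate(template: str, variables: dict) -> str:
    """Replace each {path} with its value; unknown placeholders are left as-is."""

    def lookup(match: re.Match) -> str:
        value = variables
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # unresolved: keep the placeholder text
            value = value[part]
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Values injected by earlier pre-tools (inject_as) plus the step input.
context = {"input": {"topic": "rust"}, "token": "abc123"}
print(interpolate("Bearer {token}", context))
print(interpolate('{"query": "{input.topic}"}', context))
```

Because each pre-tool's result is stored under its `inject_as` name, a later pre-tool can reference it the same way it references step input.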
Full HTTP client with method, headers, auth, body, and JSON path extraction.

    - type: http_fetch
      url: "https://api.example.com/search"
      method: POST                          # GET (default), POST, PUT, PATCH, DELETE
      headers:
        Authorization: "Bearer {token}"     # {variable} interpolation
        Content-Type: "application/json"
      body: '{"query": "{input.topic}"}'    # Request body (POST/PUT/PATCH)
      json_path: "data.results[0].name"     # Extract JSON path from response
      timeout_ms: 10000
      retry: 2
      on_error: fail
      inject_as: search_results

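
To illustrate what a `json_path` such as `data.results[0].name` selects, here is a small Python sketch. The `extract_json_path` helper is hypothetical and covers only dotted keys with numeric indexes; the path syntax OCC actually accepts may be broader.

```python
import re


def extract_json_path(data, path: str):
    """Walk a dotted path with optional [index] suffixes, e.g. data.results[0].name."""
    current = data
    for segment in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        current = current[match.group(1)]              # object key
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]              # array index
    return current


response = {"data": {"results": [{"name": "first"}, {"name": "second"}]}}
print(extract_json_path(response, "data.results[0].name"))  # first
```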
Search the web using Claude's built-in WebSearch tool.

    - type: web_search
      query: "{input.topic} latest research"
      inject_as: search_results

Call a tool on an external MCP server (GitHub, Slack, PostgreSQL, etc.).

    - type: mcp_call
      server: "github"
      tool: "search_repositories"
      args: { query: "{input.topic}" }
      inject_as: repos

Requires occ-mcp-servers.json config. See MCP Client.
SQL query via CLI (requires psql, mysql, or sqlite3 installed).

    - type: db_query
      connection: "postgres://user:pass@localhost/mydb"
      sql: "SELECT * FROM users WHERE role = 'admin' LIMIT 10"
      inject_as: users

Supports: PostgreSQL (postgres://), MySQL (mysql://), SQLite (path.db).
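
The mapping from connection string to CLI can be sketched as follows. This is only an illustration of the rule stated above; `pick_sql_cli` and `cli_is_installed` are hypothetical names, and how OCC actually dispatches is not shown in this page.

```python
import shutil


def pick_sql_cli(connection: str) -> str:
    """Map a connection string to one of the CLIs named above."""
    if connection.startswith("postgres://"):
        return "psql"
    if connection.startswith("mysql://"):
        return "mysql"
    return "sqlite3"  # anything else is treated as a SQLite file path


def cli_is_installed(connection: str) -> bool:
    """Check that the required CLI is on PATH before running a query."""
    return shutil.which(pick_sql_cli(connection)) is not None


print(pick_sql_cli("postgres://user:pass@localhost/mydb"))
print(pick_sql_cli("data/app.db"))
```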
Batch multiple URLs with rate limiting.

    - type: parallel_fetch
      urls:
        - "https://api.example.com/page/1"
        - "https://api.example.com/page/2"
        - "https://api.example.com/page/3"
      rate_limit_ms: 200
      inject_as: all_pages

Read a file from disk.

    - type: read_file
      path: "/path/to/file.txt"
      encoding: "utf-8"          # Any Node.js encoding (default: utf-8)
      inject_as: content

Write content to a file.

    - type: write_file
      path: "/tmp/output.txt"
      content: "{analysis}"
      append: true               # Append instead of overwrite (default: false)
      encoding: "utf-8"
      inject_as: file_path

Run a shell command.

    - type: bash
      command: "git log --oneline -10"
      stderr: true               # Capture stderr too (default: false)
      timeout_ms: 30000
      inject_as: git_log

Git diff in a structured, LLM-optimized format with per-file summaries.

    - type: diff_inject
      repo: "{repo_path}"
      base: "main"               # Base ref (default: main)
      head: "HEAD"               # Head ref (default: HEAD)
      max_tokens: 4000           # Max output size in estimated tokens
      inject_as: smart_diff

Extract code structure (functions, classes, imports, exports, types) using regex-based parsing.

    - type: ast_parse
      path: "{repo_path}/src/index.ts"
      extract: ["functions", "classes", "exports", "types"]
      inject_as: code_structure

Supports: TypeScript/JavaScript, Python, Go.
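
As a rough idea of what regex-based structure extraction looks like, the sketch below pulls declaration names out of a TypeScript snippet. The patterns and the `extract_structure` helper are assumptions for illustration; they are far simpler than a real parser and are not OCC's actual patterns.

```python
import re

# One illustrative pattern per "extract" kind (TypeScript/JavaScript only).
PATTERNS = {
    "functions": re.compile(r"^[ \t]*(?:export\s+)?(?:async\s+)?function\s+(\w+)", re.M),
    "classes": re.compile(r"^[ \t]*(?:export\s+)?class\s+(\w+)", re.M),
    "exports": re.compile(
        r"^[ \t]*export\s+(?:default\s+)?(?:async\s+)?"
        r"(?:function|class|const|let|type|interface)\s+(\w+)",
        re.M,
    ),
    "types": re.compile(r"^[ \t]*(?:export\s+)?(?:type|interface)\s+(\w+)", re.M),
}


def extract_structure(source: str, extract: list[str]) -> dict[str, list[str]]:
    """Return the declared names found for each requested kind."""
    return {kind: PATTERNS[kind].findall(source) for kind in extract}


sample = """\
export function run(input: string) {}
class Runner {}
export interface Options { retries: number }
type Id = string
"""

print(extract_structure(sample, ["functions", "classes", "exports", "types"]))
```

Regex extraction is fast and dependency-free, but it can miss or misread declarations that span lines or sit inside strings and comments.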
Image → text via Tesseract.

    - type: ocr
      image_path: "/tmp/document.png"
      language: "fra"            # Tesseract language code (default: eng)
      inject_as: extracted_text

URL → PNG screenshot via Playwright.

    - type: screenshot
      url: "https://example.com/dashboard"
      viewport: { width: 1440, height: 900 }
      wait_ms: 3000
      inject_as: screenshot_path

HTML → PDF via wkhtmltopdf or Chrome headless.

    - type: pdf_generate
      html: "<h1>Report</h1><p>{analysis}</p>"
      output_path: "/tmp/report.pdf"
      inject_as: pdf_path

Persistent key-value store across chain executions. Chains remember results between runs.

    # Load state from a previous run
    - type: state_load
      key: "last_scan_results"
      scope: "bounty-hunter"     # Chain name or "global" (default: current chain)
      default: "No previous data"
      inject_as: previous_results

    # Save state for the next run (in a later step)
    - type: state_save
      key: "last_scan_results"
      value: "{scan_results}"
      scope: "bounty-hunter"
      inject_as: save_status

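
The load/save pattern above can be modeled with a scoped key-value store. The sketch below persists to a single JSON file; the `StateStore` class and its file layout are hypothetical, since this page does not say how OCC stores state on disk.

```python
import json
import tempfile
from pathlib import Path


class StateStore:
    """Tiny JSON-file key-value store, namespaced by scope (assumed layout)."""

    def __init__(self, path: Path):
        self.path = path

    def _read(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def save(self, key: str, value, scope: str = "global") -> None:
        data = self._read()
        data.setdefault(scope, {})[key] = value
        self.path.write_text(json.dumps(data))

    def load(self, key: str, scope: str = "global", default=None):
        return self._read().get(scope, {}).get(key, default)


with tempfile.TemporaryDirectory() as tmp:
    store = StateStore(Path(tmp) / "state.json")
    # First run: nothing saved yet, so the default is returned.
    print(store.load("last_scan_results", scope="bounty-hunter", default="No previous data"))
    store.save("last_scan_results", ["finding-1"], scope="bounty-hunter")
    # Next run: the saved value comes back.
    print(store.load("last_scan_results", scope="bounty-hunter"))
```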
Local semantic search via SQLite FTS5. RAG without external infrastructure.

    # Index documents
    - type: vector_index
      collection: "project_docs"
      source: "{document_content}"
      chunk_size: 512
      inject_as: index_status

    # Query indexed documents
    - type: vector_query
      collection: "project_docs"
      query: "authentication architecture"
      top_k: 5
      inject_as: relevant_docs

Cache by semantic similarity (not exact hash). Re-running similar queries returns cached results.

    - type: semantic_cache
      query: "{input.topic} market analysis"
      cache_ttl_minutes: 720
      similarity_threshold: 0.85
      inject_as: cached_analysis

Knowledge graph with triples (subject → predicate → object).

    # Write triples
    - type: graph_query
      triples:
        - { subject: "OCC", predicate: "is_a", object: "orchestrator" }
        - { subject: "OCC", predicate: "uses", object: "Claude" }
      inject_as: write_status

    # Read triples
    - type: graph_query
      graph_query_subject: "OCC"
      inject_as: occ_facts

Extract a field from JSON output (handles markdown-wrapped JSON).

    - type: json_parse
      input: "{llm_output}"
      json_path: "opportunities[0].name"
      inject_as: best_opportunity

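
"Markdown-wrapped JSON" typically means the model put its JSON inside a code fence. A minimal sketch of unwrapping it before parsing is shown below; `parse_llm_json` is a hypothetical helper, not OCC's code, and it handles only the first fenced block it finds.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit in a fenced block


def parse_llm_json(text: str):
    """Parse JSON that may be wrapped in a Markdown code fence."""
    pattern = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.S)
    match = pattern.search(text)
    payload = match.group(1) if match else text  # fall back to the raw text
    return json.loads(payload.strip())


wrapped = (
    f"Here you go:\n{FENCE}json\n"
    + '{"opportunities": [{"name": "alpha"}]}'
    + f"\n{FENCE}"
)
print(parse_llm_json(wrapped)["opportunities"][0]["name"])  # alpha
```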
Handlebars-style template engine with {{#each}}, {{#if}}, nested paths.

    - type: template_render
      template: |
        {{#each items}}
        - {{name}}: {{value}}
        {{/each}}
        {{#if show_total}}Total: {{total}}{{/if}}
      data:
        items: [{ name: "A", value: 1 }, { name: "B", value: 2 }]
        show_total: true
        total: 3
      inject_as: formatted

Compare two texts. Returns a similarity score and drift detection.

    - type: embed_compare
      text_a: "{previous_analysis}"
      text_b: "{current_analysis}"
      inject_as: drift_report
    # Returns: {"similarity": 0.73, "verdict": "partially_changed", "new_keywords": [...]}

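
To make the shape of the drift report concrete, here is a sketch that produces the same three fields. It deliberately swaps in word-set Jaccard similarity instead of embeddings, and the thresholds and the `unchanged` / `changed` verdict labels are assumptions; only `partially_changed` appears in the example above.

```python
import re


def compare_texts(text_a: str, text_b: str) -> dict:
    """Jaccard similarity over lowercase word sets, plus words new in text_b."""
    words_a = set(re.findall(r"[a-z0-9]+", text_a.lower()))
    words_b = set(re.findall(r"[a-z0-9]+", text_b.lower()))
    union = words_a | words_b
    similarity = len(words_a & words_b) / len(union) if union else 1.0
    if similarity >= 0.9:          # assumed threshold
        verdict = "unchanged"
    elif similarity >= 0.5:        # assumed threshold
        verdict = "partially_changed"
    else:
        verdict = "changed"
    return {
        "similarity": round(similarity, 2),
        "verdict": verdict,
        "new_keywords": sorted(words_b - words_a),
    }


print(compare_texts("auth uses tokens", "auth uses sessions"))
```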
Check token budget mid-execution. Skip or warn if over budget.

    - type: cost_gate
      budget_usd: 0.50
      action: "warn"             # "warn" | "skip" | "downgrade"
      inject_as: budget_status
    # Returns: {"status": "within_budget", "spent_usd": 0.12, "remaining_usd": 0.38}

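
The returned status object follows from simple arithmetic on the budget. A sketch, assuming a hypothetical `check_budget` helper and an `over_budget` status label that this page does not document:

```python
def check_budget(spent_usd: float, budget_usd: float) -> dict:
    """Build a status object shaped like the cost_gate example above."""
    remaining = round(budget_usd - spent_usd, 2)
    status = "within_budget" if remaining >= 0 else "over_budget"  # second label assumed
    return {"status": status, "spent_usd": spent_usd, "remaining_usd": remaining}


print(check_budget(spent_usd=0.12, budget_usd=0.50))
```

What happens next (`warn`, `skip`, or `downgrade`) is decided by the `action` field and is not modeled here.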
Multi-channel notifications (Slack, Discord, Telegram, webhook).

    - type: notify
      channel: "slack"           # slack | discord | telegram | webhook
      webhook_url: "{env:SLACK_WEBHOOK}"
      message: "Chain found {count} results"
      inject_as: notification_status

Send email via SMTP or SendGrid.

    - type: email
      to: "user@example.com"
      subject: "Alert: {input.topic}"
      content: "Details: {analysis}"
      provider: sendgrid         # smtp (default) | sendgrid
      inject_as: email_result

Generate a shareable approval URL for human-in-the-loop workflows.

    - type: approval_request
      title: "Deploy to production?"
      description: "Changes: {diff_summary}"
      expires_hours: 4
      inject_as: approval_info
    # Returns: {"approve_url": "http://...", "approve_command": "curl ...", "token": "..."}

Read an environment variable.

    - type: env_var
      var_name: "API_KEY"
      default_value: "demo_key"  # Fallback if not set (default: "")
      inject_as: key

Inject the current date and time.

    - type: current_datetime
      timezone: "Europe/Paris"   # IANA timezone (default: UTC)
      format: "locale"           # iso (default) | locale | unix
      inject_as: now

Run commands in an isolated Docker container.

    - type: sandbox_exec
      image: "node:20-slim"
      command: "npm install && npm test"
      mount: "/project:/workspace"
      timeout_ms: 60000
      inject_as: test_results

Each pre-tool can set its own error handling, retry count, and timeout:

    pre_tools:
      - type: http_fetch
        url: "https://unreliable.api.com"
        on_error: skip           # "inject" (default) | "skip" | "fail"
        retry: 3                 # Retry 3 times with backoff
        timeout_ms: 5000         # 5 second timeout
        inject_as: data

| Mode | Behavior |
|---|---|
| `inject` | Injects `[PRE-TOOL ERROR: message]` into prompt (default) |
| `skip` | Injects empty string; step continues cleanly |
| `fail` | Aborts the step with an error |
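
The interaction between `retry` and the three `on_error` modes can be sketched as below. The `run_pre_tool` wrapper, the 0.5 s base delay, and the doubling factor are assumptions; the page only states that retries use exponential backoff.

```python
import time


def run_pre_tool(tool, on_error="inject", retry=0, base_delay_s=0.5, sleep=time.sleep):
    """Run tool() with retries, then apply the on_error mode if it still fails."""
    error = None
    for attempt in range(retry + 1):
        try:
            return tool()
        except Exception as exc:
            error = exc
            if attempt < retry:
                sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ... (assumed schedule)
    if on_error == "fail":
        raise RuntimeError(f"pre-tool failed: {error}") from error
    if on_error == "skip":
        return ""                                   # step continues with an empty value
    return f"[PRE-TOOL ERROR: {error}]"             # default "inject" mode


def flaky():
    raise ValueError("timeout")


delays = []  # record the backoff delays instead of actually sleeping
print(run_pre_tool(flaky, on_error="inject", retry=2, sleep=delays.append))
print(delays)
```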
Pre-tools execute sequentially by default. Output of pre-tool A is available in pre-tool B:

    pre_tools:
      # Step 1: Get auth token
      - type: env_var
        var_name: API_TOKEN
        inject_as: token

      # Step 2: Use token in API call (chaining!)
      - type: http_fetch
        url: "https://api.example.com/data"
        headers:
          Authorization: "Bearer {token}"
        inject_as: api_data

Mark pre-tools as `parallel: true` to run them simultaneously:

    pre_tools:
      - type: web_search
        query: "topic A news"
        parallel: true           # ← runs in parallel
        inject_as: news_a

      - type: web_search
        query: "topic B news"
        parallel: true           # ← runs in parallel
        inject_as: news_b

      - type: bash
        command: "echo done"     # ← sequential (after parallel batch)
        inject_as: status

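
One way to picture the scheduling is to group consecutive `parallel: true` pre-tools into a batch and run each batch concurrently, while every other pre-tool runs alone. The sketch below does this with `asyncio`; the grouping rule and the `group_batches` / `run_all` helpers are assumptions, not a description of OCC's scheduler.

```python
import asyncio


def group_batches(pre_tools: list[dict]) -> list[list[dict]]:
    """Group consecutive parallel pre-tools; each sequential one is its own batch."""
    batches: list[list[dict]] = []
    for tool in pre_tools:
        if tool.get("parallel") and batches and batches[-1][0].get("parallel"):
            batches[-1].append(tool)
        else:
            batches.append([tool])
    return batches


async def run_all(pre_tools: list[dict], run_one) -> dict:
    """Run batches in order; tools inside a batch run concurrently."""
    results = {}
    for batch in group_batches(pre_tools):
        outputs = await asyncio.gather(*(run_one(tool) for tool in batch))
        for tool, output in zip(batch, outputs):
            results[tool["inject_as"]] = output
    return results


async def fake_run(tool: dict) -> str:
    # Stand-in for real execution; just echoes the tool type.
    await asyncio.sleep(0)
    return f"result of {tool['type']}"


tools = [
    {"type": "web_search", "parallel": True, "inject_as": "news_a"},
    {"type": "web_search", "parallel": True, "inject_as": "news_b"},
    {"type": "bash", "inject_as": "status"},
]
print([len(batch) for batch in group_batches(tools)])
print(asyncio.run(run_all(tools, fake_run)))
```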
- Chain Format — YAML reference
- Step Types — How steps use pre-tool data
- MCP Client — External MCP server configuration
- Token Optimization — Pre-tools reduce LLM token usage