Pre Tools

lacause edited this page Mar 30, 2026 · 3 revisions

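Pre-tools inject data into a step's prompt before the LLM runs. They execute in order, and each result is exposed under the name given in its inject_as field (for example, inject_as: search_results makes the result available as {search_results}).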

OCC has 27 pre-tool types across 6 categories. All pre-tools support:

{variable} interpolation in string fields
on_error: "inject" | "skip" | "fail" — error handling
timeout_ms — per-pre-tool timeout (default: 30s)
retry: N — retry with exponential backoff
cache_ttl_minutes: N — cache result for N minutes
parallel: true — run in parallel with other parallel pre-tools

Pre-tools also support chaining: a later pre-tool can reference the output of an earlier one through that earlier pre-tool's inject_as name.
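The {variable} interpolation described above can be pictured with a small Python sketch. This is not OCC's implementation, which this page does not show; the `interpolate` helper, its regex, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re

# Hypothetical helper: matches {name} and {dotted.path}, but not JSON braces.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

def interpolate(text, context):
    """Replace {name} and {dotted.path} placeholders with values from context."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # assumption: unknown names are left as-is
            value = value[part]
        return str(value)
    return _PLACEHOLDER.sub(lookup, text)

context = {"input": {"topic": "rust"}, "token": "abc123"}
print(interpolate("Bearer {token}", context))
print(interpolate('{"query": "{input.topic}"}', context))
```

Note that the outer braces of a JSON body are not treated as placeholders here because they are not followed by an identifier.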

Data Fetching

http_fetch

Full HTTP client with method, headers, auth, body, and JSON path extraction.

- type: http_fetch
  url: "https://api.example.com/search"
  method: POST # GET (default), POST, PUT, PATCH, DELETE
  headers:
    Authorization: "Bearer {token}" # {variable} interpolation
    Content-Type: "application/json"
  body: '{"query": "{input.topic}"}' # Request body (POST/PUT/PATCH)
  json_path: "data.results[0].name" # Extract JSON path from response
  timeout_ms: 10000
  retry: 2
  on_error: fail
  inject_as: search_results
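To make the json_path field concrete, here is a minimal Python sketch of how a path such as data.results[0].name could be resolved against a parsed response. The `extract_path` function and the sample response are hypothetical; the exact path syntax OCC accepts beyond the example above is not documented here.

```python
import re

# One segment is either a key (no dots or brackets) or a [index] suffix.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")

def extract_path(data, path):
    """Walk dict keys and list indexes named by a dotted path."""
    current = data
    for key, index in _SEGMENT.findall(path):
        current = current[key] if key else current[int(index)]
    return current

# Hypothetical response body, already parsed from JSON.
response = {"data": {"results": [{"name": "first"}, {"name": "second"}]}}
print(extract_path(response, "data.results[0].name"))  # first
```

In this sketch a missing key or index raises KeyError or IndexError, which a caller could map onto the on_error modes.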

web_search

Search the web using Claude's built-in WebSearch tool.

- type: web_search
  query: "{input.topic} latest research"
  inject_as: search_results

mcp_call

Call a tool on an external MCP server (GitHub, Slack, PostgreSQL, etc.).

- type: mcp_call
  server: "github"
  tool: "search_repositories"
  args: { query: "{input.topic}" }
  inject_as: repos

Requires an occ-mcp-servers.json config file. See MCP Client.

db_query

Runs a SQL query through the database's CLI (requires psql, mysql, or sqlite3 to be installed).

- type: db_query
  connection: "postgres://user:pass@localhost/mydb"
  sql: "SELECT * FROM users WHERE role = 'admin' LIMIT 10"
  inject_as: users

Supports: PostgreSQL (postgres://), MySQL (mysql://), SQLite (path.db).
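The mapping from connection string to CLI can be sketched as follows. This only mirrors the three cases listed above; the `pick_cli` name is hypothetical, and how OCC actually builds the command line is not covered on this page.

```python
def pick_cli(connection):
    """Choose the CLI binary from the connection string's form."""
    if connection.startswith("postgres://"):
        return "psql"
    if connection.startswith("mysql://"):
        return "mysql"
    # Assumption: anything without a known scheme is treated as a SQLite file path.
    return "sqlite3"

print(pick_cli("postgres://user:pass@localhost/mydb"))  # psql
print(pick_cli("data/app.db"))                          # sqlite3
```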

parallel_fetch

Batch multiple URLs with rate limiting.

- type: parallel_fetch
  urls:
    - "https://api.example.com/page/1"
    - "https://api.example.com/page/2"
    - "https://api.example.com/page/3"
  rate_limit_ms: 200
  inject_as: all_pages

Files & Code

read_file

- type: read_file
  path: "/path/to/file.txt"
  encoding: "utf-8" # Any Node.js encoding (default: utf-8)
  inject_as: content

write_file

- type: write_file
  path: "/tmp/output.txt"
  content: "{analysis}"
  append: true # Append instead of overwrite (default: false)
  encoding: "utf-8"
  inject_as: file_path

bash

- type: bash
  command: "git log --oneline -10"
  stderr: true # Capture stderr too (default: false)
  timeout_ms: 30000
  inject_as: git_log

diff_inject

Git diff — structured, LLM-optimized format with per-file summaries.

- type: diff_inject
  repo: "{repo_path}"
  base: "main" # Base ref (default: main)
  head: "HEAD" # Head ref (default: HEAD)
  max_tokens: 4000 # Max output size in estimated tokens
  inject_as: smart_diff

ast_parse

Extract code structure (functions, classes, imports, exports, types) using regex-based parsing.

- type: ast_parse
  path: "{repo_path}/src/index.ts"
  extract: ["functions", "classes", "exports", "types"]
  inject_as: code_structure

Supports: TypeScript/JavaScript, Python, Go.
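As a rough illustration of what regex-based structure extraction looks like, the sketch below pulls function, class, and import names out of Python source. The patterns and the `extract_structure` helper are assumptions for illustration only; OCC's own patterns and output format are not shown on this page, and regexes like these will miss edge cases that a real parser would handle.

```python
import re

# Hypothetical patterns for Python source; one capture group per kind.
PATTERNS = {
    "functions": re.compile(r"^[ \t]*(?:async[ \t]+)?def[ \t]+(\w+)", re.MULTILINE),
    "classes": re.compile(r"^[ \t]*class[ \t]+(\w+)", re.MULTILINE),
    "imports": re.compile(r"^[ \t]*(?:from|import)[ \t]+([\w.]+)", re.MULTILINE),
}

def extract_structure(source, extract):
    """Return the names found for each requested kind, in source order."""
    return {kind: PATTERNS[kind].findall(source) for kind in extract}

sample = '''
import os
from pathlib import Path

class Scanner:
    def run(self):
        return os.getcwd()

async def main():
    pass
'''

structure = extract_structure(sample, ["functions", "classes", "imports"])
print(structure)
```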

ocr

Image → text via Tesseract.

- type: ocr
  image_path: "/tmp/document.png"
  language: "fra" # Tesseract language code (default: eng)
  inject_as: extracted_text

screenshot

URL → PNG screenshot via Playwright.

- type: screenshot
  url: "https://example.com/dashboard"
  viewport: { width: 1440, height: 900 }
  wait_ms: 3000
  inject_as: screenshot_path

pdf_generate

HTML → PDF via wkhtmltopdf or Chrome headless.

- type: pdf_generate
  html: "<h1>Report</h1><p>{analysis}</p>"
  output_path: "/tmp/report.pdf"
  inject_as: pdf_path

State & Memory

state_load / state_save

Persistent key-value store across chain executions. Chains remember results between runs.

# Load state from a previous run
- type: state_load
  key: "last_scan_results"
  scope: "bounty-hunter" # Chain name or "global" (default: current chain)
  default: "No previous data"
  inject_as: previous_results
# Save state for the next run (in a later step)
- type: state_save
  key: "last_scan_results"
  value: "{scan_results}"
  scope: "bounty-hunter"
  inject_as: save_status
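The load/save semantics (a key looked up within a scope, with a default when nothing was saved) can be sketched with a tiny SQLite-backed store. The `StateStore` class and its table layout are hypothetical; this page does not say where or how OCC persists state.

```python
import json
import sqlite3

class StateStore:
    """Scoped key-value store; values are serialized as JSON text."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            "scope TEXT, key TEXT, value TEXT, PRIMARY KEY (scope, key))"
        )

    def save(self, scope, key, value):
        # INSERT OR REPLACE overwrites any earlier value for (scope, key).
        self.db.execute(
            "INSERT OR REPLACE INTO state (scope, key, value) VALUES (?, ?, ?)",
            (scope, key, json.dumps(value)),
        )
        self.db.commit()

    def load(self, scope, key, default=None):
        row = self.db.execute(
            "SELECT value FROM state WHERE scope = ? AND key = ?", (scope, key)
        ).fetchone()
        return json.loads(row[0]) if row else default

store = StateStore()
print(store.load("bounty-hunter", "last_scan_results", "No previous data"))
store.save("bounty-hunter", "last_scan_results", {"found": 3})
print(store.load("bounty-hunter", "last_scan_results", "No previous data"))
```

Passing a file path instead of ":memory:" would make the values survive between runs, which is the behavior the section describes.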

vector_query / vector_index

Local semantic search via SQLite FTS5. RAG without external infrastructure.

# Index documents
- type: vector_index
  collection: "project_docs"
  source: "{document_content}"
  chunk_size: 512
  inject_as: index_status
# Query indexed documents
- type: vector_query
  collection: "project_docs"
  query: "authentication architecture"
  top_k: 5
  inject_as: relevant_docs

semantic_cache

Cache by semantic similarity (not exact hash). Re-running similar queries returns cached results.

- type: semantic_cache
  query: "{input.topic} market analysis"
  cache_ttl_minutes: 720
  similarity_threshold: 0.85
  inject_as: cached_analysis
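The interplay of similarity_threshold and cache_ttl_minutes can be illustrated with a small in-memory cache. This sketch substitutes word-set Jaccard similarity for whatever similarity measure OCC actually uses, which this page does not specify; the `SemanticCache` class and `jaccard` helper are hypothetical.

```python
import time

def jaccard(a, b):
    """Stand-in similarity: shared words divided by all words (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

class SemanticCache:
    def __init__(self, similarity_threshold=0.85, ttl_minutes=720, clock=time.time):
        self.threshold = similarity_threshold
        self.ttl_seconds = ttl_minutes * 60
        self.clock = clock
        self.entries = []  # (query, value, stored_at)

    def put(self, query, value):
        self.entries.append((query, value, self.clock()))

    def get(self, query):
        """Return the most similar unexpired entry at or above the threshold."""
        now = self.clock()
        best = None
        for cached_query, value, stored_at in self.entries:
            if now - stored_at > self.ttl_seconds:
                continue  # expired
            score = jaccard(query, cached_query)
            if score >= self.threshold and (best is None or score > best[0]):
                best = (score, value)
        return best[1] if best else None

cache = SemanticCache(similarity_threshold=0.75)
cache.put("EV battery market analysis", "cached report")
print(cache.get("ev battery market analysis 2026"))  # similarity 0.8, a hit
print(cache.get("coffee market analysis"))           # similarity 0.4, a miss
```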

graph_query

Knowledge graph with triples (subject → predicate → object).

# Write triples
- type: graph_query
  triples:
    - { subject: "OCC", predicate: "is_a", object: "orchestrator" }
    - { subject: "OCC", predicate: "uses", object: "Claude" }
  inject_as: write_status
# Read triples
- type: graph_query
  graph_query_subject: "OCC"
  inject_as: occ_facts
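A triple store of this kind reduces to a set of (subject, predicate, object) tuples with a lookup by subject. The sketch below is a hypothetical in-memory version; OCC's storage backend and the shape of the value it injects are not described here.

```python
class TripleStore:
    """In-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def by_subject(self, subject):
        # Sorted so results are deterministic regardless of insertion order.
        return sorted(t for t in self.triples if t[0] == subject)

graph = TripleStore()
graph.add("OCC", "is_a", "orchestrator")
graph.add("OCC", "uses", "Claude")
print(graph.by_subject("OCC"))
```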

Data Processing

json_parse

Extract a field from JSON output (handles markdown-wrapped JSON).

- type: json_parse
  input: "{llm_output}"
  json_path: "opportunities[0].name"
  inject_as: best_opportunity
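Handling markdown-wrapped JSON usually means stripping a code fence before parsing. The sketch below shows one way to do that in Python; the `parse_llm_json` helper and its regex are assumptions, not OCC's actual parsing logic.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block
_WRAPPED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)

def parse_llm_json(text):
    """Parse JSON that may be wrapped in a markdown code fence."""
    match = _WRAPPED.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())

# Hypothetical LLM output: prose followed by a fenced JSON block.
llm_output = (
    "Here you go:\n" + FENCE + 'json\n{"opportunities": [{"name": "Acme"}]}\n' + FENCE
)
data = parse_llm_json(llm_output)
print(data["opportunities"][0]["name"])  # Acme
```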

template_render

Handlebars-style template engine with {{#each}}, {{#if}}, nested paths.

- type: template_render
  template: |
    {{#each items}}
    - {{name}}: {{value}}
    {{/each}}
    {{#if show_total}}Total: {{total}}{{/if}}
  data:
    items: [{ name: "A", value: 1 }, { name: "B", value: 2 }]
    show_total: true
    total: 3
  inject_as: formatted

embed_compare

Compare two texts — returns similarity score and drift detection.

- type: embed_compare
  text_a: "{previous_analysis}"
  text_b: "{current_analysis}"
  inject_as: drift_report
  # Returns: {"similarity": 0.73, "verdict": "partially_changed", "new_keywords": [...]}

cost_gate

Check spend against a USD budget mid-execution. When the budget is exceeded, the action field selects whether to warn, skip, or downgrade.

- type: cost_gate
  budget_usd: 0.50
  action: "warn" # "warn" | "skip" | "downgrade"
  inject_as: budget_status
  # Returns: {"status": "within_budget", "spent_usd": 0.12, "remaining_usd": 0.38}
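The returned status object can be reproduced with a few lines of arithmetic. In this sketch the `cost_gate` function is hypothetical, and the "over_budget" status name is an assumption: the page only shows the "within_budget" case. The effect of each action value is not modeled.

```python
def cost_gate(budget_usd, spent_usd):
    """Compare spend to budget and report what is left."""
    remaining = round(budget_usd - spent_usd, 2)  # round to cents to avoid float noise
    # "over_budget" is an assumed name; only "within_budget" appears on this page.
    status = "within_budget" if spent_usd <= budget_usd else "over_budget"
    return {"status": status, "spent_usd": spent_usd, "remaining_usd": remaining}

print(cost_gate(0.50, 0.12))
```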

Notifications

notify

Multi-channel notifications (Slack, Discord, Telegram, webhook).

- type: notify
  channel: "slack" # slack | discord | telegram | webhook
  webhook_url: "{env:SLACK_WEBHOOK}"
  message: "Chain found {count} results"
  inject_as: notification_status

email

Send email via SMTP or SendGrid.

- type: email
  to: "user@example.com"
  subject: "Alert: {input.topic}"
  content: "Details: {analysis}"
  provider: sendgrid # smtp (default) | sendgrid
  inject_as: email_result

approval_request

Generate a shareable approval URL for human-in-the-loop workflows.

- type: approval_request
  title: "Deploy to production?"
  description: "Changes: {diff_summary}"
  expires_hours: 4
  inject_as: approval_info
  # Returns: {"approve_url": "http://...", "approve_command": "curl ...", "token": "..."}

System

env_var

- type: env_var
  var_name: "API_KEY"
  default_value: "demo_key" # Fallback if not set (default: "")
  inject_as: key

current_datetime

- type: current_datetime
  timezone: "Europe/Paris" # IANA timezone (default: UTC)
  format: "locale" # iso (default) | locale | unix
  inject_as: now

sandbox_exec

Run commands in an isolated Docker container.

- type: sandbox_exec
  image: "node:20-slim"
  command: "npm install && npm test"
  mount: "/project:/workspace"
  timeout_ms: 60000
  inject_as: test_results

Error Handling

pre_tools:
  - type: http_fetch
    url: "https://unreliable.api.com"
    on_error: skip # "inject" (default) | "skip" | "fail"
    retry: 3 # Retry 3 times with backoff
    timeout_ms: 5000 # 5 second timeout
    inject_as: data

| Mode | Behavior |
|------|----------|
| `inject` | Injects `[PRE-TOOL ERROR: message]` into prompt (default) |
| `skip` | Injects empty string, so the step continues cleanly |
| `fail` | Aborts the step with an error |
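The three modes combined with retry can be sketched in Python as follows. This is an illustration, not OCC's code: the `run_pre_tool` wrapper is hypothetical, the 1s/2s/4s delay schedule is an assumption (the page says only "exponential backoff"), and timeout_ms is not modeled.

```python
import time

def run_pre_tool(tool, retry=0, on_error="inject", base_delay_s=1.0, sleep=time.sleep):
    """Call tool(); retry on failure, then apply the on_error mode."""
    last_error = None
    for attempt in range(retry + 1):
        try:
            return tool()
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = exc
            if attempt < retry:
                sleep(base_delay_s * 2 ** attempt)  # assumed schedule: 1s, 2s, 4s, ...
    if on_error == "fail":
        raise RuntimeError(f"pre-tool failed: {last_error}") from last_error
    if on_error == "skip":
        return ""  # empty string, the step continues
    return f"[PRE-TOOL ERROR: {last_error}]"  # "inject", the default

def flaky():
    raise ValueError("boom")

# Pass a no-op sleep so the demo does not actually wait.
print(run_pre_tool(flaky, retry=3, on_error="inject", sleep=lambda s: None))
```

Injecting the `sleep` function keeps the retry schedule observable in tests without real delays.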

Pre-Tool Chaining

Pre-tools execute sequentially by default. Output of pre-tool A is available in pre-tool B:

pre_tools:
  # Step 1: Get auth token
  - type: env_var
    var_name: API_TOKEN
    inject_as: token
  # Step 2: Use token in API call (chaining!)
  - type: http_fetch
    url: "https://api.example.com/data"
    headers:
      Authorization: "Bearer {token}"
    inject_as: api_data
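Sequential chaining amounts to threading one growing context through the list of pre-tools. The following Python sketch runs the two-step example above with stubbed handlers; `fill`, `run_chain`, and the handler table are hypothetical names, and the http_fetch handler makes no network call.

```python
import os
import re

def fill(value, context):
    """Fill {name} placeholders in strings, recursing into nested dicts."""
    if isinstance(value, dict):
        return {k: fill(v, context) for k, v in value.items()}
    if isinstance(value, str):
        return re.sub(
            r"\{(\w+)\}", lambda m: str(context.get(m.group(1), m.group(0))), value
        )
    return value

def run_chain(pre_tools, handlers):
    """Run pre-tools in order; each result is stored under its inject_as name."""
    context = {}
    for spec in pre_tools:
        resolved = fill(spec, context)  # earlier outputs are visible here
        context[spec["inject_as"]] = handlers[spec["type"]](resolved)
    return context

# Stub handlers for illustration only.
handlers = {
    "env_var": lambda s: os.environ.get(s["var_name"], s.get("default_value", "")),
    "http_fetch": lambda s: f"GET {s['url']} with {s['headers']['Authorization']}",
}

pre_tools = [
    {"type": "env_var", "var_name": "API_TOKEN", "inject_as": "token"},
    {
        "type": "http_fetch",
        "url": "https://api.example.com/data",
        "headers": {"Authorization": "Bearer {token}"},
        "inject_as": "api_data",
    },
]

os.environ.setdefault("API_TOKEN", "demo-token")  # demo value if none is set
result = run_chain(pre_tools, handlers)
print(result["api_data"])
```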

Parallel Execution

Mark pre-tools as parallel: true to run them simultaneously:

pre_tools:
  - type: web_search
    query: "topic A news"
    parallel: true # ← runs in parallel
    inject_as: news_a
  - type: web_search
    query: "topic B news"
    parallel: true # ← runs in parallel
    inject_as: news_b
  - type: bash
    command: "echo done" # ← sequential (after parallel batch)
    inject_as: status
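One way to picture this scheduling is to group consecutive parallel pre-tools into a batch and run each batch on a thread pool, waiting for it to finish before moving on. The sketch below assumes exactly that grouping rule, which the example above suggests but the page does not spell out; `plan_batches` and `run_batches` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def plan_batches(pre_tools):
    """Group consecutive parallel pre-tools into one batch; others run alone."""
    batches = []
    for is_parallel, group in groupby(pre_tools, key=lambda t: bool(t.get("parallel"))):
        group = list(group)
        if is_parallel:
            batches.append(group)
        else:
            batches.extend([tool] for tool in group)
    return batches

def run_batches(pre_tools, run_one):
    """Run each batch concurrently; batches themselves run one after another."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for batch in plan_batches(pre_tools):
            # map() keeps input order, and consuming it waits for the whole batch.
            for tool, output in zip(batch, pool.map(run_one, batch)):
                results[tool["inject_as"]] = output
    return results

pre_tools = [
    {"type": "web_search", "query": "topic A news", "parallel": True, "inject_as": "news_a"},
    {"type": "web_search", "query": "topic B news", "parallel": True, "inject_as": "news_b"},
    {"type": "bash", "command": "echo done", "inject_as": "status"},
]

print([[t["inject_as"] for t in batch] for batch in plan_batches(pre_tools)])
# [['news_a', 'news_b'], ['status']]
```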
