Token Optimization

lacause edited this page Mar 29, 2026 · 1 revision

OCC is designed to minimize LLM token usage. This guide covers every technique available.

The Core Principle

Traditional approaches stuff everything into one giant prompt. OCC decomposes work into focused steps, each receiving only the context it needs.

Traditional: 1 prompt × 40K tokens = 40K tokens
OCC: 6 prompts × ~2.5K tokens = ~15K tokens

The savings come from isolation — each step only pays for the tokens it actually needs.
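The arithmetic behind that comparison can be reproduced with a short sketch. The function name `total_tokens` is hypothetical and only illustrates the calculation; it is not part of OCC.

```python
def total_tokens(prompts: int, tokens_per_prompt: int) -> int:
    """Total input tokens for a run of equally sized prompts."""
    return prompts * tokens_per_prompt

# Figures taken from the comparison above.
traditional = total_tokens(1, 40_000)   # one giant prompt
occ = total_tokens(6, 2_500)            # six focused prompts
savings = 1 - occ / traditional         # fraction of tokens saved

print(f"traditional={traditional} occ={occ} savings={savings:.1%}")
```

With these numbers the decomposed chain uses 15K tokens instead of 40K, a reduction of 62.5%.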

Technique 1: Step Isolation

Each step gets its own prompt with only its dependencies injected, so conversation history never accumulates across steps.

# Step A outputs 3000 tokens of research
# Step B outputs 2000 tokens of analysis
# Step C only needs Step B's output, so it never sees Step A's 3000 tokens
steps:
  - id: a
    prompt: "Research {input.topic}"
    output_var: research
  - id: b
    depends_on: [a]
    prompt: "Analyze: {research}"
    output_var: analysis
  - id: c
    depends_on: [b]                   # Only depends on B, not A
    prompt: "Summarize: {analysis}"   # Receives ~2000 tokens, not 5000
    output_var: summary
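One way to picture step isolation is a resolver that exposes only the outputs of a step's direct dependencies. The sketch below is an assumption about how such a resolver could look, not OCC's actual implementation; `visible_outputs` and `render_prompt` are hypothetical names, and character counts stand in for token counts.

```python
import re

def visible_outputs(step: dict, steps_by_id: dict, outputs: dict) -> dict:
    """Return only the outputs produced by the step's direct dependencies."""
    names = [steps_by_id[dep]["output_var"] for dep in step.get("depends_on", [])]
    return {name: outputs[name] for name in names}

def render_prompt(template: str, variables: dict) -> str:
    """Fill {name} placeholders from `variables`; unknown placeholders stay as-is."""
    return re.sub(r"\{(\w+)\}", lambda m: variables.get(m.group(1), m.group(0)), template)

steps = [
    {"id": "a", "prompt": "Research {input.topic}", "output_var": "research"},
    {"id": "b", "depends_on": ["a"], "prompt": "Analyze: {research}", "output_var": "analysis"},
    {"id": "c", "depends_on": ["b"], "prompt": "Summarize: {analysis}", "output_var": "summary"},
]
steps_by_id = {s["id"]: s for s in steps}

# Stand-in outputs: 3000 characters of research, 2000 of analysis.
outputs = {"research": "R" * 3000, "analysis": "A" * 2000}

context_c = visible_outputs(steps_by_id["c"], steps_by_id, outputs)
prompt_c = render_prompt(steps_by_id["c"]["prompt"], context_c)
```

Step `c` receives the analysis only; the research text never reaches its prompt because `a` is not listed in its `depends_on`.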

Technique 2: Transform Steps (Zero Tokens)

Transform steps manipulate data without LLM calls:

# Research step returns 5000 tokens of JSON
- id: research
  prompt: "Research and return JSON with key_findings, sources, raw_data"
  output_var: raw_research
# Extract only key_findings (costs 0 tokens)
- id: extract
  type: transform
  operation: json_extract
  input_var: raw_research
  json_path: "key_findings"
  prompt: "extract"
  output_var: findings   # Now ~500 tokens
# Next step receives 500 tokens instead of 5000
- id: report
  depends_on: [extract]
  prompt: "Write report from: {findings}"
  output_var: report

Available zero-token operations: json_extract, regex_match, template, split, merge, truncate, replace, filter, map, join, to_json, from_json
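To illustrate why a transform costs no tokens, here is a minimal sketch of what a `json_extract` operation might do locally. The dot-separated path syntax is an assumption (the example above only uses a single key), and this function is hypothetical rather than OCC's own code.

```python
import json

def json_extract(raw: str, json_path: str):
    """Walk a dot-separated path through parsed JSON (path syntax is assumed)."""
    value = json.loads(raw)
    for key in json_path.split("."):
        # Numeric segments index into lists; everything else is a dict key.
        value = value[int(key)] if isinstance(value, list) else value[key]
    return value

# Stand-in for the large research output.
raw_research = json.dumps({
    "key_findings": ["finding one", "finding two"],
    "sources": ["source A", "source B"],
    "raw_data": "x" * 5000,
})

findings = json_extract(raw_research, "key_findings")
```

The extraction is plain parsing and dictionary access, so no model is called and the downstream step sees only the small `findings` value.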

Technique 3: Per-Step Model Selection

Use cheap models for simple tasks, expensive models for critical ones:

steps:
  # Simple classification: use Haiku ($0.25/M input)
  - id: classify
    model: claude-haiku-4-5
    prompt: "Classify this text: {input.text}. Return: positive, negative, or neutral."
    output_var: sentiment
  # Complex analysis: use Sonnet ($3/M input)
  - id: analyze
    model: claude-sonnet-4-6
    depends_on: [classify]
    prompt: "Deep analysis of {input.text} (classified as {sentiment})"
    output_var: analysis
  # Critical synthesis: use Opus ($15/M input)
  - id: synthesize
    model: claude-opus-4-6
    depends_on: [analyze]
    prompt: "Produce final executive report from: {analysis}"
    output_var: report

Cost breakdown:

Step        Model   Input tokens  Cost
classify    Haiku   ~200          $0.00005
analyze     Sonnet  ~2000         $0.006
synthesize  Opus    ~3000         $0.045
Total               ~5200         $0.051

vs. running everything on Opus: ~8000 tokens × $15/M = $0.12 (2.4x more expensive)
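The cost breakdown can be checked with a few lines of arithmetic. The prices below are the per-million input rates quoted on this page, not values looked up from a pricing API, and `step_cost` is a hypothetical helper.

```python
# Input price in dollars per million tokens, as quoted on this page.
PRICE_PER_M = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}

def step_cost(model: str, input_tokens: int) -> float:
    """Input cost in dollars for one step."""
    return input_tokens * PRICE_PER_M[model] / 1_000_000

plan = [
    ("classify", "haiku", 200),
    ("analyze", "sonnet", 2000),
    ("synthesize", "opus", 3000),
]

mixed = sum(step_cost(model, tokens) for _, model, tokens in plan)
all_opus = step_cost("opus", 8000)

print(f"mixed=${mixed:.5f} all_opus=${all_opus:.2f} ratio={all_opus / mixed:.2f}x")
```

Only input tokens are counted here, matching the table; output tokens would add to both totals.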

Technique 4: Caching

Identical prompts skip the LLM entirely:

- id: expensive_research
  cache:
    enabled: true
    ttl_minutes: 120   # Cache for 2 hours
  prompt: "Research {input.topic}"
  output_var: research

Cache key = hash of (step ID + resolved prompt + model). If the same chain runs with the same inputs within the TTL, the cached result is returned instantly (0 tokens, 0 cost, ~1ms).
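A cache keyed this way could be sketched as follows. The choice of SHA-256, the field separator, and the `TTLCache` class are all assumptions made for illustration; the page does not say which hash or storage OCC uses.

```python
import hashlib
import time

def cache_key(step_id: str, resolved_prompt: str, model: str) -> str:
    """Hash of (step ID + resolved prompt + model). SHA-256 is an assumed choice."""
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = "\x1f".join([step_id, resolved_prompt, model]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class TTLCache:
    """In-memory cache whose entries expire after ttl_minutes."""

    def __init__(self, ttl_minutes: float, clock=time.monotonic):
        self._ttl_seconds = ttl_minutes * 60
        self._clock = clock
        self._store = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self._clock(), value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl_seconds:
            del self._store[key]  # expired, so drop it and report a miss
            return None
        return value
```

Because the key includes the resolved prompt, changing `{input.topic}` produces a different key and a fresh LLM call, while a repeated run with the same inputs hits the cache until the TTL elapses.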

Technique 5: Conditional Execution

Skip expensive steps when they're not needed:

- id: quick_check
  model: claude-haiku-4-5
  prompt: "Does this code have security issues? Answer YES or NO: {input.code}"
  output_var: has_issues
- id: deep_audit
  depends_on: [quick_check]
  condition: '{has_issues} contains "YES"'   # Only runs if issues found
  model: claude-opus-4-6
  prompt: "Full security audit of: {input.code}"
  output_var: audit

If has_issues is "NO", the deep_audit step is skipped — saving potentially thousands of tokens.
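A condition like the one above could be evaluated with a small parser. This sketch handles only the `{var} contains "text"` form shown in the example; the full condition grammar is not documented on this page, so `should_run` is a hypothetical illustration.

```python
import re

# Matches conditions of the form: {variable} contains "text"
_CONTAINS = re.compile(r'^\{(\w+)\}\s+contains\s+"([^"]*)"$')

def should_run(condition: str, variables: dict) -> bool:
    """Evaluate a '{var} contains "text"' condition against step outputs."""
    match = _CONTAINS.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, needle = match.groups()
    return needle in variables[name]

condition = '{has_issues} contains "YES"'
run_when_yes = should_run(condition, {"has_issues": "YES"})
run_when_no = should_run(condition, {"has_issues": "NO"})
```

Using a substring test rather than equality means an answer such as "YES, two problems" still triggers the audit.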

Technique 6: Early Exit

Stop the chain when the answer is found:

- id: check_cache
  prompt: "Is this answer in the knowledge base? If so, return it; otherwise reply NOT_FOUND. {input.question}"
  output_var: cached_answer
  early_exit_if: '{cached_answer} != "NOT_FOUND"'
- id: research
  depends_on: [check_cache]
  prompt: "Research: {input.question}"        # Never runs if cache hit
  output_var: research
- id: synthesize
  depends_on: [research]
  prompt: "Answer from research: {research}"  # Never runs if cache hit
  output_var: answer
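The control flow of an early exit can be sketched as a loop that stops as soon as a step's exit predicate is satisfied. Here the predicate is a plain Python callable and `call_llm` is a stand-in function; both are assumptions for illustration, not OCC's runner.

```python
from typing import Callable, Optional

# (step id, output variable, optional exit predicate on that step's output)
Step = tuple[str, str, Optional[Callable[[str], bool]]]

def run_chain(steps: list[Step], call_llm: Callable[[str], str]):
    """Run steps in order; stop after any step whose exit predicate is true."""
    outputs = {}
    calls = 0
    for step_id, output_var, exit_if in steps:
        outputs[output_var] = call_llm(step_id)
        calls += 1
        if exit_if is not None and exit_if(outputs[output_var]):
            break  # remaining steps never run, so they cost nothing
    return outputs, calls

steps = [
    ("check_cache", "cached_answer", lambda value: value != "NOT_FOUND"),
    ("research", "research", None),
    ("synthesize", "answer", None),
]

# Canned responses standing in for real model calls.
hit = {"check_cache": "Paris", "research": "notes", "synthesize": "final"}
miss = {"check_cache": "NOT_FOUND", "research": "notes", "synthesize": "final"}

hit_outputs, hit_calls = run_chain(steps, hit.__getitem__)
miss_outputs, miss_calls = run_chain(steps, miss.__getitem__)
```

On a hit the chain makes one call instead of three; on a miss all three steps run as usual.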

Technique 7: Context Strategy

Control how dependency outputs are compressed:

- id: final_step
  depends_on: [big_research, small_facts]
  context_strategy:
    big_research: "truncate:2000"   # Trim to 2000 chars
    small_facts: "full"             # Keep as-is
  prompt: |
    Research (condensed): {big_research}
    Key facts: {small_facts}
  output_var: result

Options: full (default), summarize (LLM compression), truncate:N (hard cut at N chars)
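The two strategies that need no model call are simple enough to sketch directly. `apply_strategy` is a hypothetical helper; `summarize` is left out because it requires an LLM call.

```python
def apply_strategy(text: str, strategy: str) -> str:
    """Apply a zero-token context strategy to a dependency's output."""
    if strategy == "full":
        return text
    if strategy.startswith("truncate:"):
        limit = int(strategy.split(":", 1)[1])
        return text[:limit]  # hard cut at `limit` characters
    raise ValueError(f"strategy not handled in this sketch: {strategy!r}")

big_research = "x" * 5000
small_facts = "Three key facts."

condensed = apply_strategy(big_research, "truncate:2000")
kept = apply_strategy(small_facts, "full")
```

Truncation is free but blunt: it keeps the first N characters regardless of where the important content sits, which is the trade-off against `summarize`.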

Technique 8: Merge Strategy

When combining parallel outputs, choose the cheapest strategy:

- id: combine
  type: merge
  inputs: [research_a, research_b, research_c]
  strategy: pick_best   # LLM picks 1 of 3, cheaper than summarizing all 3
  prompt: "Pick the most relevant research."
  output_var: best_research

Strategy       LLM cost     Output size
concatenate    0 tokens     Sum of all inputs
json_array     0 tokens     Sum of all inputs
pick_best      ~input size  1 input's worth
llm_summarize  ~input size  Compressed
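The two zero-token strategies in the table amount to ordinary string handling. The sketch below assumes a blank-line separator for `concatenate`, which this page does not specify; `merge` is a hypothetical function name.

```python
import json

def merge(inputs: list[str], strategy: str) -> str:
    """Combine parallel outputs without calling a model."""
    if strategy == "concatenate":
        return "\n\n".join(inputs)  # separator is an assumption
    if strategy == "json_array":
        return json.dumps(inputs)
    raise ValueError(f"strategy needs an LLM or is unknown: {strategy!r}")

parts = ["research A", "research B", "research C"]
joined = merge(parts, "concatenate")
as_array = merge(parts, "json_array")
```

Both strategies are free to run, but the merged output is as large as all inputs combined, so the next step pays for every token.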

Technique 9: Pre-Tools vs. In-Prompt Requests

Bad — Asking Claude to search inside the prompt (costs tokens for the tool call overhead):

prompt: "Search the web for {input.topic} and then analyze the results"

Good — Pre-tool does the search, data is ready:

pre_tools:
  - type: web_search
    query: "{input.topic} latest research"
    inject_as: data
prompt: "Analyze this data: {data}"

The pre-tool approach is cleaner AND cheaper because the LLM doesn't waste tokens on tool-calling overhead.
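Conceptually, a pre-tool runs before the model is called and its result is substituted into the prompt. The sketch below uses a fake search function in place of a real web search; `run_pre_tools` and `fake_web_search` are hypothetical names, and only the `{input.topic}` placeholder is handled.

```python
def run_pre_tools(pre_tools: list[dict], tools: dict, inputs: dict) -> dict:
    """Run each pre-tool and return its result keyed by `inject_as`."""
    injected = {}
    for spec in pre_tools:
        query = spec["query"].replace("{input.topic}", inputs["topic"])
        injected[spec["inject_as"]] = tools[spec["type"]](query)
    return injected

def fake_web_search(query: str) -> str:
    """Stand-in for a real search tool."""
    return f"results for: {query}"

pre_tools = [
    {"type": "web_search", "query": "{input.topic} latest research", "inject_as": "data"},
]

injected = run_pre_tools(pre_tools, {"web_search": fake_web_search}, {"topic": "solar cells"})
prompt = "Analyze this data: {data}".format(**injected)
```

The model receives a single prompt with the data already in place, so no tokens go to deciding on, formatting, or reading back a tool call.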

Token Budget Calculator

Chain type                     Steps  Avg tokens/step  Total  Est. cost (Sonnet)
Simple (3 steps)               3      1500             4.5K   $0.014
Medium (6 steps)               6      2500             15K    $0.045
Complex (12 steps)             12     3000             36K    $0.108
Pipeline (3 chains × 5 steps)  15     2000             30K    $0.090

Caching can reduce repeat executions to $0.
