Token Optimization

lacause edited this page Mar 29, 2026 · 1 revision

OCC is designed to minimize LLM token usage. This guide covers every technique available.

The Core Principle

Traditional approaches stuff everything into one giant prompt. OCC decomposes work into focused steps, each receiving only the context it needs.

Traditional: 1 prompt × 40K tokens = 40K tokens
OCC: 6 prompts × ~2.5K tokens = ~15K tokens

The savings come from isolation — each step only pays for the tokens it actually needs.

Technique 1: Step Isolation

Each step gets its own prompt with only its dependencies injected. No conversation history accumulation.

# Step A outputs 3000 tokens of research
# Step B outputs 2000 tokens of analysis
# Step C only needs Step B's output — it never sees Step A's 3000 tokens
steps:
  - id: a
    prompt: "Research {input.topic}"
    output_var: research
  - id: b
    depends_on: [a]
    prompt: "Analyze: {research}"
    output_var: analysis
  - id: c
    depends_on: [b]  # Only depends on B, not A
    prompt: "Summarize: {analysis}"  # Receives ~2000 tokens, not 5000
    output_var: summary
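
To make the isolation concrete, here is a minimal Python sketch of how a runner might fill a step's prompt from only its declared dependencies. The function `resolve_prompt` and both dictionaries are hypothetical illustrations, not OCC's actual implementation, and token counts are approximated by whitespace-separated words.

```python
import re

def resolve_prompt(step, outputs, output_var_of):
    """Fill {var} placeholders using only outputs of steps named in depends_on."""
    allowed = {output_var_of[dep] for dep in step.get("depends_on", [])}

    def substitute(match):
        name = match.group(1)
        if name not in allowed:
            raise KeyError(f"{name!r} is not a declared dependency output")
        return outputs[name]

    return re.sub(r"\{(\w+)\}", substitute, step["prompt"])

# Stand-ins for Step A's 3000-token and Step B's 2000-token outputs.
outputs = {"research": "R " * 3000, "analysis": "A " * 2000}
output_var_of = {"a": "research", "b": "analysis"}  # step id -> output_var

step_c = {"id": "c", "depends_on": ["b"], "prompt": "Summarize: {analysis}"}
prompt_c = resolve_prompt(step_c, outputs, output_var_of)
print(len(prompt_c.split()))  # word count of the prompt Step C actually pays for
```

Step C's prompt contains Step B's output and nothing from Step A, because `research` is never in its allowed set.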

Technique 2: Transform Steps (Zero Tokens)

Transform steps manipulate data without LLM calls:

# Research step returns 5000 tokens of JSON
- id: research
  prompt: "Research and return JSON with key_findings, sources, raw_data"
  output_var: raw_research
# Extract only key_findings (costs 0 tokens)
- id: extract
  type: transform
  operation: json_extract
  input_var: raw_research
  json_path: "key_findings"
  prompt: "extract"
  output_var: findings  # Now ~500 tokens
# Next step receives 500 tokens instead of 5000
- id: report
  depends_on: [extract]
  prompt: "Write report from: {findings}"
  output_var: report

Available zero-token operations: json_extract, regex_match, template, split, merge, truncate, replace, filter, map, join, to_json, from_json
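
As an illustration of why a transform costs no tokens, the following Python sketch performs a `json_extract`-style operation locally. The dot-separated path syntax and the function name are assumptions for this example; OCC's real `json_path` syntax may differ.

```python
import json

def json_extract(raw: str, json_path: str):
    """Follow a dot-separated path through parsed JSON. No LLM call is made."""
    value = json.loads(raw)
    for key in json_path.split("."):
        value = value[key]
    return value

# Stand-in for the large JSON returned by the research step.
raw_research = json.dumps({
    "key_findings": ["finding 1", "finding 2"],
    "sources": ["s1", "s2", "s3"],
    "raw_data": "x" * 5000,
})

findings = json_extract(raw_research, "key_findings")
print(len(raw_research), "->", len(json.dumps(findings)))  # characters before and after
```

Only the extracted value is passed on, so the next prompt is billed for the small payload rather than the full research output.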

Technique 3: Per-Step Model Selection

Use cheap models for simple tasks, expensive models for critical ones:

steps:
  # Simple classification: use Haiku ($0.25/M input)
  - id: classify
    model: claude-haiku-4-5
    prompt: "Classify this text: {input.text}. Return: positive, negative, or neutral."
    output_var: sentiment
  # Complex analysis: use Sonnet ($3/M input)
  - id: analyze
    model: claude-sonnet-4-6
    depends_on: [classify]
    prompt: "Deep analysis of {input.text} (classified as {sentiment})"
    output_var: analysis
  # Critical synthesis: use Opus ($15/M input)
  - id: synthesize
    model: claude-opus-4-6
    depends_on: [analyze]
    prompt: "Produce final executive report from: {analysis}"
    output_var: report

Cost breakdown (input tokens only, at the per-million rates quoted above):

Step	Model	Input tokens	Cost
classify	Haiku	~200	$0.00005
analyze	Sonnet	~2000	$0.006
synthesize	Opus	~3000	$0.045
Total		~5200	$0.051

vs. running everything on Opus: ~8000 tokens × $15/M = $0.12 (about 2.4x more expensive)
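
The arithmetic behind this comparison can be reproduced with a short Python sketch. The prices are the input rates quoted in this guide and should be checked against current provider pricing; the helper name `input_cost` is hypothetical, and output tokens are ignored, as in the table.

```python
# Input prices in USD per million tokens, as quoted in this guide.
PRICE_PER_M = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}

def input_cost(tokens: int, model: str) -> float:
    """Cost in USD of sending `tokens` input tokens to `model`."""
    return tokens * PRICE_PER_M[model] / 1_000_000

steps = [
    ("classify", "haiku", 200),
    ("analyze", "sonnet", 2000),
    ("synthesize", "opus", 3000),
]

mixed_total = sum(input_cost(tokens, model) for _, model, tokens in steps)
opus_only = input_cost(8000, "opus")  # the single-model figure used above

print(f"mixed models: {mixed_total:.5f} USD")
print(f"opus only:    {opus_only:.5f} USD")
print(f"ratio:        {opus_only / mixed_total:.2f}x")
```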

Technique 4: Caching

Identical prompts skip the LLM entirely:

- id: expensive_research
  cache:
    enabled: true
    ttl_minutes: 120  # Cache for 2 hours
  prompt: "Research {input.topic}"
  output_var: research

Cache key = hash of (step ID + resolved prompt + model). If the same chain runs with the same inputs within the TTL, the cached result is returned instantly (0 tokens, 0 cost, ~1ms).
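
A minimal Python sketch of this scheme is shown below. The guide only states which three components are hashed; the choice of SHA-256, the separator byte, and the in-memory `TTLCache` class are assumptions made for illustration.

```python
import hashlib
import time

def cache_key(step_id: str, resolved_prompt: str, model: str) -> str:
    """Hash step ID + resolved prompt + model (separator is an assumption)."""
    payload = "\x1f".join([step_id, resolved_prompt, model])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_minutes."""

    def __init__(self, ttl_minutes: float):
        self.ttl_seconds = ttl_minutes * 60
        self._entries = {}  # key -> (stored_at, value)

    def put(self, key, value, now=None):
        self._entries[key] = (time.time() if now is None else now, value)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > self.ttl_seconds:
            return None  # miss or expired: the LLM would be called again
        return entry[1]

cache = TTLCache(ttl_minutes=120)
key = cache_key("expensive_research", "Research solar power", "claude-sonnet-4-6")
cache.put(key, "cached result", now=0)
hit = cache.get(key, now=60 * 60)       # one hour later: still valid
miss = cache.get(key, now=3 * 60 * 60)  # three hours later: expired
```

Because the key is built from the resolved prompt, any change to an input such as `{input.topic}` produces a different key and therefore a fresh LLM call.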

Technique 5: Conditional Execution

Skip expensive steps when they're not needed:

- id: quick_check
  model: claude-haiku-4-5
  prompt: "Does this code have security issues? Answer YES or NO: {input.code}"
  output_var: has_issues
- id: deep_audit
  depends_on: [quick_check]
  condition: '{has_issues} contains "YES"'  # Only runs if issues found
  model: claude-opus-4-6
  prompt: "Full security audit of: {input.code}"
  output_var: audit

If has_issues is "NO", the deep_audit step is skipped — saving potentially thousands of tokens.
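
The following Python sketch shows one way a `contains` condition like the one above could be evaluated. It is a hypothetical evaluator covering only this single form, not OCC's condition engine, and it assumes the match is case-sensitive.

```python
import re

def evaluate_contains(condition: str, variables: dict) -> bool:
    """Evaluate conditions of the form '{var} contains "TEXT"' (case-sensitive)."""
    match = re.fullmatch(r'\s*\{(\w+)\}\s+contains\s+"([^"]*)"\s*', condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    var_name, needle = match.groups()
    return needle in str(variables.get(var_name, ""))

condition = '{has_issues} contains "YES"'
run_audit = evaluate_contains(condition, {"has_issues": "YES"})
skip_audit = evaluate_contains(condition, {"has_issues": "NO"})
```

Under the case-sensitivity assumption, a model reply of "Yes" would not match, which is one reason the gate prompt asks for an exact YES or NO.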

Technique 6: Early Exit

Stop the chain when the answer is found:

- id: check_cache
  prompt: "Is this answer in the knowledge base? {input.question}"
  output_var: cached_answer
  early_exit_if: '{cached_answer} != "NOT_FOUND"'
- id: research
  depends_on: [check_cache]
  prompt: "Research: {input.question}"  # Never runs if cache hit
  output_var: research
- id: synthesize
  depends_on: [research]
  prompt: "Answer from research: {research}"  # Never runs if cache hit
  output_var: answer
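
The control flow of an early exit can be sketched in Python as follows. The runner, the stubbed `fake_llm`, and the use of a Python callable in place of the `early_exit_if` string expression are all assumptions for illustration.

```python
def run_chain(steps, call_llm):
    """Run steps in order; stop once a step's early-exit predicate is true."""
    outputs, executed = {}, []
    for step in steps:
        outputs[step["output_var"]] = call_llm(step["id"], outputs)
        executed.append(step["id"])
        should_exit = step.get("early_exit_if")
        if should_exit is not None and should_exit(outputs):
            break  # remaining steps never run, so they cost nothing
    return outputs, executed

def fake_llm(step_id, outputs):
    """Stub standing in for a real model call; the knowledge base has the answer."""
    return {"check_cache": "Paris", "research": "...", "synthesize": "..."}[step_id]

steps = [
    {"id": "check_cache", "output_var": "cached_answer",
     "early_exit_if": lambda out: out["cached_answer"] != "NOT_FOUND"},
    {"id": "research", "output_var": "research"},
    {"id": "synthesize", "output_var": "answer"},
]

outputs, executed = run_chain(steps, fake_llm)
print(executed)
```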

Technique 7: Context Strategy

Control how dependency outputs are compressed:

- id: final_step
  depends_on: [big_research, small_facts]
  context_strategy:
    big_research: "truncate:2000"  # Trim to 2000 chars
    small_facts: "full"  # Keep as-is
  prompt: |
    Research (condensed): {big_research}
    Key facts: {small_facts}
  output_var: result

Options: full (default), summarize (LLM compression), truncate:N (hard cut at N chars)
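
A rough Python sketch of how these three options could be applied to a dependency output before injection is given below. The function name and the `summarize` callable are hypothetical; only the option names and the character-based truncation come from this guide.

```python
def apply_context_strategy(text: str, strategy: str = "full", summarize=None) -> str:
    """Return the form of `text` that would be injected into the prompt."""
    if strategy == "full":
        return text
    if strategy.startswith("truncate:"):
        limit = int(strategy.split(":", 1)[1])
        return text[:limit]  # hard cut at `limit` characters, zero tokens
    if strategy == "summarize":
        if summarize is None:
            raise ValueError("summarize needs an LLM-backed callable")
        return summarize(text)  # costs tokens once, to save them downstream
    raise ValueError(f"unknown strategy: {strategy!r}")

big_research = "x" * 5000
condensed = apply_context_strategy(big_research, "truncate:2000")
untouched = apply_context_strategy("small fact", "full")
```

Note that `summarize` is the only option that itself spends tokens, so it pays off mainly when the compressed output feeds several later steps or an expensive model.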

Technique 8: Merge Strategy

When combining parallel outputs, choose the cheapest strategy:

- id: combine
  type: merge
  inputs: [research_a, research_b, research_c]
  strategy: pick_best  # LLM picks 1 of 3, cheaper than summarizing all 3
  prompt: "Pick the most relevant research."
  output_var: best_research

Strategy	LLM cost	Output size
`concatenate`	0 tokens	Sum of all inputs
`json_array`	0 tokens	Sum of all inputs
`pick_best`	~input size	1 input's worth
`llm_summarize`	~input size	Compressed
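
The two zero-token strategies in the table are plain data operations, as this Python sketch illustrates. The blank-line separator used for `concatenate` and the function name are assumptions; OCC's actual output format is not specified here.

```python
import json

def merge(inputs: list, strategy: str) -> str:
    """Combine parallel step outputs without calling an LLM."""
    if strategy == "concatenate":
        return "\n\n".join(inputs)  # separator is an assumption
    if strategy == "json_array":
        return json.dumps(inputs)
    raise ValueError(f"{strategy!r} needs an LLM or is unknown")

parts = ["research a", "research b", "research c"]
joined = merge(parts, "concatenate")
as_array = merge(parts, "json_array")
```

Both results are roughly as large as the sum of their inputs, so the merge itself is free but the next LLM step pays for the full combined size.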

Technique 9: Pre-Tools vs. In-Prompt Requests

Bad — Asking Claude to search inside the prompt (costs tokens for the tool call overhead):

prompt: "Search the web for {input.topic} and then analyze the results"

Good — Pre-tool does the search, data is ready:

pre_tools:
  - type: web_search
    query: "{input.topic} latest research"
    inject_as: data
prompt: "Analyze this data: {data}"

The pre-tool approach is cleaner AND cheaper because the LLM doesn't waste tokens on tool-calling overhead.
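
The ordering can be sketched in Python: the tool runs first, and its result is injected as a variable before the single LLM prompt is built. The runner, the `fill` helper, and the stubbed `web_search` function are hypothetical stand-ins, not OCC internals.

```python
def fill(template: str, variables: dict) -> str:
    """Replace {name} placeholders with their values."""
    for name, value in variables.items():
        template = template.replace("{" + name + "}", str(value))
    return template

def build_prompt(step: dict, inputs: dict, tools: dict) -> str:
    """Run every pre-tool, inject its result, then resolve the prompt."""
    variables = {f"input.{key}": value for key, value in inputs.items()}
    for tool in step.get("pre_tools", []):
        query = fill(tool["query"], variables)
        variables[tool["inject_as"]] = tools[tool["type"]](query)
    return fill(step["prompt"], variables)

step = {
    "pre_tools": [{"type": "web_search",
                   "query": "{input.topic} latest research",
                   "inject_as": "data"}],
    "prompt": "Analyze this data: {data}",
}
tools = {"web_search": lambda query: f"results for {query}"}  # stub, no network

prompt = build_prompt(step, {"topic": "solar"}, tools)
print(prompt)
```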

Token Budget Calculator

Chain type	Steps	Avg tokens/step	Total	Est. cost (Sonnet)
Simple (3 steps)	3	1500	4.5K	$0.014
Medium (6 steps)	6	2500	15K	$0.045
Complex (12 steps)	12	3000	36K	$0.108
Pipeline (3 chains × 5 steps)	15	2000	30K	$0.090

Caching can reduce repeat executions to $0.
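
The table rows follow from steps × average tokens × the Sonnet input rate used in this guide ($3/M). A small Python sketch lets you plug in your own chain shape. The function is hypothetical, and `cache_hit_rate` is a simplifying assumption that treats a cache hit as skipping an entire run.

```python
SONNET_INPUT_USD_PER_M = 3.00  # input rate used throughout this guide

def chain_budget(steps: int, avg_tokens_per_step: int,
                 runs: int = 1, cache_hit_rate: float = 0.0):
    """Return (tokens per run, estimated total cost in USD across all runs)."""
    tokens_per_run = steps * avg_tokens_per_step
    cost_per_run = tokens_per_run * SONNET_INPUT_USD_PER_M / 1_000_000
    billed_runs = runs * (1.0 - cache_hit_rate)  # fully cached runs cost nothing
    return tokens_per_run, cost_per_run * billed_runs

for name, steps, avg in [("Simple", 3, 1500), ("Medium", 6, 2500),
                         ("Complex", 12, 3000), ("Pipeline", 15, 2000)]:
    tokens, cost = chain_budget(steps, avg)
    print(f"{name}: {tokens} tokens, about {cost:.4f} USD")
```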
