Skip to main content
Hindsight is State-of-the-Art on Memory for AI Agents | Read the paper β†’
πŸ€–
Using a coding agent? Run this to install the Hindsight docs skill:
npx skills add https://github.com/vectorize-io/hindsight --skill hindsight-docs

Reflect

Generate a grounded, disposition-aware response using an agentic reasoning loop.

When you call reflect, Hindsight runs an agentic loop that autonomously searches the memory bank using multiple retrieval tools, applies the bank's disposition traits to shape the reasoning style, and produces a final answer grounded in what it found. Unlike recall β€” which returns raw facts β€” reflect returns a synthesized response written by the LLM.

How Reflect Works

Learn about disposition-driven reasoning in the Reflect Architecture guide.

Prerequisites

Make sure you've completed the Quick Start to install the client and start the server.

Basic Usage​

  • Python
  • Node.js
  • CLI
  • Go
client.reflect(bank_id="my-bank", query="What should I know about Alice?")

Parameters​

query​

The question or prompt to reflect on. This is the only required field. If you have situational context that should influence the answer, include it directly in the query rather than as a separate field.

budget​

Controls how thoroughly the agent explores the memory bank before answering. Accepted values are low (default), mid, and high. At low, the agent does a shallow search optimized for speed. At mid, it checks multiple sources when the question warrants it. At high, it performs deep exploration across all knowledge levels and may use multiple query variations to find indirect connections. Use high for complex questions that require synthesizing information from many sources.

  • Python
  • Node.js
  • CLI
  • Go
response = client.reflect(
bank_id="my-bank",
query="We're considering a hybrid work policy. What do you think about remote work?",
budget="mid",
)

max_tokens​

Limits the length of the final generated response. Defaults to 4096. This does not affect how much the agent can retrieve during the agentic loop β€” only the final answer length.

response_schema​

An optional JSON Schema object. When provided, the LLM generates a response that conforms to the schema and the response includes a structured_output field with the result parsed accordingly. The text field will be empty since only a single structured LLM call is made. Use this when you need to process the response programmatically rather than display it as prose.

  • Python
  • Node.js
  • CLI
  • Go
from pydantic import BaseModel

# Define your response structure with Pydantic
classHiringRecommendation(BaseModel):
recommendation:str
confidence:str# "low", "medium", "high"
key_factors:list[str]
risks:list[str]=[]

response = client.reflect(
bank_id="hiring-team",
query="Should we hire Alice for the ML team lead position?",
response_schema=HiringRecommendation.model_json_schema(),
)

# Parse structured output into Pydantic model
result = HiringRecommendation.model_validate(response.structured_output)
print(f"Recommendation: {result.recommendation}")
print(f"Confidence: {result.confidence}")
print(f"Key factors: {result.key_factors}")

tags​

Filters which memories the agent can access during reflection. Works identically to recall tags β€” only memories matching the specified tags are considered. The tags_match parameter controls the matching logic (any, all, any_strict, all_strict, exact) with the same semantics as recall.

  • Python
  • Node.js
  • CLI
  • Go
# Filter reflection to only consider memories for a specific user
response = client.reflect(
bank_id="my-bank",
query="What does this user think about our product?",
tags=["user:alice"],
tags_match="any_strict"# Only use memories tagged for this user
)

include​

Controls optional supplementary data returned alongside the main response.

include.facts​

When enabled, the response includes a based_on object listing the memories, mental models, and directives the agent actually used to construct the answer. Only sources retrieved during the agent loop can appear here β€” citations are validated to prevent hallucinated references. Useful for transparency and verification.

  • Python
  • Node.js
  • CLI
  • Go
# include_facts=True enables the based_on field in the response
response = client.reflect(
bank_id="my-bank",
query="Tell me about Alice",
include_facts=True,
)

print("Response:", response.text)
print("\nBased on:")
for fact in(response.based_on.memories if response.based_on else[]):
print(f" - [{fact.type}] {fact.text}")

include.tool_calls​

When enabled, the response includes a trace object with the full execution log of every tool call and LLM call made during the agentic loop, including inputs, outputs, and durations. Set output: false to include only tool inputs for a smaller payload. Useful for debugging why the agent reached a particular conclusion.


Response​

text​

The synthesized answer as a well-formatted markdown string. This is the primary output of reflect. Empty when response_schema is provided (use structured_output instead in that case).

structured_output​

The LLM's response parsed according to the response_schema provided in the request. Only present when response_schema was set. null otherwise.

based_on​

The sources the agent used to construct the answer. Only present when include.facts was enabled. Contains three fields:

  • memories β€” a list of memory facts (world, experience, observation) that were retrieved and cited. Each item has id, text, type, context, occurred_start, and occurred_end.
  • mental_models β€” a list of mental models that were used. Each item has id, text, and context.
  • directives β€” a list of directives that were enforced during reasoning. Each item has id, name, and content.

usage​

Token usage for all LLM calls made during the agentic loop: input_tokens, output_tokens, and total_tokens. Useful for cost tracking.

trace​

The full execution log of the agentic loop. Only present when include.tool_calls was enabled. Contains:

  • tool_calls β€” each tool invocation with tool name (lookup, recall, learn, expand), input, output (if output: true), duration_ms, and iteration number.
  • llm_calls β€” each LLM call with scope (e.g., "agent_1", "final") and duration_ms.

AltStyle γ«γ‚ˆγ£γ¦ε€‰ζ›γ•γ‚ŒγŸγƒšγƒΌγ‚Έ (->γ‚ͺγƒͺγ‚ΈγƒŠγƒ«) /