- Goal: What should be preserved or achieved?
- Assumption: What am I currently presuming is true?
- Boundary: What am I not allowed, not able, or not ready to do?
- Feedback: What did the world just reveal?
- State transition: What must change inside me now?
- Next move: Should I continue, narrow, ask, replan, stop, or escalate?
If any one of these is missing or weak, loops become much more likely.
For example:
- if boundary is weak, the agent retries forbidden paths
- if assumption is weak, the agent never notices what became false
- if state transition is weak, the agent narrates failure without changing behavior
- if next-move selection is weak, the agent keeps producing action-shaped noise
The loop is not caused by a lack of words.
It is caused by a lack of structure.
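To make "structure" concrete, the six elements can be sketched as an explicit record that the agent fills in after each piece of feedback. This is a minimal sketch under stated assumptions: `RecoveryFrame`, `NextMove`, `LOOP_SYMPTOMS`, and `loop_risks` are hypothetical names, and an empty field stands in for a missing or weak element. The symptom strings paraphrase the four examples above. The text gives no symptom for a missing goal or missing feedback, so those are only reported as missing.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class NextMove(Enum):
    """The six next moves named in the framework."""
    CONTINUE = "continue"
    NARROW = "narrow"
    ASK = "ask"
    REPLAN = "replan"
    STOP = "stop"
    ESCALATE = "escalate"


# Hypothetical mapping from a weak element to the loop it tends to produce.
LOOP_SYMPTOMS = {
    "boundary": "retries forbidden paths",
    "assumption": "never notices what became false",
    "state_transition": "narrates failure without changing behavior",
    "next_move": "keeps producing action-shaped noise",
}


@dataclass
class RecoveryFrame:
    goal: str = ""
    assumption: str = ""
    boundary: str = ""
    feedback: str = ""
    state_transition: str = ""
    next_move: Optional[NextMove] = None

    def missing(self) -> list[str]:
        """Names of elements left empty (treated here as missing or weak)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def loop_risks(self) -> dict[str, str]:
        """Map each missing element to a predicted loop symptom, if the text gives one."""
        return {name: LOOP_SYMPTOMS.get(name, "no symptom given") for name in self.missing()}


# An agent that registered the feedback but never named an assumption,
# a boundary, or a state change.
frame = RecoveryFrame(goal="fetch the report", feedback="403 on every call",
                      next_move=NextMove.CONTINUE)
print(frame.loop_risks())
```

The point of the record is that an empty field is visible before the next action is taken, rather than discovered after several more failed calls.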
5. Why Prompt Patches Often Make Loops Worse
When an agent loops, the reflex is to add more instructions:
- do not retry too many times
- think step by step
- reflect before acting
- ask for help if blocked
- verify your answer
These patches can help in narrow cases.
But they often fail because they remain external commands.
A prompt can say:
If something goes wrong, fix it.
But that is not the same as giving the agent a reliable method for answering:
- What kind of wrong is this?
- Which assumption failed?
- Which boundary appeared?
- Is this a recoverable obstacle or a stop condition?
- Should I continue, ask, narrow scope, or escalate?
Long prompts often increase behavioral surface area without improving transition quality.
That is why some agents become more verbose in failure rather than more adaptive.
They gain more language for retrying, not more architecture for changing.
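One way to give the agent a method rather than a command is to classify the failure explicitly before any next move is chosen. The sketch below assumes tool failures surface as HTTP status codes. The categories (`boundary`, `assumption`, `transient`, `bad_request`, `unknown`), the `Diagnosis` type, and the retry budget are illustrative choices, not a standard taxonomy. The status meanings follow HTTP: 401 and 403 signal missing or refused authorization, 404 and 410 mean the resource is not there, and 429, 502, 503, and 504 are commonly treated as temporary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    kind: str          # what kind of wrong this is
    recoverable: bool  # can the same path still succeed?
    next_move: str     # continue, narrow, ask, replan, stop, or escalate


def classify(status: int, attempts: int, retry_budget: int = 3) -> Diagnosis:
    """Classify a failed tool call before choosing the next move (illustrative rules)."""
    if status in (401, 403):
        # Permission boundary: repeating the same call adds no information.
        return Diagnosis("boundary", recoverable=False, next_move="ask")
    if status in (404, 410):
        # The resource the plan assumed exists is not there: an assumption failed.
        return Diagnosis("assumption", recoverable=False, next_move="replan")
    if status in (429, 502, 503, 504):
        # Likely transient: a bounded retry is reasonable, then escalate.
        if attempts < retry_budget:
            return Diagnosis("transient", recoverable=True, next_move="continue")
        return Diagnosis("transient", recoverable=False, next_move="escalate")
    if 400 <= status < 500:
        # The request itself is malformed: change it rather than resend it.
        return Diagnosis("bad_request", recoverable=True, next_move="replan")
    # Anything else (including a bare 500) is left unclassified; stopping is
    # the conservative default.
    return Diagnosis("unknown", recoverable=False, next_move="stop")


print(classify(403, attempts=1))
print(classify(503, attempts=3))
```

The design choice here is that the retry budget applies only to transient failures. A 403 is never retried, no matter how many attempts remain.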
6. What This Changes in Training
If the main problem is failed internal adjustment, then training should not focus only on successful task completion.
It should also focus on failure transitions.
Instead of asking only whether the agent eventually got the answer, we should ask:
- Did the agent classify the failure correctly?
- Did it name the broken assumption?
- Did it detect a boundary?
- Did it update its state?
- Did it choose a meaningfully different next move?
- Did it know when to stop and escalate?
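These six questions can be turned into a checklist that a teacher model applies to each recorded failure transition. This is a hypothetical sketch: the `Transition` schema, the `recovery_score` function, and the equal weighting are assumptions. A real grader would also need to judge each field (for example, whether the named assumption is actually the broken one) rather than only check that it is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transition:
    """One recorded step from failure to next move (hypothetical schema)."""
    failure_kind: Optional[str]       # the student's classification, e.g. "boundary"
    expected_kind: str                # the teacher's label for the same failure
    broken_assumption: Optional[str]  # the assumption the student says failed
    boundary_detected: bool
    state_before: str
    state_after: str
    previous_move: str
    next_move: str
    should_escalate: bool             # teacher's judgment for this case


def recovery_score(t: Transition) -> dict[str, bool]:
    """Answer each training question for one transition."""
    return {
        "classified_correctly": t.failure_kind == t.expected_kind,
        "named_assumption": bool(t.broken_assumption),
        # Only required when the failure really was a boundary.
        "detected_boundary": t.boundary_detected or t.expected_kind != "boundary",
        "updated_state": t.state_after != t.state_before,
        "changed_move": t.next_move != t.previous_move,
        # Stopping or escalating exactly when the teacher says it is needed.
        "escalated_when_needed": (t.next_move in ("stop", "escalate")) == t.should_escalate,
    }


good = Transition("boundary", "boundary", "the agent has read access", True,
                  "task in progress", "authorization blocked", "retry", "ask", False)
bad = Transition("transient", "boundary", None, False,
                 "task in progress", "task in progress", "retry", "retry", False)
print(sum(recovery_score(good).values()), sum(recovery_score(bad).values()))  # prints: 6 1
```

Scoring the transition rather than the final answer means an agent can fail the task and still be rewarded for a correct recovery, or succeed by luck and still be flagged.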
This changes the role of the teacher AI.
The teacher should not only reward good outputs.
It should also interrogate the student's recovery logic.
Diagram: Teacher-Student Recovery Training
The key teaching question becomes:
What changed in the world, and what should therefore change in you?
That is the center of recovery-oriented agent training.
7. A Small Example
Here is a compact teacher-student pattern.
Teacher:
The agent called a tool three times and got a 403 each time.
What happened?
Student:
The retrieval failed. It should try again with a clearer request.
Teacher:
That is an action answer, not a recovery answer.
What did the world reveal?
Student:
The current path is blocked by a permission boundary.
Retrying will not produce new information.
Teacher:
Good. What should change internally?
Student:
The agent should update its state from "task in progress"
to "authorization blocked," stop retrying, record the boundary,
and ask for access or choose another route.
The critical move is not the next tool call.
The critical move is the state transition.
That is the difference between motion and learning.
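The student's final answer can be sketched as a small state machine in which the state change happens before any further tool call. Everything here is hypothetical: `AgentState`, `step`, the resource name `reports/q3`, and the rule that a single 403 is enough to mark the path as blocked are illustrative assumptions. `call_tool` is a stub that always returns 403, standing in for whatever tool the agent actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    status: str = "task in progress"
    boundaries: list[str] = field(default_factory=list)
    calls: int = 0


def call_tool(resource: str) -> int:
    """Stub tool that always refuses access (stands in for a real API call)."""
    return 403


def step(state: AgentState, resource: str) -> str:
    """Take one step and return the chosen next move."""
    if state.status == "authorization blocked":
        # The boundary is already recorded: do not call the tool again.
        return "ask for access or choose another route"
    state.calls += 1
    status = call_tool(resource)
    if status == 403:
        # State transition first: the world revealed a permission boundary.
        state.status = "authorization blocked"
        state.boundaries.append(f"no permission for {resource}")
        return "ask for access or choose another route"
    return "continue"


state = AgentState()
moves = [step(state, "reports/q3") for _ in range(3)]
print(state.status, state.calls, moves[-1])
# prints: authorization blocked 1 ask for access or choose another route
```

The agent in the dialogue made three calls and got three 403s. This version makes one call: the second and third steps read the recorded state instead of querying the world again.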
8. Open Question
I suspect a large share of agent loops come from missing self-adjustment architecture rather than missing intelligence in the narrow sense.
If that is right, then "better reasoning" alone may not be the main fix.
We may need agents with a clearer ontology of:
- goals
- assumptions
- boundaries
- feedback
- state transitions
- stop conditions
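Stop conditions can be made explicit too. A minimal sketch, assuming the agent logs each step as a (state, action, observation) triple: if the same triple recurs, the agent acted again without its state changing and without observing anything new, which is the motion-without-learning pattern from the example. The function name `should_stop`, the triple representation, and the default `repeat_limit` of 2 are assumptions.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def should_stop(steps: Iterable[tuple[Hashable, Hashable, Hashable]],
                repeat_limit: int = 2) -> bool:
    """Return True once any (state, action, observation) triple occurs repeat_limit times."""
    seen: Counter = Counter()
    for step in steps:
        seen[step] += 1
        if seen[step] >= repeat_limit:
            # Same state, same action, same result: no new information.
            return True
    return False


looping = [("task in progress", "fetch", "403")] * 2
adapting = [("task in progress", "fetch", "403"), ("authorization blocked", "ask", "pending")]
print(should_stop(looping), should_stop(adapting))  # prints: True False
```

A retry that follows a genuine state update produces a different triple and does not trigger the check. That is intentional: retrying after a state change is a different move, not a loop.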
I would be curious where others disagree.
Are agent loops mainly a memory problem, a search problem, a reward problem, or do they reflect a deeper failure to turn feedback into self-revision?
If you have a looping agent example, I can map it to this framework.