https://bedrock-agentcore.eu-west-1.amazonaws.com/runtimes/<arn>/invocations
This URL is a special public-facing endpoint that AgentCore Runtime exposes. We specify this in the agentcore.json file:
"runtimes":[{"name":"AWSBriefingAgent","build":"Container","entrypoint":"handler.py",..."authorizerType":"CUSTOM_JWT","authorizerConfiguration":{"customJwtAuthorizer":{"discoveryUrl":"https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_dshjdhskj/.well-known/openid-configuration","allowedClients":["dhjhdjskhdjkshdjkhsd"]}}}
The discoveryUrl points to Cognito's OpenID Connect discovery document for the AWS Cognito User Pool with the specified ID that is being used to authenticate users to the frontend. When AgentCore Runtime wants to validate the JWT token, it retrieves information from this endpoint such as the issuer and JWKS endpoint (contains the public keys used to verify the JWT signature).
The allowedClients shows the Cognito Application Client ID. When a user logs in, Cognito stamps the token with the client_id. AgentCore validates the JWT’s client_id claim, so only tokens issued for one of the permitted application clients can invoke the runtime.
When the user logs into our frontend application with their email address and password, the frontend calls Cognito directly to verify, and receives back
-
Access token — proves who you are and what you're allowed to do.
-
ID token — contains profile info (email, name). Used by the frontend to display the username.
-
Refresh token — used to get new access/ID tokens when they expire (usually after 1 hour).
These tokens are stored by the frontend auth library.
When we send a request to the agent, the frontend attaches the access token as a bearer token
POST /invocations
Authorization: Bearer eyJraWQi...
Body: {"prompt": "Give me a briefing"}
This is the JWT token that gets validated by AgentCore Runtime.
Returning Memory records in Handler function
The following code snippet shows how we retrieve the memory records to display in the sidebar of the frontend.
@app.entrypoint
async def invoke(payload: Dict[str, Any], context: Any = None):
message = payload.get("prompt", payload.get("message", ""))
# Derive actor_id from the JWT 'sub' claim (source of truth)
actor_id = _extract_sub_from_jwt(context) or payload.get("user_id", "default-user")
# Sanitize actor_id for AgentCore Memory
actor_id = re.sub(r"[^a-zA-Z0-9\-_/]", "_", actor_id)
# Retrieve memory records to include in the stream
memory_used = get_memory_records(actor_id, message)
The @app.entrypoint decorator registers a function as the handler for POST requests to /invocations. AgentCore Runtime calls this handler function when a client invokes the agent. Our handler function is an async generator, which means that it automatically streams the response as Server-Sent Events (SSE) delivered to the client in real-time (more around this in the next blog post).
Within the handler, we get the message that has been sent in the payload. We then extract the user's identity from the JWT token that Cognito issued. One of the claims in the JWT token is the sub or subject, which is the unique user ID assigned by Cognito to a user when they first register. We know that the JWT token has been cryptographically signed by Cognito and validated by AgentCore Runtime before it reaches the handler function. We assign this sub value to be the actor_id. We apply some regex to the actual value to ensure it has no characters in it that are not supported.
We then call our get_memory_records function. This function calls the AgentCore retrieve memory records API to search the long-term memory for facts and preferences relevant to the promt that has just been passed in. We retrieve the 5 highest scoring results from the vector search and store them in a records array, which is streamed back to the frontend to be displayed in the sidebar.
def get_memory_records(actor_id: str, prompt: str) -> List[Dict[str, Any]]:
"""Retrieve long-term memory records relevant to the user's prompt.
Searches both the facts and preferences namespaces and returns
the records the agent would have seen for this invocation.
"""
if not MEMORY_ID:
return []
try:
client = boto3.client("bedrock-agentcore", region_name=REGION)
records = []
for namespace in [
f"users/{actor_id}/facts",
f"users/{actor_id}/preferences",
]:
try:
response = client.retrieve_memory_records(
memoryId=MEMORY_ID,
namespace=namespace,
searchCriteria={
"searchQuery": prompt,
"topK": 5,
},
maxResults=5,
)
for r in response.get("memoryRecordSummaries", []):
records.append({
"memoryRecordId": r.get("memoryRecordId", ""),
"text": r.get("content", {}).get("text", ""),
"score": r.get("score"),
"memoryStrategyId": r.get("memoryStrategyId", ""),
"namespaces": r.get("namespaces", []),
})
except Exception as exc:
logger.warning("Failed to retrieve from %s: %s", namespace, exc)
return records
except Exception as exc:
logger.warning("Failed to retrieve memory records: %s", exc)
return []
We can see an example of the sidebar in the frontend below:
Memory Sidebar
Setting up Memory with Strands
Both short-term and long-term memory are handled for us automatically through the AgentCore Memory session manager integration for Strands.
The memory ID is retrieved in a module-level constant:
MEMORY_ID = os.environ.get("MEMORY_BRIEFINGAGENTMEMORY_ID")
This reads the memory resource ID that AgentCore Runtime automatically injects as an environment variable into your container at runtime. The naming convention is: MEMORY__ID. Given the memory was given a name of "BriefingAgentMemory" in the agentcore.json file, AgentCore sets MEMORY_BRIEFINGAGENTMEMORY_ID to the actual memory resource ID (something like AWSBriefingAgent_BriefingAgentMemory-q2iBfL64BS).
The following function in our code is called on every request. A new stateless Strands Agent instance is created on each invocation, configured with the relevant session manager that loads conversation history from AgentCore Memory, tools and model settings.
def _create_agent(session_id: str, actor_id: str, gateway_tools: list = None) -> Agent:
"""Create a Strands Agent with KB retrieval, AgentCore Memory, and Gateway tools."""
session_manager = None
if MEMORY_ID:
try:
from bedrock_agentcore.memory.integrations.strands.config import (
AgentCoreMemoryConfig,
RetrievalConfig,
)
from bedrock_agentcore.memory.integrations.strands.session_manager import (
AgentCoreMemorySessionManager,
)
config = AgentCoreMemoryConfig(
memory_id=MEMORY_ID,
session_id=session_id,
actor_id=actor_id,
retrieval_config={
f"users/{actor_id}/facts": RetrievalConfig(
top_k=5, relevance_score=0.5
),
f"users/{actor_id}/preferences": RetrievalConfig(
top_k=5, relevance_score=0.5
),
},
)
session_manager = AgentCoreMemorySessionManager(
agentcore_memory_config=config,
region_name=REGION,
)
except Exception as exc:
logger.warning("Failed to initialise memory session manager: %s", exc)
tools = [retrieve, format_slack_message] + (gateway_tools or [])
return Agent(
system_prompt=_load_system_prompt(),
model=_create_model(),
tools=tools,
session_manager=session_manager,
conversation_manager=SlidingWindowConversationManager(
window_size=20,
should_truncate_results=True,
per_turn=True,
),
callback_handler=None,
)
In our code, if memory has been set, then we import the AgentCoreMemorySessionManager. This session manager integrates Strands agents with AgentCore Memory, which synchronises the short-term and long-term memory capabilities. Some of its features include loading the conversation history from short-term memory during agent initialisation, and integrating with long-term memory for context injection into agent state.
Next we create a AgentCoreMemoryConfig configuration object which will be passed to the session manager telling it:
- memory_id - which AgentCore Memory resource to connect to
- session_id - the identifier for the conversation session
- actor_id - the unique identifier for the user
- retrieval_config - a dictionary mapping of namespaces to retrieval configurations. This tells the session manage to search the two namespaces for relevant long-term memories, and to get the 5 most relevant facts and user preferences
Our use of AgentCore Memory is now handled automatically by Strands Agents session manager. Before each turn, it will load recent events from the same session to populate the agent's conversation context. The short-term memory is the raw event stream. The agent will see the last 20 turns in its context window, as this has been configured with the Sliding Window Conversation Manager. After (and during) invocations of the agent, new conversation messages are automatically persisted to AgentCore Memory.
With this in place, we have now successfully added long-term memory to our agent, personalising the briefing for each user based on their preferences.
Biography
As Chief AWS Architect at IBM in the UK, I am responsible for growing the AWS capability and community within one of the fastest growing AWS consulting partners globally. This gives me the opportunity to try out the latest features in preview before they go into general availability. You'll often find me blogging about my experience, but please reach out if there are services you'd like to know more about.