Bedrock Inference Profiles — From Flying Blind to Understanding Your AWS Bedrock Usage in Detail

DEV Community

#	IAM Caller	Department	Model	Requests	Input Tokens	Output Tokens	Est. Cost
1	sgomez	Digital Banking	OPUS	64,736	8,747,730	39,697,047	1,007ドル.47
2	vrodriguez	Core Banking	OPUS	56,296	9,829,576	21,772,001	528ドル.79
3	smartinez	Fraud & Risk	OPUS	25,166	13,829,878	8,723,278	249ドル.95
4	cperez	Payments & Transfers	OPUS	13,459	2,476,498	8,256,859	216ドル.67
5	agarcia	Mobile Banking	OPUS	14,247	6,575,622

The solution: three layers

For the impatient builders: GitHub Code

Layer 1 — Capture every invocation

This is the foundation. You tell Bedrock to log every API call to two places:

S3 (invocations/ prefix) — durable, cheap, queryable with Athena
CloudWatch (/aws/bedrock/invocations) — for real-time tailing and alerting

Now you can see in CloudWatch each request and response, including the tokens used. Without this step, everything else is blind. Run it once per account/region.

Layer 2 — Tag every invocation with an identity

This is the key insight. Logging alone tells you that a call happened and how many tokens it used — but not who made it from an application perspective. All calls to the same model look identical in the logs.

Application inference profiles solve this. Each app gets a profile that is a named copy of a system model, carrying tags:

tags: { app: "community-bank", team: "cto" }

The app swaps its modelId for the profile ARN. That's the only change required — no code changes, just configuration. The profile ARN flows into every log entry, so every token is now stamped with an app and team identity.

That's all you need. From here you can go to Athena and start answering your questions.

Layer 3 — Query the data with Athena

Once logs are flowing with identity stamps, Athena turns your S3 bucket into a queryable warehouse:

Tokens per app per day
Estimated cost per app in USD
Spend per IAM caller — catches developers calling Bedrock directly from their laptops, not through a profile

Bonus: cross-region inference and data residency

There's one more concept worth understanding before you design your profiles. The source model ID used to create a profile has a geographic prefix:

us.anthropic.claude-haiku-4-5-20251001-v1:0
eu.anthropic.claude-haiku-4-5-20251001-v1:0
ap.anthropic.claude-haiku-4-5-20251001-v1:0

That prefix is not cosmetic. Bedrock has three geographic routing pools and when you copy from one of these system profiles, your application profile inherits that routing — meaning Bedrock automatically distributes traffic across regions within that pool for higher availability and better throughput.

Prefix	Pool	Use case
`us.`	US cross-region	Production apps, US data
`eu.`	EU cross-region	GDPR, EU data residency
`ap.`	AP cross-region	Asia-Pacific latency

If you have GDPR obligations or customers in Europe, source your profiles from eu. and data never leaves EU regions. This turns inference profiles into a data governance tool, not just a cost governance tool.

The governance arc

Before — bill arrives, no idea who spent what
Enable logging — raw data flows, but it's all ARNs and roles, still hard to read
Add profiles — one config change per app unlocks full attribution, no code changes
Athena — token-level drill-down, estimated USD per app/day, per IAM caller
Cost Explorer — activate the app/team tags for budget-level visibility and alerts

From nothing to full observability. That's the journey.

Implementation Guide

Technical setup for Bedrock cost tracking using application inference profiles and invocation logging.

Prerequisites

AWS CLI configured with permissions for IAM, Bedrock, S3, CloudWatch Logs
An S3 bucket for invocation logs
Region: us-east-1 (or override via AWS_REGION)

Architecture

bedrock-cost-tracking/
├── 01-enable-logging.sh # Step 1: enable invocation logging (run once per account)
├── 02-create-inference-profiles.sh # Step 2: create per-app/team profiles
├── 03-validate.sh # Step 3: verify the full pipeline
├── invoke_profiles.py # Fire test calls through each app profile
├── check_logs.py # Tail CloudWatch logs and summarise usage
├── trust-policy.json # IAM trust policy for BedrockInvocationLoggingRole
├── permissions-policy.json # IAM permissions for the logging role
└── athena-usage-query.sql # Token-level usage + estimated cost per app/day

Setup

Step 1 — Enable invocation logging (once per account/region)

export BEDROCK_LOG_BUCKET=your-bedrock-logs-bucket
bash 01-enable-logging.sh

This creates:

CloudWatch log group /aws/bedrock/invocations (90-day retention)
IAM role BedrockInvocationLoggingRole with S3 + CloudWatch write permissions
Bedrock model invocation logging configuration pointing at both destinations

Step 2 — Create application inference profiles

bash 02-create-inference-profiles.sh

Each profile is tagged with app and team. The script prints the ARN for each created profile — copy these into invoke_profiles.py and athena-usage-query.sql.

Step 3 — Validate

# Set profile ARNs from step 2 output
export PROFILE_ALPHA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_BETA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_AIHUB=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
python3 invoke_profiles.py # fire one test call per profile
# wait ~90 seconds for logs to appear
python3 check_logs.py # confirm attribution is working
bash 03-validate.sh # full infrastructure check

Add a new app profile

1. Find the source model ARN

aws bedrock list-inference-profiles --region us-east-1 --type-equals SYSTEM_DEFINED \
 --query 'inferenceProfileSummaries[?contains(inferenceProfileId, `haiku`) == `true`].{ID:inferenceProfileId,ARN:inferenceProfileArn}' \
 --output table

Use the full ARN from this output — the copyFrom field requires it.

2. Create the profile

Replace app-gamma, gamma, and data-science with your app name and team.

SOURCE_ARN="arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:inference-profile/us.anthropic.claude-haiku-4-5-20251001-v1:0"
aws bedrock create-inference-profile \
 --region us-east-1 \
 --inference-profile-name "app-gamma-claude-haiku" \
 --description "Gamma app profile for Claude Haiku 4.5" \
 --model-source "{\"copyFrom\":\"$SOURCE_ARN\"}" \
 --tags "[{\"key\":\"app\",\"value\":\"gamma\"},{\"key\":\"team\",\"value\":\"data-science\"}]" \
 --query '{ARN:inferenceProfileArn,Status:status}' \
 --output table

3. Add the new ARN to athena-usage-query.sql — add a row to the pricing CTE with the ARN, app label, and token prices.

4. Run a validation call

python3 invoke_profiles.py # add the new profile to APP_PROFILES first
python3 check_logs.py # confirm it shows up attributed correctly

Migrate an existing app (one config change, no code changes)

Apps pass the profile ARN as modelId — the API call shape is identical to a direct model call:

# Before: direct model call (no attribution)
BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0
# After: routed through profile (tagged in logs and Cost Explorer)
BEDROCK_MODEL_ID=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<PROFILE_ID>

No SDK changes. The response shape is identical.

Cost attribution

Cost Explorer (budget-level)

Activate cost allocation tags in AWS Billing console: app, team
Cost Explorer → Group by tag → filter by app or team
Set budget alerts per tag value to catch spend anomalies early

Athena (token-level)

Run athena-usage-query.sql against your S3 log bucket for daily token counts and estimated USD cost per app profile. The query file contains five progressive steps:

Query	What it shows
Step 1	Create the Athena table over S3 logs
Step 2	Daily usage by profile ARN (tokens + requests)
Step 3	Estimated cost per app using a pricing CTE
Step 4	Per IAM caller — tracks developer/role-level spend
Step 5	Combined view: user + app + estimated cost in one query

Pricing reference (us-east-1, on-demand)

Model	Input per 1K tokens	Output per 1K tokens
Claude Haiku 4.5	0ドル.00080	0ドル.00400
Claude Sonnet 4.5	0ドル.00300	0ドル.01500
Claude Opus 4	0ドル.01500	0ドル.07500

IAM-level attribution

Invocation logs capture the caller's IAM ARN (identity.arn). This enables per-developer or per-role spend queries — useful for tracking AI coding assistant usage separately from production workloads without needing a separate profile per developer.

Key concepts

System inference profiles are AWS-managed cross-region routing profiles (e.g. us.anthropic.claude-haiku-4-5-20251001-v1:0). They route to the best available region automatically for resilience.

Application inference profiles are account-owned copies of a system profile that add tagging metadata. They are the attribution layer — there is no routing or model-behavior difference.

Invocation logging captures every request/response at the Bedrock service level, including token counts, model ID (which resolves to the profile ARN when profiles are used), and the IAM identity of the caller.

There has never been a better time to be an engineer and create value in society through software.

If you enjoyed the articles, visit my blog at jorgetovar.dev.

Top comments (1)

topstar_ai profile image

Luis

Automation-focused AI Developer specializing in production LLM agent systems — tool-calling agents, multi-step orchestration, and RAG pipelines over vector databases

Email

stackbuilder1228@gmail.com
Education

National Autonomous University of Mexico
Pronouns

10+ years full-stack, the last several building agentic AI wired directly into live systems
Joined

May 7, 2026

• Jul 2

This is a really useful breakdown of something that’s often confusing in Bedrock—how inference profiles actually shape cost visibility and model usage tracking. The "flying blind" analogy is accurate; without proper observability, it’s very easy to lose track of which models are driving spend and latency. I also like the focus on bringing structure to usage patterns, since most teams only notice inefficiencies after bills spike. One thing I’d be curious about is how inference profiles behave in multi-environment setups (dev/staging/prod) and whether they can be reliably used for governance at scale. Overall, a practical guide for real-world Bedrock usage.