| # |
IAM Caller |
Department |
Model |
Requests |
Input Tokens |
Output Tokens |
Est. Cost |
| 1 |
sgomez |
Digital Banking |
OPUS |
64,736 |
8,747,730 |
39,697,047 |
1,007ドル.47 |
| 2 |
vrodriguez |
Core Banking |
OPUS |
56,296 |
9,829,576 |
21,772,001 |
528ドル.79 |
| 3 |
smartinez |
Fraud & Risk |
OPUS |
25,166 |
13,829,878 |
8,723,278 |
249ドル.95 |
| 4 |
cperez |
Payments & Transfers |
OPUS |
13,459 |
2,476,498 |
8,256,859 |
216ドル.67 |
| 5 |
agarcia |
Mobile Banking |
OPUS |
14,247 |
6,575,622 |
The solution: three layers
For the impatient builders: GitHub Code
Layer 1 — Capture every invocation
This is the foundation. You tell Bedrock to log every API call to two places:
-
S3 (
invocations/ prefix) — durable, cheap, queryable with Athena
-
CloudWatch (
/aws/bedrock/invocations) — for real-time tailing and alerting
Now you can see in CloudWatch each request and response, including the tokens used. Without this step, everything else is blind. Run it once per account/region.
Layer 2 — Tag every invocation with an identity
This is the key insight. Logging alone tells you that a call happened and how many tokens it used — but not who made it from an application perspective. All calls to the same model look identical in the logs.
Application inference profiles solve this. Each app gets a profile that is a named copy of a system model, carrying tags:
tags: { app: "community-bank", team: "cto" }
The app swaps its modelId for the profile ARN. That's the only change required — no code changes, just configuration. The profile ARN flows into every log entry, so every token is now stamped with an app and team identity.
That's all you need. From here you can go to Athena and start answering your questions.
Layer 3 — Query the data with Athena
Once logs are flowing with identity stamps, Athena turns your S3 bucket into a queryable warehouse:
- Tokens per app per day
- Estimated cost per app in USD
- Spend per IAM caller — catches developers calling Bedrock directly from their laptops, not through a profile
Bonus: cross-region inference and data residency
There's one more concept worth understanding before you design your profiles. The source model ID used to create a profile has a geographic prefix:
us.anthropic.claude-haiku-4-5-20251001-v1:0
eu.anthropic.claude-haiku-4-5-20251001-v1:0
ap.anthropic.claude-haiku-4-5-20251001-v1:0
That prefix is not cosmetic. Bedrock has three geographic routing pools and when you copy from one of these system profiles, your application profile inherits that routing — meaning Bedrock automatically distributes traffic across regions within that pool for higher availability and better throughput.
| Prefix |
Pool |
Use case |
us. |
US cross-region |
Production apps, US data |
eu. |
EU cross-region |
GDPR, EU data residency |
ap. |
AP cross-region |
Asia-Pacific latency |
If you have GDPR obligations or customers in Europe, source your profiles from eu. and data never leaves EU regions. This turns inference profiles into a data governance tool, not just a cost governance tool.
The governance arc
-
Before — bill arrives, no idea who spent what
-
Enable logging — raw data flows, but it's all ARNs and roles, still hard to read
-
Add profiles — one config change per app unlocks full attribution, no code changes
-
Athena — token-level drill-down, estimated USD per app/day, per IAM caller
-
Cost Explorer — activate the
app/team tags for budget-level visibility and alerts
From nothing to full observability. That's the journey.
Implementation Guide
Technical setup for Bedrock cost tracking using application inference profiles and invocation logging.
Prerequisites
- AWS CLI configured with permissions for IAM, Bedrock, S3, CloudWatch Logs
- An S3 bucket for invocation logs
- Region:
us-east-1 (or override via AWS_REGION)
Architecture
bedrock-cost-tracking/
├── 01-enable-logging.sh # Step 1: enable invocation logging (run once per account)
├── 02-create-inference-profiles.sh # Step 2: create per-app/team profiles
├── 03-validate.sh # Step 3: verify the full pipeline
├── invoke_profiles.py # Fire test calls through each app profile
├── check_logs.py # Tail CloudWatch logs and summarise usage
├── trust-policy.json # IAM trust policy for BedrockInvocationLoggingRole
├── permissions-policy.json # IAM permissions for the logging role
└── athena-usage-query.sql # Token-level usage + estimated cost per app/day
Setup
Step 1 — Enable invocation logging (once per account/region)
export BEDROCK_LOG_BUCKET=your-bedrock-logs-bucket
bash 01-enable-logging.sh
This creates:
- CloudWatch log group
/aws/bedrock/invocations (90-day retention)
- IAM role
BedrockInvocationLoggingRole with S3 + CloudWatch write permissions
- Bedrock model invocation logging configuration pointing at both destinations
Step 2 — Create application inference profiles
bash 02-create-inference-profiles.sh
Each profile is tagged with app and team. The script prints the ARN for each created profile — copy these into invoke_profiles.py and athena-usage-query.sql.
Step 3 — Validate
# Set profile ARNs from step 2 output
export PROFILE_ALPHA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_BETA=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
export PROFILE_AIHUB=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<ID>
python3 invoke_profiles.py # fire one test call per profile
# wait ~90 seconds for logs to appear
python3 check_logs.py # confirm attribution is working
bash 03-validate.sh # full infrastructure check
Add a new app profile
1. Find the source model ARN
aws bedrock list-inference-profiles --region us-east-1 --type-equals SYSTEM_DEFINED \
--query 'inferenceProfileSummaries[?contains(inferenceProfileId, `haiku`) == `true`].{ID:inferenceProfileId,ARN:inferenceProfileArn}' \
--output table
Use the full ARN from this output — the copyFrom field requires it.
2. Create the profile
Replace app-gamma, gamma, and data-science with your app name and team.
SOURCE_ARN="arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:inference-profile/us.anthropic.claude-haiku-4-5-20251001-v1:0"
aws bedrock create-inference-profile \
--region us-east-1 \
--inference-profile-name "app-gamma-claude-haiku" \
--description "Gamma app profile for Claude Haiku 4.5" \
--model-source "{\"copyFrom\":\"$SOURCE_ARN\"}" \
--tags "[{\"key\":\"app\",\"value\":\"gamma\"},{\"key\":\"team\",\"value\":\"data-science\"}]" \
--query '{ARN:inferenceProfileArn,Status:status}' \
--output table
3. Add the new ARN to athena-usage-query.sql — add a row to the pricing CTE with the ARN, app label, and token prices.
4. Run a validation call
python3 invoke_profiles.py # add the new profile to APP_PROFILES first
python3 check_logs.py # confirm it shows up attributed correctly
Migrate an existing app (one config change, no code changes)
Apps pass the profile ARN as modelId — the API call shape is identical to a direct model call:
# Before: direct model call (no attribution)
BEDROCK_MODEL_ID=us.anthropic.claude-haiku-4-5-20251001-v1:0
# After: routed through profile (tagged in logs and Cost Explorer)
BEDROCK_MODEL_ID=arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:application-inference-profile/<PROFILE_ID>
No SDK changes. The response shape is identical.
Cost attribution
Cost Explorer (budget-level)
- Activate cost allocation tags in AWS Billing console:
app, team
- Cost Explorer → Group by tag → filter by
app or team
- Set budget alerts per tag value to catch spend anomalies early
Athena (token-level)
Run athena-usage-query.sql against your S3 log bucket for daily token counts and estimated USD cost per app profile. The query file contains five progressive steps:
| Query |
What it shows |
| Step 1 |
Create the Athena table over S3 logs |
| Step 2 |
Daily usage by profile ARN (tokens + requests) |
| Step 3 |
Estimated cost per app using a pricing CTE |
| Step 4 |
Per IAM caller — tracks developer/role-level spend |
| Step 5 |
Combined view: user + app + estimated cost in one query |
Pricing reference (us-east-1, on-demand)
| Model |
Input per 1K tokens |
Output per 1K tokens |
| Claude Haiku 4.5 |
0ドル.00080 |
0ドル.00400 |
| Claude Sonnet 4.5 |
0ドル.00300 |
0ドル.01500 |
| Claude Opus 4 |
0ドル.01500 |
0ドル.07500 |
IAM-level attribution
Invocation logs capture the caller's IAM ARN (identity.arn). This enables per-developer or per-role spend queries — useful for tracking AI coding assistant usage separately from production workloads without needing a separate profile per developer.
Key concepts
System inference profiles are AWS-managed cross-region routing profiles (e.g. us.anthropic.claude-haiku-4-5-20251001-v1:0). They route to the best available region automatically for resilience.
Application inference profiles are account-owned copies of a system profile that add tagging metadata. They are the attribution layer — there is no routing or model-behavior difference.
Invocation logging captures every request/response at the Bedrock service level, including token counts, model ID (which resolves to the profile ARN when profiles are used), and the IAM identity of the caller.
There has never been a better time to be an engineer and create value in society through software.
If you enjoyed the articles, visit my blog at jorgetovar.dev.