Lambda Versions and Aliases: The Foundation
Before traffic routing makes sense, you need to understand Lambda's versioning model.
Versions
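Every time you publish a Lambda function, AWS creates an immutable version: a snapshot of your code and configuration at that point in time. If neither the code nor the configuration has changed since the last published version, Lambda does not create a new one.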
$LATEST → always points to the latest unpublished code (mutable)
:1 → first published version (immutable)
:2 → second published version (immutable)
:3 → third published version (immutable)
# Publish a new version via boto3
import boto3

lambda_client = boto3.client('lambda')

response = lambda_client.publish_version(
    FunctionName='brand-api',
    Description='v2.1.0 - faster logo lookup with DynamoDB cache'
)
version_arn = response['FunctionArn']
version_number = response['Version']
print(f'Published version {version_number}: {version_arn}')
# → Published version 42: arn:aws:lambda:us-east-1:123:function:brand-api:42
Versions are immutable — you cannot change the code of :42 after it's published. This is the foundation of safe deployments.
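Because a published version never changes, a qualified ARN like the one printed above pins one exact snapshot. A minimal sketch of splitting such an ARN into its parts follows; `parse_lambda_arn` and `LambdaArn` are hypothetical names for illustration, not part of boto3.

```python
from typing import NamedTuple, Optional


class LambdaArn(NamedTuple):
    region: str
    account_id: str
    function_name: str
    qualifier: Optional[str]  # version number or alias name; None if unqualified


def parse_lambda_arn(arn: str) -> LambdaArn:
    """Split a Lambda function ARN into its components (hypothetical helper)."""
    parts = arn.split(':')
    # Expected layout: arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
    if len(parts) not in (7, 8) or parts[2] != 'lambda' or parts[5] != 'function':
        raise ValueError(f'Not a Lambda function ARN: {arn}')
    qualifier = parts[7] if len(parts) == 8 else None
    return LambdaArn(parts[3], parts[4], parts[6], qualifier)


parsed = parse_lambda_arn('arn:aws:lambda:us-east-1:123:function:brand-api:42')
print(parsed.function_name, parsed.qualifier)
```

An ARN without a trailing qualifier comes back with `qualifier=None`, which is a cheap way to spot callers that are not pinned to a version or alias.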
Aliases
An alias is a named pointer to a specific version. Your API Gateway, EventBridge rules, and other triggers should always point to an alias — never to $LATEST or a version number directly.
brand-api:prod → points to :42 (production traffic)
brand-api:staging → points to :43 (staging traffic)
brand-api:canary → points to :42 (95%) + :43 (5%) ← weighted routing
# Create an alias (fails if the alias already exists)
lambda_client.create_alias(
    FunctionName='brand-api',
    Name='prod',
    FunctionVersion='42',
    Description='Production alias'
)

# Update an existing alias to point to a new version
lambda_client.update_alias(
    FunctionName='brand-api',
    Name='prod',
    FunctionVersion='43'
)
Traffic Splitting: Canary Deployments with Weighted Aliases
The most powerful traffic routing feature in Lambda is weighted aliases — you can split traffic between two versions with any percentage split.
brand-api:prod
├── version :42 → 95% of traffic
└── version :43 → 5% of traffic ← canary
This is Lambda's equivalent of what Knative achieves with Istio VirtualService traffic splitting — but built natively into the Lambda service.
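To make the split concrete, here is a small sketch that derives each version's share of traffic from an alias description. It assumes the input has the shape of a `get_alias` response (`FunctionVersion` plus an optional `RoutingConfig`); `effective_weights` is a hypothetical helper, not a boto3 call.

```python
from typing import Dict


def effective_weights(alias: dict) -> Dict[str, float]:
    """Return the share of traffic each version receives for an alias description."""
    routing = alias.get('RoutingConfig') or {}
    additional = routing.get('AdditionalVersionWeights') or {}
    weights = {version: float(weight) for version, weight in additional.items()}
    # The primary version receives whatever the additional versions do not.
    primary_share = 1.0 - sum(weights.values())
    weights[alias['FunctionVersion']] = round(primary_share, 10)
    return weights


canary_alias = {
    'Name': 'prod',
    'FunctionVersion': '42',
    'RoutingConfig': {'AdditionalVersionWeights': {'43': 0.05}},
}
print(effective_weights(canary_alias))
```

The alias only stores the canary's weight; the stable version's share is implied, which is why the helper computes it as the remainder.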
Implementing a Canary Deployment
# deploy_canary.py
import boto3
import time

lambda_client = boto3.client('lambda')
cloudwatch = boto3.client('cloudwatch')


def deploy_canary(function_name: str, new_version: str, canary_percent: int = 5):
    """
    Deploy a new Lambda version as a canary.
    Routes canary_percent% of traffic to new version.
    """
    # Get current prod alias
    alias = lambda_client.get_alias(
        FunctionName=function_name,
        Name='prod'
    )
    current_version = alias['FunctionVersion']
    print(f'Current prod version: {current_version}')
    print(f'Deploying canary: version {new_version} at {canary_percent}%')

    # Update alias with weighted routing
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=current_version,  # stable version gets majority
        RoutingConfig={
            'AdditionalVersionWeights': {
                new_version: canary_percent / 100  # e.g., 0.05 = 5%
            }
        }
    )
    print(f'Canary deployed: {100 - canary_percent}% → v{current_version}, '
          f'{canary_percent}% → v{new_version}')


def promote_canary(function_name: str, new_version: str):
    """Promote canary to 100% (full deployment)"""
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=new_version,
        RoutingConfig={
            'AdditionalVersionWeights': {}  # clear weighted routing
        }
    )
    print(f'Canary promoted: 100% traffic now on version {new_version}')


def rollback_canary(function_name: str, stable_version: str):
    """Roll back: remove canary, restore 100% to stable version"""
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=stable_version,
        RoutingConfig={
            'AdditionalVersionWeights': {}  # clear canary
        }
    )
    print(f'Rolled back: 100% traffic restored to version {stable_version}')


# Usage
deploy_canary('brand-api', new_version='43', canary_percent=5)
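The three functions above can be chained into a staged rollout. The following is only a sketch under stated assumptions: `ramp_canary` is a hypothetical driver, the step percentages are arbitrary, and the health check is injected as a callable so the loop can be exercised without AWS.

```python
from typing import Callable, Sequence


def ramp_canary(
    set_canary_percent: Callable[[int], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
    steps: Sequence[int] = (5, 25, 50),
) -> bool:
    """Walk the canary through `steps`, rolling back on the first failed check.

    Returns True if every step stayed healthy (ready to promote),
    False if the canary was rolled back.
    """
    for percent in steps:
        set_canary_percent(percent)
        # A real pipeline would wait here for metrics to accumulate.
        if not is_healthy():
            rollback()
            return False
    return True


# Dry run with fakes: the third health check fails, so we roll back.
applied = []
health = iter([True, True, False])
ok = ramp_canary(applied.append, lambda: next(health), lambda: applied.append(0))
print(ok, applied)
```

Against the real functions, `set_canary_percent` could wrap `deploy_canary`, `rollback` could wrap `rollback_canary`, and a `True` result would be followed by `promote_canary`.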
Automated Canary with CloudWatch Alarms (CodeDeploy)
Manually managing canary percentages is error-prone. AWS CodeDeploy integrates with Lambda to automate the shift — and automatically roll back if CloudWatch alarms fire.
# serverless.yml: automated canary deployment
# Assumes the serverless-plugin-canary-deployments plugin, which provides deploymentSettings
plugins:
  - serverless-plugin-canary-deployments

provider:
  name: aws
  deploymentMethod: direct

functions:
  brandApi:
    handler: handler.handler
    deploymentSettings:
      type: Canary10Percent5Minutes  # shift 10% now, 100% after 5 minutes
      alias: prod
      alarms:
        - BrandApiErrorRateAlarm  # rollback if this alarm fires
        - BrandApiLatencyAlarm

# CloudFormation: define the rollback alarms
resources:
  Resources:
    BrandApiErrorRateAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmName: brand-api-error-rate-canary
        MetricName: Errors
        Namespace: AWS/Lambda
        Dimensions:
          - Name: FunctionName
            Value: brand-api
          - Name: Resource
            Value: brand-api:prod  # monitor the alias, not a specific version
        Statistic: Sum
        Period: 60
        EvaluationPeriods: 2
        Threshold: 5  # rollback if >5 errors per minute for 2 consecutive minutes
        ComparisonOperator: GreaterThanThreshold
    BrandApiLatencyAlarm:
      Type: AWS::CloudWatch::Alarm
      Properties:
        AlarmName: brand-api-p99-latency-canary
        MetricName: Duration
        Namespace: AWS/Lambda
        Dimensions:
          - Name: FunctionName
            Value: brand-api
          - Name: Resource
            Value: brand-api:prod
        ExtendedStatistic: p99
        Period: 60
        EvaluationPeriods: 2
        Threshold: 2000  # rollback if P99 > 2000ms for 2 consecutive minutes
        ComparisonOperator: GreaterThanThreshold
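How `Period`, `EvaluationPeriods`, and `Threshold` interact is easy to misread, so here is a simplified model of the evaluation. It assumes one datapoint per period, no missing data, and that every evaluated period must breach; `alarm_breached` is a hypothetical helper, not a CloudWatch API.

```python
from typing import Sequence


def alarm_breached(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> bool:
    """True if the most recent `evaluation_periods` datapoints all exceed `threshold`."""
    if evaluation_periods < 1:
        raise ValueError('evaluation_periods must be at least 1')
    if len(datapoints) < evaluation_periods:
        return False  # not enough data yet
    recent = datapoints[-evaluation_periods:]
    return all(value > threshold for value in recent)


# Errors per 60-second period, oldest first.
print(alarm_breached([0, 6, 7], threshold=5, evaluation_periods=2))  # both recent periods breach
print(alarm_breached([6, 0, 7], threshold=5, evaluation_periods=2))  # only one recent period breaches
```

With the settings above, a single bad minute is not enough to trigger a rollback; two consecutive one-minute periods must each exceed the threshold.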
CodeDeploy deployment types for Lambda:
| Type | Behavior |
| --- | --- |
| AllAtOnce | 100% traffic shifts immediately (no canary) |
| Canary10Percent5Minutes | 10% for 5 min, then 100% |
| Canary10Percent10Minutes | 10% for 10 min, then 100% |
| Canary10Percent15Minutes | 10% for 15 min, then 100% |
| Linear10PercentEvery1Minute | +10% every minute until 100% |
| Linear10PercentEvery2Minutes | +10% every 2 minutes until 100% |
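The type names encode their own schedule. As an illustration, the sketch below expands a name into `(minute, percent on new version)` steps. It assumes the first increment is applied at minute 0; `traffic_schedule` is a hypothetical helper and does not call CodeDeploy.

```python
import re
from typing import List, Tuple


def traffic_schedule(deployment_type: str) -> List[Tuple[int, int]]:
    """Return (minute, percent_on_new_version) steps for a deployment type name."""
    if deployment_type == 'AllAtOnce':
        return [(0, 100)]

    canary = re.fullmatch(r'Canary(\d+)Percent(\d+)Minutes', deployment_type)
    if canary:
        percent, minutes = int(canary.group(1)), int(canary.group(2))
        return [(0, percent), (minutes, 100)]

    linear = re.fullmatch(r'Linear(\d+)PercentEvery(\d+)Minutes?', deployment_type)
    if linear:
        step, interval = int(linear.group(1)), int(linear.group(2))
        if step <= 0:
            raise ValueError('step percentage must be positive')
        schedule = []
        percent, minute = step, 0
        while percent < 100:
            schedule.append((minute, percent))
            percent += step
            minute += interval
        schedule.append((minute, 100))  # final step always lands on 100%
        return schedule

    raise ValueError(f'Unrecognized deployment type: {deployment_type}')


for name in ('Canary10Percent5Minutes', 'Linear10PercentEvery2Minutes'):
    print(name, traffic_schedule(name))
```

Seeing the steps laid out helps when choosing alarm periods: an alarm that needs two one-minute periods to fire has little time to react within a one-minute linear step.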
How Traffic Flows: Sync vs Async
Traffic routing in Lambda isn't just about version weights — the entire flow differs between synchronous and asynchronous invocations.
Synchronous Traffic Flow (API Gateway)
Client Request
│
▼
API Gateway
│ (points to alias: brand-api:prod)
▼
Lambda Service (weighted routing)
├── 95% → Execution Environment running v42
└── 5% → Execution Environment running v43
│
▼
Response returned to API Gateway → Client
Key characteristics:
- Direct path: client waits for the response
- No buffering: if Lambda is throttled, it rejects the invocation with a 429 (TooManyRequestsException) and API Gateway returns an error to the client right away instead of queuing the request
- Version routing: Lambda's weighted alias determines which version handles each request
# handler.py: use context to log which version is handling the request
import os


def handler(event, context):
    # Log version info for canary monitoring
    function_version = context.function_version
    print(f'Handled by version: {function_version}')

    # Your business logic (get_brand is assumed to be defined elsewhere)
    brand_id = event['pathParameters']['brandId']
    return get_brand(brand_id)
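Since the synchronous path has no buffer, callers usually absorb throttling themselves by retrying. A minimal sketch of exponential backoff with jitter follows; `Throttled` and `call_with_backoff` are hypothetical names, and the delays are illustrative rather than recommended values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


class Throttled(Exception):
    """Raised by the wrapped call when the service answers with a throttle."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `call` on Throttled, waiting a random time that grows with each attempt."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the throttle to the caller
            # Full jitter: wait between 0 and base_delay * 2^attempt seconds.
            sleep(rng.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError('max_attempts must be at least 1')


# Dry run: the fake call is throttled twice, then succeeds. No real sleeping.
state = {'calls': 0}


def flaky_call() -> str:
    state['calls'] += 1
    if state['calls'] < 3:
        raise Throttled()
    return 'ok'


print(call_with_backoff(flaky_call, sleep=lambda seconds: None))
```

Injecting `sleep` and `rng` keeps the helper testable; in production code the defaults would apply.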
Asynchronous Traffic Flow (S3 / EventBridge)
Async traffic introduces a buffer layer between the event source and Lambda execution. This is the key architectural difference.
Event Source (S3 upload / EventBridge rule)
│
▼
Lambda Internal Queue ← traffic is buffered here
│
▼ (Lambda polls the queue)
Lambda Service (weighted routing)
├── 95% → Execution Environment running v42
└── 5% → Execution Environment running v43
│
▼
Result → CloudWatch Logs
→ Success destination (SNS/SQS/EventBridge/Lambda)
→ Failure destination (DLQ) on repeated failures
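The buffer is critical: it decouples the event producer from Lambda's availability. If Lambda is throttled or scaling out, events queue up and are processed when capacity is available. An event is discarded only after it exceeds the maximum event age (6 hours by default) or exhausts its retries, at which point it goes to the failure destination or DLQ if one is configured.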
# handler.py: async handler with destination routing
import json
import boto3


def handler(event, context):
    """
    Async handler: processes S3 upload events.
    On success: result routed to success-destination SQS.
    On failure: after 2 retries, routed to DLQ.
    """
    processed = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        try:
            # process_brand_asset is assumed to be defined elsewhere
            result = process_brand_asset(bucket, key)
        except Exception as e:
            print(f'Failed to process {key}: {e}')
            raise  # re-raise to trigger Lambda retry + eventual DLQ routing
        print(f'Successfully processed: {key}')
        processed.append({'processed': key, 'result': result})
    # Return after the loop; returning inside it would skip the remaining records
    return processed
# serverless.yml: configure async destinations
functions:
  processBrandAsset:
    handler: handler.handler
    destinations:
      onSuccess: arn:aws:sqs:us-east-1:123:brand-asset-success
      onFailure: arn:aws:sqs:us-east-1:123:brand-asset-dlq
    maximumRetryAttempts: 2
    events:
      - s3:
          bucket: brand-assets
          event: s3:ObjectCreated:*
Concurrency Control at the Traffic Layer
In Knative's model, the queue-proxy sidecar acts as a per-pod concurrency limiter — it queues excess requests locally before forwarding to the user container, and reports metrics to the autoscaler.
AWS Lambda implements an equivalent mechanism natively, without requiring a sidecar:
Per-Function Concurrency Limiting
# Set maximum concurrency: Lambda queues excess async requests
lambda_client.put_function_concurrency(
    FunctionName='brand-logo-processor',
    ReservedConcurrentExecutions=50  # max 50 simultaneous executions
)
For synchronous invocations: requests beyond the concurrency limit are immediately throttled (429).
For asynchronous invocations: requests beyond the concurrency limit are queued in Lambda's internal event queue (up to 6 hours) and retried as capacity becomes available.
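The difference between the two paths can be shown with a toy model. This is a sketch only: it is not how Lambda is implemented internally, and `admit_sync` and `drain_async` are hypothetical names. The 6-hour constant is the queue retention mentioned above.

```python
from collections import deque
from typing import Deque, List, Tuple

MAX_EVENT_AGE_SECONDS = 6 * 60 * 60  # the 6-hour window mentioned above


def admit_sync(in_flight: int, limit: int) -> str:
    """Synchronous invoke: either runs now or is rejected."""
    return 'run' if in_flight < limit else 'throttled (429)'


def drain_async(
    queue: Deque[Tuple[str, int]], now: int, free_slots: int
) -> Tuple[List[str], List[str]]:
    """Asynchronous invoke: take queued (event_id, enqueued_at) pairs in order.

    Returns (started, expired). Events older than the maximum age are
    expired instead of run; events without a free slot stay queued.
    """
    started: List[str] = []
    expired: List[str] = []
    while queue:
        event_id, enqueued_at = queue[0]
        if now - enqueued_at > MAX_EVENT_AGE_SECONDS:
            queue.popleft()
            expired.append(event_id)
        elif len(started) < free_slots:
            queue.popleft()
            started.append(event_id)
        else:
            break  # no capacity left; the rest waits for the next pass
    return started, expired


print(admit_sync(in_flight=50, limit=50))

pending = deque([('a', 0), ('b', 100), ('c', 200)])
print(drain_async(pending, now=300, free_slots=1), len(pending))
```

In the model, a synchronous caller over the limit gets an answer immediately, while asynchronous events simply wait their turn until they either run or age out.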
Per-Alias Concurrency (Provisioned Concurrency on Aliases)
You can apply Provisioned Concurrency specifically to an alias, ensuring the production alias always has warm environments while the canary alias uses on-demand scaling:
# Apply provisioned concurrency to prod alias only
lambda_client.put_provisioned_concurrency_config(
    FunctionName='brand-api',
    Qualifier='prod',  # the alias name
    ProvisionedConcurrentExecutions=20
)
# Canary alias uses on-demand (may cold start, but that's acceptable for 5% traffic)
# No provisioned concurrency set on 'canary' alias
Blue/Green Deployment Pattern
For changes that are too risky for gradual canary (e.g., breaking schema changes), use a full blue/green deployment:
Blue environment: brand-api:prod → version :42 (100% traffic)
Green environment: brand-api:green → version :43 (0% traffic, fully tested)
After validation:
Blue environment: brand-api:prod → version :43 (100% traffic, instant cutover)
Green environment: brand-api:green → version :42 (kept for instant rollback)
# blue_green_deploy.py
import boto3

lambda_client = boto3.client('lambda')


def blue_green_cutover(function_name: str, new_version: str):
    """
    Instant traffic cutover from current prod version to new version.
    Previous version kept on 'previous' alias for instant rollback.
    """
    # Get current prod version (this becomes 'blue' / previous)
    current = lambda_client.get_alias(
        FunctionName=function_name,
        Name='prod'
    )
    current_version = current['FunctionVersion']

    # Preserve current version on 'previous' alias for rollback
    try:
        lambda_client.update_alias(
            FunctionName=function_name,
            Name='previous',
            FunctionVersion=current_version
        )
    except lambda_client.exceptions.ResourceNotFoundException:
        # The 'previous' alias does not exist yet, so create it
        lambda_client.create_alias(
            FunctionName=function_name,
            Name='previous',
            FunctionVersion=current_version
        )

    # Cut over prod to new version (instant, no gradual shift)
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=new_version,
        RoutingConfig={'AdditionalVersionWeights': {}}
    )
    print(f'Cutover complete: prod now on v{new_version}')
    print(f'Rollback available: run instant_rollback() to restore v{current_version}')


def instant_rollback(function_name: str):
    """Roll back to previous version instantly"""
    previous = lambda_client.get_alias(
        FunctionName=function_name,
        Name='previous'
    )
    previous_version = previous['FunctionVersion']
    lambda_client.update_alias(
        FunctionName=function_name,
        Name='prod',
        FunctionVersion=previous_version,
        RoutingConfig={'AdditionalVersionWeights': {}}
    )
    print(f'Rolled back: prod restored to v{previous_version}')
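The "after validation" step can be made explicit by gating the cutover on smoke checks. A small sketch, assuming each check is a callable returning `True` on success; `validated_cutover` is a hypothetical wrapper and the checks here are stand-ins.

```python
from typing import Callable, Iterable


def validated_cutover(
    smoke_checks: Iterable[Callable[[], bool]],
    cutover: Callable[[], None],
) -> bool:
    """Run every smoke check; cut over only if all of them pass."""
    for check in smoke_checks:
        if not check():
            return False  # leave prod untouched
    cutover()
    return True


# Dry run with stand-in checks and a fake cutover.
events = []
result = validated_cutover(
    smoke_checks=[lambda: True, lambda: True],
    cutover=lambda: events.append('cutover'),
)
print(result, events)
```

In practice a check might invoke the new version through the green alias and inspect the response, and `cutover` could wrap `blue_green_cutover('brand-api', '43')`.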
Deployment Strategy Decision Guide
How risky is this deployment?
│
├── Low risk (config change, minor bug fix)
│ └── AllAtOnce — deploy directly to 100%
│
├── Medium risk (new feature, refactor)
│ └── Canary — start at 5–10%, monitor errors/latency,
│ auto-promote or rollback via CodeDeploy alarms
│
├── High risk (breaking change, new external dependency)
│ └── Blue/Green — full parallel environment,
│ instant cutover after validation, instant rollback
│
└── Schema/data migration (irreversible changes)
└── Feature flags in code + gradual rollout
(decouple deployment from feature activation)
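For the last branch, a percentage-based feature flag is one common way to decouple activation from deployment. The sketch below assumes users are identified by a string ID; `flag_enabled` and the flag name are hypothetical, and a real system would typically read the rollout percentage from configuration.

```python
import hashlib


def flag_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    digest = hashlib.sha256(f'{flag_name}:{user_id}'.encode('utf-8')).digest()
    bucket = int.from_bytes(digest[:4], 'big') % 100  # stable bucket in 0..99
    return bucket < rollout_percent


# The same user always gets the same answer for a given percentage.
for percent in (0, 10, 50, 100):
    print(percent, flag_enabled('new-schema', 'user-123', percent))
```

Because the bucket depends only on the flag name and user ID, raising the percentage only adds users; nobody who already has the feature loses it mid-rollout.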
Summary
| Concept | AWS Lambda Implementation |
| --- | --- |
| Traffic splitting | Weighted aliases (e.g., 95% v42 / 5% v43) |
| Canary deployment | CodeDeploy + Lambda aliases + CloudWatch alarms |
| Blue/Green | Two aliases pointing to different versions, instant cutover |
| Async traffic buffering | Lambda internal event queue (up to 6 hours) |
| Concurrency control | Reserved concurrency + Provisioned Concurrency per alias |
| Automatic rollback | CodeDeploy monitors alarms, rolls back if threshold breached |
The key insight: Lambda's alias + versioning system is its traffic routing layer. Every production Lambda function should be invoked via an alias — never via $LATEST. This single practice unlocks canary deployments, blue/green releases, and instant rollbacks.
Next in this series: **Part 5 — Event-Driven Automation: Building a Serverless Maintenance Bot with Lambda & EventBridge**