SageMaker endpoint + Lambda proxy:
# Configure the SageMaker endpoint
endpoint_config = SageMaker.CfnEndpointConfig(
    self,
    "AudioPredictorEndpointConfig",
    production_variants=[
        SageMaker.CfnEndpointConfig.ProductionVariantProperty(
            variant_name="AllTraffic",
            model_name=model.attr_model_name,
            initial_instance_count=1,
            instance_type="ml.t2.medium",
            initial_variant_weight=1.0,
        )
    ],
)

# Define the endpoint
endpoint = SageMaker.CfnEndpoint(
    self,
    "AudioPredictorEndpoint",
    endpoint_config_name=endpoint_config.attr_endpoint_config_name,
)

# Initialize the SageMaker Lambda proxy
SageMaker_trigger = create_python_function(
    scope=self,
    function_name="SageMaker-predictor",
    handler="SageMaker_trigger_handler.handler",
    timeout=Duration.seconds(90),
    environment={
        "PREDICTIONS_TABLE": predictions_table.table_name,
        "PREDICTOR": "SageMaker",
        "MODEL_S3_URI": f"s3://{model_asset.s3_bucket_name}/{model_asset.s3_object_key}",
        "ENDPOINT_NAME": endpoint.attr_endpoint_name,
    },
)
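The handler behind the Lambda proxy is not shown here. As a rough sketch of what such a handler could look like, assuming the model accepts and returns JSON: the payload shape, the `invoke_model` helper, and the optional `runtime_client` parameter are assumptions made for illustration, and only the `ENDPOINT_NAME` variable comes from the stack above.

```python
import json
import os


def invoke_model(runtime_client, endpoint_name, payload):
    """Send a JSON payload to the endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumed content type
        Body=json.dumps(payload),
    )
    # The SageMaker runtime returns the model output as a streaming body.
    return json.loads(response["Body"].read())


def handler(event, context, runtime_client=None):
    """Hypothetical Lambda entry point that forwards the event to SageMaker."""
    if runtime_client is None:
        # Imported lazily so the module can be exercised without AWS access.
        import boto3

        runtime_client = boto3.client("sagemaker-runtime")

    prediction = invoke_model(runtime_client, os.environ["ENDPOINT_NAME"], event)
    # Writing the result to PREDICTIONS_TABLE is omitted in this sketch.
    return {"statusCode": 200, "body": json.dumps(prediction)}
```

Passing the client in as a parameter keeps the proxy thin and lets you test it locally with a stub instead of a live endpoint.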
Results
In the following image, you can see the execution duration for each of the stacks. Note that the SnapStart Lambda was run once beforehand so that its snapshot was saved, and then left idle for 10 minutes so that the next invocation would be a cold start again:
Latency Comparison
| Method | Mean Latency | Median | Stability (Std) |
| --- | --- | --- | --- |
| Standard Lambda | 280.72 ms | 127.65 ms | 443.29 ms |
| SnapStart | 178.60 ms | 124.69 ms | 166.35 ms |
| SageMaker | 339.18 ms | 226.91 ms | 350.76 ms |
From the graph, we can see that the Lambdas execute faster than the SageMaker endpoint, with most invocations staying under the 200 ms mark. The circles represent cold starts, where the SnapStart Lambda was at least 2x faster than the other stacks. The SageMaker stack performed the worst, though not by much: most of its invocations landed just above the 200 ms mark, and its cold start took almost 1.4 seconds.
Cost Breakdown
Lambda Cost (per request)
Formula: Cost = Duration (s) × Memory (GB) × $0.0000166667 per GB-second
- Duration: ~200 ms
- Memory: 4 GB
Cost per request: ~$0.0000133
Cost per 1M requests: ~$13.80
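To make the formula concrete, here is a minimal sketch of the compute part of the calculation. The function name and the millisecond-to-second conversion are illustrative choices; the rate, duration, and memory size are the ones listed above.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # USD, the rate used in the formula above


def lambda_compute_cost(duration_ms: float, memory_gb: float) -> float:
    """Compute-only cost of one invocation in USD (no per-request fee)."""
    return (duration_ms / 1000) * memory_gb * PRICE_PER_GB_SECOND


per_request = lambda_compute_cost(duration_ms=200, memory_gb=4)
print(f"per request: ${per_request:.7f}")
print(f"per 1M requests (compute only): ${per_request * 1_000_000:.2f}")
```

At exactly 200 ms this gives about $13.33 per 1M requests, slightly below the ~$13.80 quoted above. The gap presumably reflects a marginally longer measured duration plus Lambda's separate per-request fee, but that reading is an assumption rather than something stated here.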
SageMaker Cost (fixed)
- ml.t2.medium ≈ $40.88/month
- Runs 24/7, even when idle
Takeaway:
Lambda has variable costs that scale with usage. SageMaker has fixed costs, making the tradeoff clear when requests grow.
The main question is:
When does SageMaker become the better option?
I’ve done the math.
SageMaker becomes the cheaper option at ~72 requests per minute, as the following graph shows:
Cost comparison of Lambda and SageMaker
At low traffic, Lambda is cheaper because it is billed per use, while the SageMaker endpoint carries a fixed price regardless of load. As traffic grows, SageMaker handles it more cheaply.
Note that the green line, representing the SageMaker endpoint, also rises with traffic. That is expected, because every request still goes through the Lambda proxy, but the increase stays small since the proxy runs on the lowest memory configuration (128 MB).
The following table gives a broader view of the expected cost at different traffic levels, based on the latest pricing.
| Traffic Volume | Standard Lambda (4GB) | SnapStart Lambda (4GB) | SageMaker (ml.t2.medium + 128MB Caller) |
| --- | --- | --- | --- |
| Price per 1M Req (Variable) | $13.80 | $16.82 | $0.81 |
| Fixed Monthly Cost | $0.00 | $0.00 | $40.88 |
| Total: 10 RPM (~438k req/mo) | $6.05 | $7.37 | $41.24 |
| Total: 50 RPM (~2.1M req/mo) | $30.24 | $36.87 | $42.66 |
| Total: 72 RPM (~3.1M req/mo) | $43.51 | $53.05 | $43.43 (Crossover point) |
| Total: 200 RPM (~8.7M req/mo) | $120.84 | $147.32 | $47.97 |
| Total: 1000 RPM (~43.8M req/mo) | $604.22 | $736.62 | $76.38 |
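The totals and the crossover point can be approximated from the table's first two rows alone. The sketch below assumes a month of 730 hours (43,800 minutes), which is what "10 RPM (~438k req/mo)" implies; the function names and the `PRICING` dictionary are illustrative.

```python
MINUTES_PER_MONTH = 730 * 60  # 43,800; implied by 10 RPM ~ 438k requests/month

# (variable USD per 1M requests, fixed USD per month), taken from the table
PRICING = {
    "standard_lambda": (13.80, 0.00),
    "snapstart_lambda": (16.82, 0.00),
    "sagemaker": (0.81, 40.88),
}


def monthly_cost(option: str, rpm: float) -> float:
    """Fixed cost plus variable cost for a steady requests-per-minute rate."""
    variable_per_million, fixed = PRICING[option]
    requests = rpm * MINUTES_PER_MONTH
    return fixed + variable_per_million * requests / 1_000_000


def break_even_rpm(lambda_option: str = "standard_lambda") -> float:
    """Traffic level at which SageMaker and the given Lambda option cost the same."""
    lam_var, lam_fixed = PRICING[lambda_option]
    sm_var, sm_fixed = PRICING["sagemaker"]
    requests = (sm_fixed - lam_fixed) / (lam_var - sm_var) * 1_000_000
    return requests / MINUTES_PER_MONTH


for rpm in (10, 50, 72, 200, 1000):
    row = ", ".join(f"{name}=${monthly_cost(name, rpm):.2f}" for name in PRICING)
    print(f"{rpm:>4} RPM: {row}")
print(f"break-even vs. standard Lambda: {break_even_rpm():.1f} RPM")
```

Because the per-million prices are rounded, the printed totals can differ from the table by a few cents, but the break-even lands just under 72 RPM, in line with the crossover row.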
References:
- SageMaker pricing - link
- Lambda pricing - link
CTO Verdict: A Decision Framework
Think in thresholds, not services.
Use Standard Lambda when:
- You’re in POC or early stage
- Traffic is low or unpredictable
- You want zero idle cost
Use Lambda with SnapStart when:
- Traffic is low and sporadic
- You are willing to pay for the SnapStart snapshot restoration
- You also want a zero idle cost
Use SageMaker when:
- You consistently exceed the ~72 requests/minute crossover point
- Traffic is steady
- You want predictable cost
Final Rule:
- Lambda is the default
- SageMaker is the optimization