Stop Using Lambda for ML at This Scale (Benchmark + Cost Analysis)

DEV Community

live_alias = _lambda.Alias(
    self,
    "SnapStartLiveAlias",
    alias_name="live",
    version=snapstart_lambda.current_version,
)

SageMaker endpoint + Lambda proxy:

# Configure the SageMaker endpoint
endpoint_config = SageMaker.CfnEndpointConfig(
    self,
    "AudioPredictorEndpointConfig",
    production_variants=[
        SageMaker.CfnEndpointConfig.ProductionVariantProperty(
            variant_name="AllTraffic",
            model_name=model.attr_model_name,
            initial_instance_count=1,
            instance_type="ml.t2.medium",
            initial_variant_weight=1.0,
        )
    ],
)
# Define the endpoint
endpoint = SageMaker.CfnEndpoint(
    self,
    "AudioPredictorEndpoint",
    endpoint_config_name=endpoint_config.attr_endpoint_config_name,
)
# Initialize the SageMaker Lambda proxy
SageMaker_trigger = create_python_function(
    scope=self,
    function_name="SageMaker-predictor",
    handler="SageMaker_trigger_handler.handler",
    timeout=Duration.seconds(90),
    environment={
        "PREDICTIONS_TABLE": predictions_table.table_name,
        "PREDICTOR": "SageMaker",
        "MODEL_S3_URI": f"s3://{model_asset.s3_bucket_name}/{model_asset.s3_object_key}",
        "ENDPOINT_NAME": endpoint.attr_endpoint_name,
    },
)
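For context, here is a minimal sketch of what the proxy handler behind "SageMaker_trigger_handler.handler" might look like. This is not the author's handler: the JSON content type, the payload shape, and the helper name invoke_endpoint are assumptions made for illustration. Only the ENDPOINT_NAME environment variable comes from the stack definition above, and the call used is boto3's sagemaker-runtime invoke_endpoint.

```python
import json
import os


def invoke_endpoint(payload, client=None, endpoint_name=None):
    """Forward a JSON payload to the SageMaker endpoint and return the parsed reply.

    `client` and `endpoint_name` are injectable so the function can be
    exercised without AWS access (hypothetical helper, not from the article).
    """
    if client is None:
        import boto3  # imported lazily so the module loads without boto3 installed

        client = boto3.client("sagemaker-runtime")
    endpoint_name = endpoint_name or os.environ["ENDPOINT_NAME"]
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: the model accepts JSON
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before parsing.
    return json.loads(response["Body"].read())


def handler(event, context):
    """Lambda entry point: proxy the incoming event to the endpoint."""
    prediction = invoke_endpoint(event)
    return {"statusCode": 200, "body": json.dumps(prediction)}
```

Because the proxy only forwards the request and waits for the endpoint, it needs very little memory, which is why it can run on the smallest Lambda configuration.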

Results

The following image shows the execution duration for each of the stacks tested. (Note: the SnapStart Lambda was run once beforehand to save the environment snapshot, then left idle for 10 minutes so that the next invocation would be a cold start again.)

Latency Comparison

Method	Mean Latency	Median Latency	Standard Deviation (Stability)
Standard Lambda	280.72 ms	127.65 ms	443.29 ms
SnapStart	178.60 ms	124.69 ms	166.35 ms
SageMaker	339.18 ms	226.91 ms	350.76 ms

The graph shows that both Lambdas execute faster than the SageMaker endpoint, with most invocations staying under the 200 ms mark. The circles represent cold starts: the SnapStart Lambda's cold start was at least 2x faster than the others. The SageMaker stack performed the worst, though not by much: most of its invocations landed just above the 200 ms mark, and its cold start took almost 1.4 seconds.
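The large gap between mean and median in the table is what a handful of slow cold starts does to a latency distribution. The sketch below illustrates this with made-up sample values (not the benchmark data), using Python's statistics module. Whether the article's figures use the sample or the population standard deviation is not stated; the sample version is assumed here.

```python
import statistics

# Hypothetical latencies in ms: four warm invocations and one cold start.
latencies_ms = [120.0, 125.0, 130.0, 128.0, 1400.0]

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the cold start
median_ms = statistics.median(latencies_ms)  # barely affected by one outlier
std_ms = statistics.stdev(latencies_ms)      # sample standard deviation

print(f"mean={mean_ms:.2f} ms  median={median_ms:.2f} ms  std={std_ms:.2f} ms")
```

A single outlier moves the mean far from the median, so the median is the better indicator of typical warm latency, while the standard deviation reflects how severe the cold starts are.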

Cost Breakdown

Lambda Cost (per request)

Formula: Cost = Duration (s) × Memory (GB) × $0.0000166667

Duration: ~200 ms
Memory: 4 GB

Cost per request: ~$0.0000133

Cost per 1M requests: ~$13.80
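The per-request figure follows directly from the formula above. A minimal sketch (the function name lambda_compute_cost is made up for illustration; the rate, duration, and memory size are the ones quoted in this section):

```python
# Price per GB-second, taken from the formula above.
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_compute_cost(duration_ms: float, memory_gb: float) -> float:
    """Compute-only cost in USD of a single Lambda invocation."""
    return (duration_ms / 1000.0) * memory_gb * PRICE_PER_GB_SECOND


per_request = lambda_compute_cost(duration_ms=200, memory_gb=4)
print(f"per request: ${per_request:.7f}")
print(f"per 1M requests (compute only): ${per_request * 1_000_000:.2f}")
```

The compute-only result for 1M requests comes out slightly below the ~$13.80 quoted above. The difference presumably comes from the per-request charge and a measured duration slightly above 200 ms; that explanation is an assumption, not something the benchmark states.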

SageMaker Cost (fixed)

ml.t2.medium ≈ $40.88/month (the instance type deployed above; this is the fixed monthly cost used in the table below)
Runs 24/7, even when idle

Takeaway:

Lambda has variable costs that scale with usage, while SageMaker has a fixed cost. The tradeoff therefore tips toward SageMaker as request volume grows.

The main question is:

When does SageMaker become the better option?

I’ve done the math.

SageMaker becomes the cheaper option at roughly 72 requests per minute, as the following graph shows:

Cost comparison of Lambda and SageMaker

At low traffic, Lambda is cheaper because you pay only per invocation, while the SageMaker endpoint carries a fixed price whether or not it is used. As traffic grows, SageMaker's much lower per-request cost makes it the cheaper option.

The green line, representing the SageMaker endpoint, also rises with traffic. That is expected, because every request still passes through the Lambda proxy. The increase stays small, since the proxy runs on the lowest memory configuration (128 MB).

The following table gives a broader view of the expected monthly cost at different traffic levels, based on the latest pricing.

Traffic Volume	Standard Lambda (4GB)	SnapStart Lambda (4GB)	SageMaker (ml.t2.medium + 128MB Caller)
Price per 1M Req (Variable)	$13.80	$16.82	$0.81
Fixed Monthly Cost	$0.00	$0.00	$40.88
Total: 10 RPM (~438k req/mo)	$6.05	$7.37	$41.24
Total: 50 RPM (~2.1M req/mo)	$30.24	$36.87	$42.66
Total: 72 RPM (~3.1M req/mo)	$43.51	$53.05	$43.43 (Crossover point)
Total: 200 RPM (~8.7M req/mo)	$120.84	$147.32	$47.97
Total: 1000 RPM (~43.8M req/mo)	$604.22	$736.62	$76.38
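The crossover point can be reproduced from the first two rows of the table: SageMaker wins once the per-request savings cover its fixed monthly cost. The sketch below assumes a 730-hour month (43,800 minutes), which matches the "~438k req/mo at 10 RPM" figure; the function and constant names are made up for illustration.

```python
MINUTES_PER_MONTH = 730 * 60  # assumption: 730-hour month, i.e. 43,800 minutes

# Figures copied from the table above (USD).
LAMBDA_PER_MILLION = 13.80
SAGEMAKER_PER_MILLION = 0.81
SAGEMAKER_FIXED_MONTHLY = 40.88


def monthly_cost(rpm: float, per_million: float, fixed: float = 0.0) -> float:
    """Total monthly cost at a steady request rate (requests per minute)."""
    requests = rpm * MINUTES_PER_MONTH
    return fixed + per_million * requests / 1_000_000


def break_even_rpm() -> float:
    """Request rate at which Standard Lambda and SageMaker cost the same."""
    saving_per_request = (LAMBDA_PER_MILLION - SAGEMAKER_PER_MILLION) / 1_000_000
    requests_per_month = SAGEMAKER_FIXED_MONTHLY / saving_per_request
    return requests_per_month / MINUTES_PER_MONTH


print(f"break-even: {break_even_rpm():.1f} requests/minute")
for rpm in (10, 50, 72, 200, 1000):
    lam = monthly_cost(rpm, LAMBDA_PER_MILLION)
    sm = monthly_cost(rpm, SAGEMAKER_PER_MILLION, SAGEMAKER_FIXED_MONTHLY)
    print(f"{rpm:>5} RPM  Lambda ${lam:8.2f}  SageMaker ${sm:8.2f}")
```

The totals can differ from the table by a few cents, because the per-million prices used here are already rounded.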

References:

SageMaker pricing - link
Lambda pricing - link

CTO Verdict: A Decision Framework

Think in thresholds, not services.

Use Standard Lambda when:

You’re in POC or early stage
Traffic is low or unpredictable
You want zero idle cost

Use Lambda with SnapStart when:

Traffic is low and sporadic
You are willing to pay for the SnapStart snapshot restoration
You also want zero idle cost

Use SageMaker when:

You consistently exceed roughly 72 requests/minute
Traffic is steady
You want predictable cost

Final Rule:

Lambda is the default
SageMaker is the optimization