Operating Strategies for Cost Optimization

Robert Fey

Nov 06, 2024 / 10 min read

Introduction

Cloud testing of embedded software is an essential part of the development cycle, but it comes with its own set of financial challenges. To achieve effective cost optimization in cloud environments, it’s crucial to adopt strategies that balance performance needs with financial efficiency.

This article explores practical approaches to cost management in cloud testing, with a focus on optimizing resources in Amazon EC2 environments, minimizing overhead times, and determining the ideal number of instances for parallel testing. From leveraging less powerful instances to balancing the use of Spot and On-Demand instances, we delve into how these tactics can help maintain high testing standards without breaking the budget. This is the first in a four-part series dedicated to maximizing efficiency and minimizing expenses in cloud-based testing for embedded software.

The Optimal EC2 Instance

Overhead times play a crucial role in cloud testing. They should be kept as short as possible to save costs. Therefore, all unnecessary redundant tasks and calculations, as well as communication bottlenecks, should be avoided.

Examples

  • Test frameworks are needed for all instances. Therefore, it makes sense to outsource the creation to one instance and copy the test framework to other instances that only start once the test framework is created.
  • Larger data sets can be copied to a cloud storage to avoid long communication paths between the computer and the instance.

Pro Tip

Architectural diagrams (component diagram, flow diagram) always provide a good overview. It should be clearly shown what exactly happens, and then time measurements should be taken for each process step. This provides a good overview of where there may be bottlenecks and what could be optimized.
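The per-step time measurement suggested above could be collected with a small helper like the following. This is only a minimal sketch: the step names and the `timed_step` and `slowest_steps` helpers are hypothetical, and the `time.sleep` calls merely stand in for real work.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per process step, in seconds.
step_durations = {}

@contextmanager
def timed_step(name):
    """Measure the wall-clock time of one process step and add it to step_durations."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        step_durations[name] = step_durations.get(name, 0.0) + elapsed

def slowest_steps(durations, top=3):
    """Return the `top` steps with the longest duration, slowest first."""
    return sorted(durations.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical process steps; the sleeps are placeholders for real work.
with timed_step("build_test_framework"):
    time.sleep(0.01)
with timed_step("copy_test_data"):
    time.sleep(0.01)
```

Calling `slowest_steps(step_durations)` after a run lists the steps that are the first candidates for optimization.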

Just like with physical machines, it is tempting to configure cloud servers on the principle "better to have it and not need it than to need it and not have it". In cloud testing, however, this mindset escalates costs quickly and unnecessarily, because more power doesn’t always mean faster results.

In the end, what matters is obtaining results as quickly as possible, and this is where parallelization comes into play: if two less powerful instances deliver results just as fast as one powerful instance and the overall operation is cheaper, the more economical option is the better choice.

Pro Tip

Experiment with the available instance types to determine the necessary resources for test runs. With the data collected, it’s easy to calculate the optimum balance between performance and costs. One likely surprise will be how few resources are actually needed.
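Once runtimes have been measured, the comparison can be as simple as the following sketch. The instance labels, prices, and runtimes are invented placeholders, not real EC2 types or AWS prices, and `cheapest_within` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    instance_type: str     # label of the instance type that was tried (placeholder)
    price_per_hour: float  # hourly price (placeholder value)
    runtime_hours: float   # measured duration of the test run on this type

def run_cost(m):
    """Cost of one test run on this instance type."""
    return m.price_per_hour * m.runtime_hours

def cheapest_within(measurements, max_hours):
    """Cheapest instance type that still finishes within max_hours, or None."""
    candidates = [m for m in measurements if m.runtime_hours <= max_hours]
    return min(candidates, key=run_cost) if candidates else None

# Invented example data: doubling the price does not halve the runtime.
measurements = [
    Measurement("small", 0.10, 3.0),
    Measurement("medium", 0.20, 1.6),
    Measurement("large", 0.40, 1.5),
]
best = cheapest_within(measurements, max_hours=2.0)
```

With these placeholder numbers, the largest instance is barely faster than the medium one but costs almost twice as much per run.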

Theoretically, it’s always possible to start up more instances and thus reduce the duration of test execution. However, costs continue to rise, and they rise faster the closer the test run time per instance gets to the overhead time.

Pro Tip

No further acceleration through additional instances when one of these limits is reached:

Limit 1 – "Reduction of Run Time": If the test run time per instance (t_Runtime) is already in the order of the overhead time (t_Runtime ~ 10 * t_overhead), refrain from shortening it further with additional instances. Costs increase very quickly from this point onwards without a corresponding time gain. Limit 1: t_Runtime <= 10 * t_overhead

Limit 2 – "Practicality": Is it really necessary to have the test results even faster than within 1 hour? We recommend stopping the further instantiation of additional instances once t_Runtime ~ 1h has been reached.
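The two limits can be turned into a rough upper bound for the number of instances. The sketch below assumes a simplified model that is not part of the limits themselves: the total test time splits evenly across instances, and every instance pays the same overhead once. The function names are hypothetical.

```python
import math

def max_useful_instances(total_test_hours, overhead_hours, practical_limit_hours=1.0):
    """Instance count at which the first of the two limits is reached.

    Assumes the total test time divides evenly across all instances.
    """
    # Limit 1: per-instance runtime has dropped to 10 * overhead.
    by_overhead = math.ceil(total_test_hours / (10 * overhead_hours))
    # Limit 2: per-instance runtime has dropped to the practical limit (about 1 h).
    by_practicality = math.ceil(total_test_hours / practical_limit_hours)
    return max(1, min(by_overhead, by_practicality))

def billed_instance_hours(total_test_hours, overhead_hours, instances):
    """Total billed time: the test work itself plus one overhead per instance."""
    return total_test_hours + instances * overhead_hours
```

For example, with 40 hours of tests and 15 minutes of overhead per instance, `max_useful_instances(40, 0.25)` stops at 16 instances because Limit 1 is reached before Limit 2.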

Anticipating Instead of Reacting

The following scenario arises: There has been a change to a unit which should be verified by using a unit test. After that, a procedural integration test follows. Could one not run the integration test in parallel with the unit test to save time? This entails a cost risk. Why? If the unit test fails, the integration test becomes obsolete. Costs have been incurred for unusable results.

But there is the option to reuse the unit test instance for the integration test, thus saving the overhead, i.e., the unproductive time of an instance: before the unit test run ends, the necessary data (software, test data, etc.) is loaded onto the instance, and the execution of the integration test is initiated. At the same time, the results of the unit test are downloaded.

Of course, the question arises: If I am already downloading the results, how do I know if the test run was successful? The solution is to continuously monitor the running instance, as most results are already known before the end of the test. The risk is then low that an integration test starts while the previous unit test has failed.

In the end, the additional cost risk is close to zero.
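The monitoring decision described above might look like the following sketch. The function name and the 90 percent progress threshold are assumptions chosen for illustration, not values from a specific tool.

```python
def may_start_integration_test(passed, failed, total, min_progress=0.9):
    """Decide whether the integration test may be queued on the running instance.

    passed / failed: unit test results known so far; total: planned unit tests.
    min_progress is an assumed threshold for "most results are already known".
    """
    if total <= 0 or failed > 0:
        return False
    return passed / total >= min_progress
```

Any failure seen so far blocks the integration test, so the remaining risk is limited to failures among the last few unit tests.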

By carefully considering when the test results are truly needed, you can set suitable timeframes for the test run.

Why use many instances to get test results within an hour when they are not needed at that time? For example: starting a test on Friday evening – at 6 PM – and the results are only needed on Monday.
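How few instances a relaxed deadline requires can be estimated as follows. This is a sketch under assumptions: the test time divides evenly across instances, each instance pays the overhead once, and the Monday 8:00 AM deadline and the 120 hours of test time are example values. `instances_for_deadline` is a hypothetical helper.

```python
import math
from datetime import datetime

def instances_for_deadline(total_test_hours, overhead_hours, start, deadline):
    """Smallest number of parallel instances that finishes before the deadline."""
    available = (deadline - start).total_seconds() / 3600 - overhead_hours
    if available <= 0:
        raise ValueError("deadline is too close to cover the overhead")
    return max(1, math.ceil(total_test_hours / available))

friday_evening = datetime(2024, 11, 8, 18, 0)  # Friday, 6 PM
monday_morning = datetime(2024, 11, 11, 8, 0)  # assumed deadline: Monday, 8 AM
relaxed = instances_for_deadline(120, 0.5, friday_evening, monday_morning)
rushed = instances_for_deadline(120, 0.5, friday_evening, datetime(2024, 11, 8, 19, 0))
```

With these example values, the weekend run gets by with 2 instances, while results within one hour would take 240.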

In CI environments, test runs can be organized automatically. With the right strategy, planning, and a little patience, costs can be reduced massively, as is also the case with long-term rentals.

Alternative Cost Models to On-Demand

Use of Long-Term Rentals

At many cloud providers, in addition to On-Demand, it’s possible to rent computing resources on a long-term basis. There are several payment models depending on the desired period of commitment.

Hybrid models are also possible: for example, by covering the base load with long-term rentals and using On-Demand for additional instances as needed. The cost savings compared to Spot Instances are of course much lower, but the setup is less complex.

Using Spot Instances

What are Spot Instances?

AWS offers Spot Instances as a way to sell unused computing capacity, since data centers are rarely fully utilized. They are offered at discounts that can reach 80 percent or more.

However, there is a risk that these instances are terminated when the capacity is needed elsewhere, for example by On-Demand customers. Depending on the EC2 type, the probability of this happening is around 20 percent.

How does the use of Spot Instances affect the overall costs?

For a better assessment, let’s compare the costs of Spot Instances with On-Demand Instances using the following assumptions:

Additional Demand for Instances

A shutdown risk of approximately 20 percent increases the number of necessary instances for a test run.

Lower Costs

80 percent discount on Spot Instances compared to equivalent On-Demand Instances.

Calculation of Total Costs Spot vs. On-Demand

The total costs for using Spot Instances TC_Spot can be calculated using the following formula:

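TC_Spot = TC_OnDemand * PAI * PCS

Here, TC_OnDemand is the total cost of the same test run with On-Demand Instances, PAI is the percentage of additional demand for instances, and PCS is the percentage cost of a Spot Instance relative to an On-Demand Instance.
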
Calculation of the Percentage Increase in Instances Needed

If instances are subject to automatic shutdown, you won’t have test results and you’ll need to restart those instances.

With a shutdown risk of Risk_Shutdown > 0%, when testing with Spot Instances, you always have an additional demand for instances (PAI) compared to testing with On-Demand Instances.

This percentage of additional demand for instances (PAI) describes the ratio of the number of Spot Instances to the number of On-Demand Instances for a complete test run and can be sufficiently approximated with the following formula:

PAI = (Risk_Shutdown)^0 + (Risk_Shutdown)^1 + (Risk_Shutdown)^2 + ... + (Risk_Shutdown)^n

Since the number n (the necessary repetitions for failed Spot Instances) depends on the number of instances and the shutdown risk, we approximate the PAI with a mathematical trick: we calculate with an infinite number of repetitions. The sum then becomes a geometric series with the limit 1 / (1 - Risk_Shutdown), which is the theoretical maximum PAI.

Calculation Example for PAI with Shutdown Risk = 20%

PAI = (20%)^0 + (20%)^1 + (20%)^2 + ... + (20%)^n
PAI = 100% + 20% + 4% + 0.8% + ... => 125%

More concrete? Now with specific numbers.
With 100 instances, you have to expect 20 instance shutdowns. These 20 instances are restarted. Of the 20 instances, 20 percent are shut down again, so 4 instances. These are restarted again. And so on.
In total, you need 125 instances.
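The same calculation can be written as a short sketch. The function names are our own, and the closed form 1 / (1 - risk) is the standard limit of the geometric series for an infinite number of repetitions.

```python
def pai(risk_shutdown, repetitions):
    """Partial sum risk^0 + risk^1 + ... + risk^repetitions."""
    return sum(risk_shutdown ** i for i in range(repetitions + 1))

def pai_limit(risk_shutdown):
    """Theoretical maximum PAI for an infinite number of repetitions."""
    if not 0 <= risk_shutdown < 1:
        raise ValueError("risk_shutdown must be in [0, 1)")
    return 1 / (1 - risk_shutdown)
```

For a shutdown risk of 20 percent, `pai(0.2, 3)` already gives about 1.248, and `pai_limit(0.2)` gives 1.25, i.e. the 125 percent from the example.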

Calculation of the Percentage Costs for a Spot Instance

The number of necessary instances for a test run will increase. However, the cost of a Spot Instance is lower than that of an On-Demand Instance.

The percentage cost of a Spot Instance (PCS) relative to an On-Demand Instance is:

PCS = 100% - Discount

With an assumed discount of 80%, the percentage cost is 20%. In other words, operating a Spot Instance costs only one-fifth of the operating cost of an On-Demand Instance.

Conclusion: Total Costs Spot vs. On-Demand

In our example calculation, you only have 1/4 of the costs.

In other words: you save 75% of the costs. Spot Instances are worth it!

TC_Spot = TC_OnDemand * PAI * PCS = TC_OnDemand * 125% * 20% = TC_OnDemand * 25%
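Put together, the comparison fits into a few lines. This is a sketch using the theoretical maximum PAI; the function name and the example cost of 1000 are placeholders.

```python
def spot_total_cost(on_demand_total_cost, risk_shutdown, discount):
    """Estimated total cost of a test run on Spot Instances.

    risk_shutdown and discount are fractions, e.g. 0.2 and 0.8.
    """
    pai = 1 / (1 - risk_shutdown)  # additional demand for instances (upper bound)
    pcs = 1 - discount             # cost of a Spot Instance relative to On-Demand
    return on_demand_total_cost * pai * pcs

example = spot_total_cost(1000, risk_shutdown=0.2, discount=0.8)
```

With the example values, `example` comes out at about 250, a quarter of the On-Demand cost. Under this simple model, Spot Instances are cheaper whenever the discount is larger than the shutdown risk.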

The cost savings with Spot Instances come at a price: interruptions and restarts increase the duration of a test run. How much longer it takes mainly depends on the initial number of instances. This can again be estimated with a percentage calculation, since the number of instances to restart shrinks by the shutdown risk in every iteration.


Cost Calculator for Spot Instances

For simplified calculation, we have created a Spot Instances Calculator.

Please note that using Spot Instances comes with specific risks and benefits. It’s important to consider the individual requirements and resources of the application before implementing the strategy.

Duration and Iterations

The duration of using Spot Instances depends on the number of initial instances started. Additional iterations can arise from repeated restarts. The exact timing of instance terminations is uncertain, whether it happens right at the beginning or only towards the end. Continuous monitoring and automated restarting are very beneficial in this scenario.
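The number of iterations can be estimated with the same expected-value reasoning as before. This is a sketch: it works with averages only, real shutdowns are random, and `rerun_rounds` is a hypothetical helper.

```python
def rerun_rounds(initial_instances, risk_shutdown):
    """Expected number of instances started in each iteration.

    Stops once fewer than one instance is expected to need a restart.
    """
    if not 0 <= risk_shutdown < 1:
        raise ValueError("risk_shutdown must be in [0, 1)")
    rounds = []
    running = float(initial_instances)
    while running >= 1:
        rounds.append(running)
        running *= risk_shutdown  # share that is shut down and must be restarted
    return rounds

rounds = rerun_rounds(100, 0.2)
```

For 100 initial instances and a shutdown risk of 20 percent, this gives three iterations with roughly 100, 20, and 4 instances. In the worst case, when shutdowns happen near the end of each run, the total duration approaches three times the duration of a single run.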

Combining Spot and On-Demand

This model can be cost-effective even if the exact duration is uncertain. If you don’t need the test results immediately, a recommended strategy is to combine Spot and On-Demand instances.

To do this, you need to know when valid results should be available, for example, Monday at 8:00 AM. With this time in mind, you can calculate backwards to determine when multiple parallel On-Demand instances need to be started to obtain all test results. The time from test start to this latest possible deadline can be used for execution with Spot Instances. Instances that have already run in Spot mode do not need to be counted with On-Demand instances. The setup for planning and monitoring is a bit more complex, but in this hybrid model, you have the necessary security and cost efficiency for your test execution.
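The backwards calculation could be sketched as follows. The run time, overhead, and dates are example assumptions, and both function names are hypothetical.

```python
from datetime import datetime, timedelta

def latest_on_demand_start(deadline, run_hours, overhead_hours, safety_hours=0.0):
    """Latest moment at which On-Demand instances must start to meet the deadline.

    run_hours is the per-instance runtime of the remaining tests on On-Demand.
    """
    return deadline - timedelta(hours=run_hours + overhead_hours + safety_hours)

def spot_window(test_start, deadline, run_hours, overhead_hours):
    """Time available for Spot execution before switching to On-Demand."""
    switch = latest_on_demand_start(deadline, run_hours, overhead_hours)
    return max(switch - test_start, timedelta(0))

start = datetime(2024, 11, 8, 18, 0)     # Friday, 6 PM
deadline = datetime(2024, 11, 11, 8, 0)  # Monday, 8 AM
switch_time = latest_on_demand_start(deadline, run_hours=3, overhead_hours=0.5)
window = spot_window(start, deadline, run_hours=3, overhead_hours=0.5)
```

With these example values, the switch to On-Demand would have to happen by Monday at 4:30 AM, which leaves 58.5 hours for Spot execution. Tests that have already finished in Spot mode reduce what is left to run on On-Demand.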

Considerations for Using Spot Instances

  • Payment models may change over time: It is advisable to keep track of changes in the payment terms.
  • Shutdown rates depend on location and time: The rate of instance shutdowns can vary depending on the country and time.
  • Optimize instance duration: The likelihood of termination increases with the duration of the instance. It is advisable to adjust the cost-benefit optimum accordingly.
  • Automation requires expertise: Implementing automation requires a deep understanding to avoid costly mistakes.

Controlling the Execution

Test runs may contain misconfigurations, so it’s important to monitor instances. If the execution takes unusually long, it should be stopped, either manually or, in CI setups, automatically. Then secure the results, shut down the instances, and inform the operations and service team, for example through an automated abort email.

Monitor the test execution from the beginning and verify the result after each test case. If many tests fail, the execution should be aborted. The results should be saved, and all instances should be shut down.

Pro Tip

Introduce a sensible threshold value at which test runs are aborted. If a large number of tests fail, there is usually something wrong with the test execution or a major bug in the code, and in most cases not all tests are needed for the analysis.
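Such a threshold check could be as simple as the following sketch. The 30 percent failure ratio and the minimum of 20 executed tests are assumed values that need to be tuned per project, and `should_abort` is a hypothetical helper.

```python
def should_abort(failed, executed, max_failure_ratio=0.3, min_executed=20):
    """Return True if the test run should be aborted.

    min_executed avoids aborting on the first few results, where a single
    failure would already exceed the ratio.
    """
    if executed < min_executed:
        return False
    return failed / executed > max_failure_ratio
```

A monitoring loop could call this after each test case and, on `True`, save the results and shut down all instances.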

Adjusting Test Strategy & Methods

Testing the Model or the Code?

Suppose a Simulink model is to be tested, first in MiL (model-in-the-loop) and then in SiL (software-in-the-loop). At first glance, this sounds like a good idea. But why test the model first and then the generated code as well?

Our recommendation: skip the MiL run and test only the code.

This saves the execution of the model and the license costs while still delivering meaningful results.

Customized Test Strategies

Free Consulting for Your Product

There is no such thing as a single, optimal test strategy. It is too strongly linked to the product, the requirements and the goals. We would be happy to help you develop a suitable strategy in an individual, free strategy discussion.

Return to Part 1: Why Testing Software in the Cloud? for an overview of the series.
