Introducing Rightsizing Recommendations Purpose-Built for the Cloud

Cloudability is proud to introduce updates to our Rightsizing Recommendation Engine. Join us and Atlassian as we walk through how to best approach the challenge of improving cloud efficiency and elasticity.

One of the primary reasons organizations migrate their workloads to the cloud is to take advantage of the efficiency gains available with a well-architected deployment. Picking the best AWS service for your workload, matching it to the right resource type and building in elasticity so that you can adapt as your needs change is a great way to save costs while balancing the needs of your customers.

However, trying to save costs by simply turning off a bunch of seemingly idle resources, or by rightsizing on poorly informed assumptions, can lead to disastrous outcomes. So, how do you drive effective rightsizing and elasticity?

Risk Versus Reward: A New Way to Look at Rightsizing

Rightsizing is the process of matching cloud workloads to underlying infrastructure in a way that minimizes waste. Workloads are typically launched with a set of performance assumptions; by looking at the data as it runs, you can see if those assumptions were valid.

The beauty of the cloud is that if you’re consistently running above or below your expectations, you can quickly make a change to address it. For example, perhaps you provisioned a large virtual machine for a particular job, but it turns out only a fraction of that machine’s resources are required. When taking corrective action, you need to understand both the risk of the change and the reward.

We’re launching a new way to look at rightsizing, with analytics that provide a number of actionable options for each resource and a clear risk profile attached to each option. Rather than a service that gives you a single, unexplained recommendation, we offer transparency across multiple options that you can validate before taking the next step.

For the fundamentals on cloud resource rightsizing, check out our free e-book.

Partners in Cloud Optimization

For a number of years, the team at Atlassian has been one of the clear leaders in this space, and we are very grateful to work with them on improving how delivery teams approach rightsizing. Cloud mastery is hugely significant to Atlassian’s business operations, but as Mike Fuller, Principal Systems Engineer on the Cloud Engineering team, says: “As important as cost savings are within Atlassian’s AWS infrastructure, we will always prioritize site availability and the happiness of our customers first.”

Let’s look at how this can be achieved.

A Consideration for Mature Cloud Organizations: Opportunity Cost

Another consideration in any rightsizing effort is opportunity cost. Our customers tell us their most expensive resource is people (likely your biggest cost too!), and the time they dedicate to cost optimization must be weighed against other business deliverables. This is why organizations need a tool that does as much of the heavy lifting as possible, ranks your savings opportunities accurately and, most of all, is based on rigorous scenario testing backed by big data. In short, you want your engineers focused on the one $7,000 problem, not on 1,000 $7 problems.
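To make the opportunity-cost point concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every change costs the same engineering effort regardless of how much it saves (a hypothetical 2 engineer-hours at $100/hour); `net_benefit` is a hypothetical helper, not part of any Cloudability API.

```python
# Back-of-the-envelope opportunity-cost sketch.
# Assumption (hypothetical): every change, large or small, takes roughly the
# same engineering effort to validate and roll out.

def net_benefit(savings_per_change, num_changes, effort_cost_per_change):
    """Total savings minus the cost of the engineering time spent on the changes."""
    return num_changes * (savings_per_change - effort_cost_per_change)

# Hypothetical effort per change: 2 engineer-hours at $100/hour.
EFFORT_COST = 2 * 100.0

one_big = net_benefit(7_000, 1, EFFORT_COST)      # one $7,000 problem
many_small = net_benefit(7, 1_000, EFFORT_COST)   # 1,000 $7 problems

print(f"One $7,000 change:  {one_big:+,.0f} USD")
print(f"1,000 $7 changes:   {many_small:+,.0f} USD")
```

Under that assumption the single large change nets $6,800, while the thousand small ones cost far more in engineering time than they save. The exact figures depend entirely on your own effort estimate, but the shape of the result is why ranking opportunities matters.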

Evaluate Risk & Understand Your Workload

The first step toward grading risk is fully understanding your current workload: having a rich set of historical data points that captures peaks and shifts across the time series. At Cloudability, we do take a weighted aggregate into account, but decisions made purely on averaged workloads will inevitably lead to performance issues. The most basic example is moving your resource to an instance type or size where the average workload holds up fine, but you end up ‘clipping’ as soon as a typical peak arrives. And when it comes to web apps, or anything that faces consumer traffic, we know that 1) workloads do tend to vary significantly and 2) if performance or availability were compromised, the consequences for the business could be profound.
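The gap between averages and peaks is easy to see in a few lines of Python. This sketch uses made-up hourly CPU demand (in vCPUs actually used) and a hypothetical menu of instance sizes, and compares sizing by the mean against sizing by a nearest-rank 95th percentile. It illustrates the general idea only; it is not Cloudability's weighting scheme.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def smallest_fitting(sizes, demand):
    """Smallest size (in vCPUs) whose capacity covers the given demand."""
    return min(s for s in sizes if s >= demand)

# Hypothetical hourly CPU demand for one day, in vCPUs actually used:
# quiet overnight, a busy daytime plateau and a two-hour evening peak.
demand = [0.5] * 8 + [1.5] * 10 + [3.5] * 2 + [0.5] * 4

SIZES = [1, 2, 4, 8]  # hypothetical instance sizes, in vCPUs

mean_demand = sum(demand) / len(demand)
p95_demand = percentile(demand, 95)

print(f"mean {mean_demand:.2f} vCPU -> size {smallest_fitting(SIZES, mean_demand)}")
print(f"p95  {p95_demand:.2f} vCPU -> size {smallest_fitting(SIZES, p95_demand)}")
```

Sized by the mean, this workload lands on 2 vCPUs and clips during the evening peak; sized by the 95th percentile, it lands on 4. A percentile is only one way to respect peaks, and which statistic (and how much headroom) suits a given workload is a judgment call.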

Understanding Clipping

Figure 1: Example of clipping. A user chooses an instance type where the average utilization (37%) appears fine, but the instance is maxed out during peak periods. At these moments of clipping, your application’s performance is likely to degrade, which could lead to server errors.
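To quantify clipping like that shown in Figure 1, one simple approach is to project the source instance's utilization samples onto a smaller target and count how often projected demand exceeds 100% of the target's capacity. The sketch below assumes, as a deliberate simplification, that one vCPU on the target does the same work as one vCPU on the source; a real engine would use measured performance data per instance type. The samples and helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClippingReport:
    mean_demand_pct: float    # average projected demand, % of target capacity
    clipped_fraction: float   # share of samples where demand exceeds capacity
    peak_headroom_pct: float  # 100 minus the projected peak; negative = clipping

def project_onto_target(source_pct, source_vcpus, target_vcpus):
    """Rescale CPU-% samples from the source instance to a target instance.

    Simplifying assumption: one target vCPU does the same work as one source
    vCPU. Values above 100 are demand the target could not serve.
    """
    scale = source_vcpus / target_vcpus
    return [p * scale for p in source_pct]

def clipping_report(projected_pct):
    """Summarize average demand, clipping frequency and peak headroom."""
    clipped = sum(1 for p in projected_pct if p > 100.0)
    return ClippingReport(
        mean_demand_pct=sum(projected_pct) / len(projected_pct),
        clipped_fraction=clipped / len(projected_pct),
        peak_headroom_pct=100.0 - max(projected_pct),
    )

# Hypothetical hourly CPU-% samples from a 4-vCPU source instance,
# evaluated against a 2-vCPU target.
source_pct = [10, 12, 15, 30, 45, 60, 20, 10]
report = clipping_report(project_onto_target(source_pct, 4, 2))
print(report)
```

The projected average of roughly 50% looks comfortable, yet one sample in eight would need 120% of the target's CPU, which is the same pattern the figure illustrates.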

Using Cloudability’s Recommendation Engine

As we stated earlier, at a high level there are two key things at play here: risk and savings. Generally these work inversely to each other; the more risk you are willing to accept, the more options you have for significant savings. Calculating potential savings is fairly straightforward (even if it takes some serious number crunching); the hard part is zeroing in on valid resource types and finding a way to represent and mitigate risk where the rubber meets the road. What Cloudability is delivering is game-changing in a number of ways:
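The inverse relationship between risk and savings can be sketched as a filter-then-rank step: discard options riskier than your tolerance, then sort what remains by savings. In this sketch the option names, prices and risk scores are made up for illustration (they are not AWS prices), the risk score stands in for something like the share of samples expected to clip, and 730 hours per month is a common billing approximation.

```python
from typing import NamedTuple

class Option(NamedTuple):
    name: str            # hypothetical instance label
    hourly_price: float  # hypothetical price in USD, not real AWS pricing
    clip_risk: float     # e.g. share of samples expected to clip, 0.0 to 1.0

HOURS_PER_MONTH = 730  # common approximation for monthly cloud billing

def monthly_savings(current_price, option):
    """Monthly savings from moving off the current hourly price."""
    return (current_price - option.hourly_price) * HOURS_PER_MONTH

def options_within_tolerance(current_price, options, max_risk):
    """Options no riskier than max_risk, largest savings first."""
    acceptable = [o for o in options if o.clip_risk <= max_risk]
    return sorted(acceptable,
                  key=lambda o: monthly_savings(current_price, o),
                  reverse=True)

current_price = 0.40  # hypothetical hourly price of the current instance
options = [
    Option("size-a", 0.30, 0.00),
    Option("size-b", 0.20, 0.02),
    Option("size-c", 0.10, 0.15),
]

for tolerance in (0.0, 0.05, 0.20):
    best = options_within_tolerance(current_price, options, tolerance)[0]
    print(f"risk <= {tolerance:.0%}: {best.name}, "
          f"saves ~${monthly_savings(current_price, best):,.0f}/month")
```

In this toy example, raising the tolerance from 0% to 20% moves the best option from size-a to size-c and triples the monthly savings, which is why it helps to see the risk attached to each option rather than a single answer.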

  • We take into account the entire time series for your resources and build a statistical model around average and peak utilization periods.
  • With a massive data pool to work with, we have confidence in the performance characteristics of instance families and types (even between regions). We use this information to zero in on valid instance types for your workload.
  • We use proprietary algorithms to rank the risk of these different options, including the likelihood of clipping and how much headroom you are likely to have.
  • Most importantly, we give you a graphical representation of this risk by letting you overlay your workload on the most applicable instance types in your region. This covers the entire time series, so you can visually inspect your headroom and potential for clipping across all four key metrics: CPU utilization, disk (for instance types with local disk), memory and bandwidth. Being able to cycle through these recommendations and simulate resource utilization before acting on them is a game changer for rightsizing initiatives.
Figure 2: Visualizing your workload against another instance type. The dashed yellow line represents the capacity of the target resource (selected recommendation), whereas the red line is the source resource.
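Because the four metrics are evaluated together, an option is only as safe as its tightest metric. The sketch below computes headroom at peak for each metric a candidate type exposes and reports the one with the least room; disk is skipped when the target has no local disk. All metric names, capacities and peak values are illustrative assumptions, not Cloudability's data model.

```python
def headroom_by_metric(peak, capacity):
    """Share of capacity left at peak for each metric (negative means clipping).

    Metrics the target does not expose (e.g. disk on a type without local
    disk) are skipped.
    """
    return {m: 1.0 - peak[m] / capacity[m] for m in peak if m in capacity}

def binding_metric(peak, capacity):
    """Return the metric with the least headroom, and that headroom."""
    headroom = headroom_by_metric(peak, capacity)
    metric = min(headroom, key=headroom.get)
    return metric, headroom[metric]

# Hypothetical peak demand taken from a workload's time series.
peak = {"cpu_vcpus": 1.6, "memory_gib": 7.0, "disk_iops": 900, "network_mbps": 300}

# Hypothetical capacities for two candidate types; the second has no local disk.
with_local_disk = {"cpu_vcpus": 2, "memory_gib": 8, "disk_iops": 3000, "network_mbps": 750}
without_local_disk = {"cpu_vcpus": 4, "memory_gib": 16, "network_mbps": 500}

print(binding_metric(peak, with_local_disk))
print(binding_metric(peak, without_local_disk))
```

On the first candidate memory is the bottleneck even though CPU also runs fairly hot; on the second, bandwidth becomes the constraint. Looking at CPU alone would miss both.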

Recommendations Between Instance Families

One capability especially worth noting is that the algorithms can recommend across instance families, whereas other tools out there will only recommend between sizes within a family. This is crucial given how often we hear of cases where the “shape” of the instance doesn’t match the actual workload (for example, memory may be drastically over-provisioned while CPU is maxing out). Being able to move between families opens up significant opportunities for extra savings and helps you land on the instance type that most closely matches your workload profile.
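The value of searching across families shows up even in a tiny example. This sketch uses a hypothetical catalog (the family names, shapes and prices are invented placeholders, not real instance types or AWS pricing) and a workload whose CPU is maxed out on a memory-heavy type while most of its memory sits idle. `cheapest_fit` and the 20% safety margin are illustrative assumptions.

```python
from typing import NamedTuple

class InstanceType(NamedTuple):
    name: str
    family: str
    vcpus: int
    memory_gib: float
    hourly_price: float  # hypothetical placeholder price in USD

# Hypothetical catalog: names, shapes and prices are illustrative only.
CATALOG = [
    InstanceType("mem.large", "mem", 2, 16, 0.13),
    InstanceType("mem.xlarge", "mem", 4, 32, 0.26),
    InstanceType("gen.large", "gen", 2, 8, 0.10),
    InstanceType("gen.xlarge", "gen", 4, 16, 0.20),
    InstanceType("cpu.large", "cpu", 2, 4, 0.09),
    InstanceType("cpu.xlarge", "cpu", 4, 8, 0.17),
]

def cheapest_fit(catalog, need_vcpus, need_memory_gib, family=None):
    """Cheapest type covering both needs, optionally restricted to one family."""
    candidates = [
        t for t in catalog
        if t.vcpus >= need_vcpus
        and t.memory_gib >= need_memory_gib
        and (family is None or t.family == family)
    ]
    return min(candidates, key=lambda t: t.hourly_price, default=None)

HEADROOM = 1.2  # hypothetical 20% safety margin over estimated peaks

# Workload currently on mem.large: CPU is maxed out (estimated peak demand of
# 3 vCPUs) while only about 3 GiB of its 16 GiB of memory is used.
need_vcpus = 3.0 * HEADROOM
need_memory = 3.0 * HEADROOM

same_family = cheapest_fit(CATALOG, need_vcpus, need_memory, family="mem")
any_family = cheapest_fit(CATALOG, need_vcpus, need_memory)

print("Same family only:", same_family.name, same_family.hourly_price)
print("Across families: ", any_family.name, any_family.hourly_price)
```

Staying within the memory-heavy family means scaling up to mem.xlarge, while searching across families lands on cpu.xlarge, which is both cheaper and a closer fit to the CPU-heavy shape.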

Need a refresher on EC2 instance families? Our free comprehensive EC2 instance family guide can help.

Pro-Tip: Don’t Just Focus on EC2 Instances

Cloudability is one of the few tools out there that operates across multiple key AWS services, including RDS and Redshift. It’s easy to get so focused on EC2 instances that we miss large potential savings elsewhere. Whether you are using Cloudability or another tool, make sure you have the right insights at hand, or you’ll be leaving a pile of cash on the table.


Before You Get Started: A Word from Atlassian

“At Atlassian, we need to make rightsizing accurate, simple and easy to follow to ensure we utilize our staff’s time most effectively. Cloudability’s new rightsizing feature is super powerful as it focuses on ensuring the recommendations will be impact free and the savings are worth our effort,” said Mike Fuller. “Combined with Cloudability Views system, we are able to get the most important info into the right hands.”

To see this new feature in action, get in touch with your Account Team if you’re a customer, or start a Free Trial.
