Instances

  • Instance recommendations show the potential savings from three kinds of optimization: rightsizing, startup/shutdown scheduling, and idle-instance detection. This section explains how each recommendation is generated and how its savings are calculated.

  • Rightsizing recommendations

    • When recommending instance types, vCPU and memory usage and capacity are checked against the thresholds on the Settings page. This determines whether the current instance type should be replaced to reduce costs and, if so, by which types.

      • Hourly peak vCPU utilization over the last 90 days is analyzed to calculate the optimal vCPU capacity for the current configuration. The optimal memory capacity is calculated similarly from the instance's memory usage trend.

      • Next, both optimal capacities are combined with a sampling factor threshold to recommend the best-fit instance types, meaning those that would reduce your overall infrastructure cost if they replaced your current instance type.

      • Finally, a sampling threshold (10% by default) is applied: we check whether 10% of the hourly peak utilization samples (for EITHER CPU or memory) exceed the maximum threshold, and whether changing the instance type would mitigate those peaks cost-effectively. This check applies to downsizing; for upsizing, it works the other way around.

        • For example, the question being measured is whether a workload that currently runs quickly on more CPUs could run on fewer CPUs (by switching to a different instance type), even if it takes longer to complete, while maintaining performance and lowering overall cost.

        • Conversely, when usage consistently exceeds capacity, upsizing the instance type is recommended to stabilize workloads cost-effectively.

        • 10% is a reasonable default because occasional utilization spikes from activity such as backups and network traffic are expected, but you can change it to suit your preferences.

        • To view and modify all thresholds that the utilization data is checked against, go to the Rightsizing tab of the Settings page.
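The sampling-threshold check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: the function names, the separate CPU and memory maximum thresholds, and the strict "more than" comparison (borrowed from the startup/shutdown rule) are all assumptions, and mapping a tripped check straight to "upsizing candidate" is a simplification. Utilization values are percentages.

```python
from typing import Sequence


def share_above(peaks: Sequence[float], limit: float) -> float:
    """Fraction of hourly peak samples strictly above `limit` (hypothetical helper)."""
    if not peaks:
        raise ValueError("at least one hourly peak sample is required")
    return sum(1 for p in peaks if p > limit) / len(peaks)


def peaks_exceed_sampling_threshold(
    cpu_peaks: Sequence[float],
    mem_peaks: Sequence[float],
    cpu_max: float,
    mem_max: float,
    sampling_threshold: float = 0.10,
) -> bool:
    """True if EITHER metric has too many peaks above its maximum threshold.

    Assumption: "too many" means a share strictly greater than the
    sampling threshold (10% by default).
    """
    return (
        share_above(cpu_peaks, cpu_max) > sampling_threshold
        or share_above(mem_peaks, mem_max) > sampling_threshold
    )


# 20 of 100 CPU peaks are above an 80% maximum, so the check trips.
cpu = [90.0] * 20 + [30.0] * 80
mem = [40.0] * 100
exceeded = peaks_exceed_sampling_threshold(cpu, mem, cpu_max=80.0, mem_max=80.0)
downsizing_allowed = not exceeded  # frequent peaks rule out downsizing
upsizing_candidate = exceeded      # frequent peaks point toward upsizing
print(downsizing_allowed, upsizing_candidate)  # False True
```

Checking CPU and memory independently and combining them with `or` reflects the "EITHER memory or CPU" wording: a single overloaded resource is enough to block a downsize.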

      • Each recommended instance type comes with a risk level that compares the CPU and memory configuration of the existing type with that of the suggested type:

        • Risk level 1: the suggested type's CPU and memory both match or exceed the existing type's.

        • Risk level 2: any case not covered by levels 1 and 3.

        • Risk level 3: the suggested type's CPU or memory is half or less of the existing type's.
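The risk-level rule for recommended instance types maps directly onto a small function. This is a sketch with a hypothetical name and signature, assuming the CPU count and memory size of both types are available as numbers in the same units.

```python
def risk_level(
    existing_cpus: float,
    existing_memory: float,
    suggested_cpus: float,
    suggested_memory: float,
) -> int:
    """Risk level of moving from the existing to the suggested instance type.

    CPU and memory can use any units, as long as the existing and
    suggested values use the same ones.
    """
    # Level 1: the suggested type matches or exceeds BOTH CPU and memory.
    if suggested_cpus >= existing_cpus and suggested_memory >= existing_memory:
        return 1
    # Level 3: the suggested CPU OR memory is half of the existing value or less.
    if suggested_cpus <= existing_cpus / 2 or suggested_memory <= existing_memory / 2:
        return 3
    # Level 2: everything else.
    return 2


print(risk_level(4, 16, 4, 16))  # 1
print(risk_level(4, 16, 3, 16))  # 2
print(risk_level(4, 16, 2, 16))  # 3
```

Level 1 is checked first, so a suggested type that is at least as large on both resources is never flagged as risky.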

    • The annual savings for rightsizing are calculated as (suggested type's hourly rate - existing type's hourly rate) * 24 * 365. Values in the UI are rounded to 2 decimal places, but the calculation uses full precision, so recomputing the savings from the displayed figures may give a slightly different result.
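The rightsizing savings formula, transcribed as written, is sketched below. The function name and the hourly rates in the example are hypothetical. The example also shows the difference between the full-precision value and the 2-decimal figure the UI displays.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def rightsizing_annual_savings(suggested_hourly_rate: float, existing_hourly_rate: float) -> float:
    """Annual rightsizing figure: (suggested rate - existing rate) * 24 * 365.

    With this ordering, a cheaper suggested type gives a negative value
    whose magnitude is the yearly cost reduction.
    """
    return (suggested_hourly_rate - existing_hourly_rate) * HOURS_PER_DAY * DAYS_PER_YEAR


# Hypothetical rates: moving from $0.20/hour to $0.10/hour.
savings = rightsizing_annual_savings(suggested_hourly_rate=0.10, existing_hourly_rate=0.20)
print(f"{savings:.2f}")  # -876.00 (display rounding; `savings` keeps full precision)
```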

  • Startup/Shutdown recommendations

    • When making startup/shutdown recommendations, the goal is to provide users with a startup and shutdown schedule for their instances that can reduce costs for underutilized resources at specific times during the week. 

      • The purpose is to provide a way to automatically start and stop instances when they are operating under (or over) specific CPU and memory usage thresholds, which can also be found on the Settings page.

      • Schedule recommendations are based on the average CPU/memory utilization over the last 30 days for each hour of each day of the week.

        • For example, if an instance averaged 45% CPU/memory utilization at 3:00 AM on Mondays over the last 30 days (with a threshold of 20%), we would recommend that it stay on. Likewise, if it averaged 5% CPU/memory utilization at 2:00 PM on Tuesdays over the last 30 days (with a threshold of 10%), we would recommend that it stay off for that particular HOUR.
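Building the hour-by-hour schedule from averaged utilization can be sketched as follows. This is an illustration only: the function name is hypothetical, samples are assumed to be (timestamp, utilization %) pairs already limited to the last 30 days, a single metric is checked, and a slot whose average equals the threshold is assumed to be scheduled off.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Slot = Tuple[int, int]  # (weekday: Monday=0 ... Sunday=6, hour: 0-23)


def hourly_schedule(
    samples: Iterable[Tuple[datetime, float]], threshold: float
) -> Dict[Slot, str]:
    """Map each (weekday, hour) slot to "on" or "off" from its average utilization."""
    buckets: Dict[Slot, List[float]] = defaultdict(list)
    for ts, utilization in samples:
        buckets[(ts.weekday(), ts.hour)].append(utilization)
    # Above the threshold on average: stay on. At or below it: schedule off (assumption).
    return {
        slot: "on" if mean(values) > threshold else "off"
        for slot, values in buckets.items()
    }


samples = [
    (datetime(2024, 1, 1, 3), 40.0),   # Monday 3:00 AM
    (datetime(2024, 1, 8, 3), 50.0),   # the following Monday 3:00 AM
    (datetime(2024, 1, 2, 14), 5.0),   # Tuesday 2:00 PM
]
schedule = hourly_schedule(samples, threshold=10.0)
print(schedule[(0, 3)], schedule[(1, 14)])  # on off
```

Keying on `(weekday, hour)` rather than on the calendar date is what folds the 30 days of data into one weekly schedule.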

      • A sample factor is also applied to guard against anomalous utilization. For example, given 100 CPU/memory utilization samples over a month, if more than 10% of them (the sample factor) are above their CPU/memory thresholds while the rest are below, we would recommend that the instance stay on. If 10% or fewer of the samples are above the threshold, we would recommend that it stay off.
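The sample-factor rule for a single time slot can be sketched as below. The function name is hypothetical, and the sketch assumes the utilization samples for the slot (as percentages) have already been collected; the comparison is strict, matching "more than 10%".

```python
from typing import Sequence


def stays_on(samples: Sequence[float], threshold: float, sample_factor: float = 0.10) -> bool:
    """Sample-factor rule for one time slot.

    Stay on if MORE than `sample_factor` of the samples are above the
    threshold; stay off if that share is `sample_factor` or less.
    """
    if not samples:
        raise ValueError("at least one utilization sample is required")
    share_over = sum(1 for u in samples if u > threshold) / len(samples)
    return share_over > sample_factor


print(stays_on([50.0] * 11 + [5.0] * 89, threshold=20.0))  # True  (11% over)
print(stays_on([50.0] * 10 + [5.0] * 90, threshold=20.0))  # False (exactly 10% over)
```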

    • The annual savings for startup/shutdown are calculated as ((MTD spend / number of active instance hours) / 24) * number of hours the VM is off * 365, where MTD spend is the month-to-date spend. Values in the UI are rounded to 2 decimal places, but the calculation uses full precision, so recomputing the savings from the displayed figures may give a slightly different result.
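The startup/shutdown savings formula can be transcribed directly. This sketch uses a hypothetical function name and example figures; `hours_off` stands for the number of hours the VM is off exactly as the formula uses it, and the guard against a non-positive hour count is an added assumption.

```python
def startup_shutdown_annual_savings(
    mtd_spend: float, active_instance_hours: float, hours_off: float
) -> float:
    """((MTD spend / active instance hours) / 24) * hours the VM is off * 365."""
    if active_instance_hours <= 0:
        raise ValueError("active_instance_hours must be positive")
    return (mtd_spend / active_instance_hours) / 24 * hours_off * 365


# Hypothetical figures: $240 MTD spend over 240 active hours, 24 hours off.
savings = startup_shutdown_annual_savings(mtd_spend=240.0, active_instance_hours=240, hours_off=24)
print(round(savings, 2))  # 365.0
```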

  • Idle Instance recommendations

    • Idle-instance recommendations use an approach similar to startup/shutdown: instances whose utilization is at or below the minimum CPU/memory threshold are considered idle.

      • A sample factor is also used to account for anomalous utilization, and both a minimum and a maximum threshold are used to generate the recommendation.

      • Before an instance is considered for a recommendation, we check whether it meets the requirements. For example, with a minimum CPU utilization threshold of 10%, a maximum of 20%, and a sample factor of 5%, more than 5% of the samples must show utilization of 10% or less for the instance to be considered.

      • However, if more than 5% of the samples are above the 20% maximum threshold, the instance is not classified as idle.
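The two-sided idle check can be sketched as follows, using the thresholds from the example (minimum 10%, maximum 20%, sample factor 5%). The function name is hypothetical, a single metric is checked, and utilization values are percentages.

```python
from typing import Sequence


def is_idle(
    samples: Sequence[float],
    min_threshold: float,
    max_threshold: float,
    sample_factor: float,
) -> bool:
    """Idle classification with minimum/maximum thresholds and a sample factor."""
    if not samples:
        raise ValueError("at least one utilization sample is required")
    n = len(samples)
    low_share = sum(1 for u in samples if u <= min_threshold) / n
    high_share = sum(1 for u in samples if u > max_threshold) / n
    # Considered only if more than `sample_factor` of samples are at or below the minimum...
    meets_minimum = low_share > sample_factor
    # ...and ruled out if more than `sample_factor` of samples are above the maximum.
    exceeds_maximum = high_share > sample_factor
    return meets_minimum and not exceeds_maximum


print(is_idle([5.0] * 90 + [15.0] * 10, 10.0, 20.0, 0.05))  # True
print(is_idle([5.0] * 90 + [50.0] * 10, 10.0, 20.0, 0.05))  # False
```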

    • The annual savings for idle instances are calculated as (current MTD spend / current day of the month) * 365. Values in the UI are rounded to 2 decimal places, but the calculation uses full precision, so recomputing the savings from the displayed figures may give a slightly different result.
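The idle-instance savings formula annualizes the average daily spend so far this month. The sketch below uses a hypothetical function name and example figures; in practice the day of the month would come from the current date.

```python
from datetime import date


def idle_annual_savings(mtd_spend: float, day_of_month: int) -> float:
    """(current MTD spend / current day of month) * 365."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    return mtd_spend / day_of_month * 365


# Hypothetical figures: $300 spent in the first 10 days of the month.
print(idle_annual_savings(300.0, 10))  # 10950.0
# Using today's date for the current day of the month:
print(idle_annual_savings(300.0, date.today().day))
```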