Enterprise Cloud Cost Management - Part 3



In the earlier parts of our blog series, we defined a comprehensive cost management framework for enterprise clouds and took a deep dive into initial planning processes and operational visibility. In this final part of our series, we explore how to optimize costs, infrastructure, and billing by effectively combining two approaches: manual action by app teams (with in-depth operational visibility) and automatic resource cleanup (when no manual action is taken).

Cost Optimization

Stakeholders should be given multiple opportunities to take action or register exceptions for their apps. If no stakeholder action is taken on dev/test environments, then recommendations can be automatically actioned. These two approaches, used in combination, will lead to better awareness and accountability.
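The combination of manual and automatic action can be pictured as a small decision rule. What follows is a minimal sketch rather than a definitive implementation: the `Recommendation` record, the `next_action` function, the limit of three notifications, the environment labels, and the "escalate" outcome for production are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical limit: how many reminders a stakeholder gets before automation steps in.
MAX_NOTIFICATIONS = 3


@dataclass
class Recommendation:
    resource_id: str
    environment: str           # assumed labels: "dev", "test", "prod"
    notifications_sent: int = 0
    acted_on: bool = False     # stakeholder applied the recommendation manually
    exception: bool = False    # stakeholder registered an exception


def next_action(rec: Recommendation) -> str:
    """Decide what to do next with an open recommendation."""
    if rec.acted_on or rec.exception:
        return "close"
    if rec.notifications_sent < MAX_NOTIFICATIONS:
        return "notify"
    # Only dev/test environments are actioned automatically.
    if rec.environment in ("dev", "test"):
        return "auto_apply"
    # Assumption: anything else is handed to a human instead of being changed.
    return "escalate"
```

Keeping the exception flag on the record means a registered exception stops both the reminders and the automatic action.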

Infrastructure Optimization

This section discusses various opportunities to optimize the cloud infrastructure landscape to fit actual utilization. This follows the cloud tenet of provisioning only what you need and paying only for what you use.

Instance Rightsizing

Upsize or downsize instances based on actual utilization trends, so that the peak average utilization stays within the optimal range (70%-80%).
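As a rough illustration of this rule, the sketch below averages a series of peak utilization samples and compares the result with the 70%-80% band. The function name, the choice to average daily peaks, and the sample values are assumptions; a real tool would also look at memory, network, and the sizes actually available.

```python
def rightsizing_recommendation(peak_utilization_pct, low=70.0, high=80.0):
    """Return 'downsize', 'upsize', or 'keep' from peak utilization samples (percent)."""
    if not peak_utilization_pct:
        return "keep"  # no data, so no recommendation
    avg_peak = sum(peak_utilization_pct) / len(peak_utilization_pct)
    if avg_peak < low:
        return "downsize"
    if avg_peak > high:
        return "upsize"
    return "keep"


# Example: daily peaks well below the band suggest a smaller instance.
print(rightsizing_recommendation([20, 30, 25]))
```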

Cleanup of Unused Resources

Remove any orphaned resources that are no longer being used. Some of these include:

  • Unattached disks (delete)
  • Orphaned snapshots (delete)
  • Unallocated IPs (release)
  • Unused storage (recommend moving to an archival tier such as Glacier or Coldline)
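The list above maps each orphaned resource type to one action, which can be expressed as a lookup table. This is a minimal sketch under stated assumptions: the inventory is a list of plain dictionaries with hypothetical `id`, `type`, and `in_use` fields, and no cloud API is called.

```python
# Action per orphaned resource type, taken from the list above.
CLEANUP_ACTIONS = {
    "disk": "delete",
    "snapshot": "delete",
    "ip": "release",
    "storage": "recommend_archive",
}


def plan_cleanup(resources):
    """Return (resource_id, action) pairs for unused resources of a known type."""
    plan = []
    for res in resources:
        if not res["in_use"] and res["type"] in CLEANUP_ACTIONS:
            plan.append((res["id"], CLEANUP_ACTIONS[res["type"]]))
    return plan
```

Producing a plan first, instead of deleting immediately, leaves room for a review or notification step before anything is removed.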

Cleanup of Underutilized Resources

Identify and recommend cleanup of resources that have been provisioned but are not being actively used. A common example is dev environments that were not deleted after testing. Metrics that can be used to identify these types of resources are:

  • Minimal or no CPU utilization
  • Minimal or no disk activity
  • Minimal or no IO activity
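One way to combine these three metrics is to flag a resource only when all of them stay low over the observation window. The thresholds and the function name below are assumptions chosen for illustration, and each series is expected to be non-empty.

```python
from statistics import mean

# Hypothetical thresholds for "minimal" activity; tune them per workload.
CPU_PCT_MAX = 5.0
DISK_OPS_MAX = 10.0
IO_KBPS_MAX = 50.0


def is_underutilized(cpu_pct, disk_ops, io_kbps):
    """True only if all three metric series average at or below their thresholds."""
    return (
        mean(cpu_pct) <= CPU_PCT_MAX
        and mean(disk_ops) <= DISK_OPS_MAX
        and mean(io_kbps) <= IO_KBPS_MAX
    )
```

Requiring all three conditions avoids flagging, for example, a mostly idle CPU on a server that is still doing steady disk or network work.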

Instance Scheduling

Turn resources on and off based on when they are needed, rather than running them all the time. Considerations include:

  • Scheduling based on spikes in usage patterns
  • Instance scheduling for dev/test servers that don't need to be run 24/7
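A schedule for dev/test servers can be as simple as a set of days and hours. The sketch below assumes a hypothetical weekday schedule of 08:00 to 19:59 local time; a scheduler job would call `should_be_running` periodically and start or stop instances accordingly.

```python
from datetime import datetime

# Hypothetical schedule: Monday to Friday, 08:00 to 19:59 local time.
RUN_DAYS = {0, 1, 2, 3, 4}   # Monday=0 ... Friday=4
RUN_HOURS = range(8, 20)


def should_be_running(now: datetime) -> bool:
    """True if an instance on this schedule should be up at the given time."""
    return now.weekday() in RUN_DAYS and now.hour in RUN_HOURS
```

With this example schedule, an instance runs 60 of the 168 hours in a week, or roughly 36% of the time.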

Instance Modernization

Cloud providers regularly release new versions of their instance families. These are based on the latest hardware and are often faster and cheaper than the older instance families. Modernizing instance families to the latest versions can optimize both performance and costs.

Cleanup of Other Cloud Services

For managed services provided by the cloud provider, use metrics to identify whether the services are actually being used, and release them if they are not needed.

Billing Optimization

To optimize billing processes, (1) leverage reserved/committed use discounts in the production environment and (2) enable committed use and spot/pre-emptible instances in the dev/test environment. This allows users to fully utilize the usage discounts provided by cloud platforms. Some of these discount categories make sense for specific application environments. Details are below:

Production Environment

  • Start by covering 30%-40% of servers with reserved/committed use discounts to achieve immediate cost savings before the app stabilizes in the cloud.
  • Extend coverage to 100% of servers after the app stabilizes in the cloud.

Dev/Test Environment

  • Start by committing the roughly 10% of servers that need to run 24x7 (e.g., build servers).
  • Extend committed use to 100% of servers after the app stabilizes in the cloud.
  • Use spot/pre-emptible instances for environments that can be torn down and recreated.
  • Integrate the use of spot/pre-emptible instances with DevOps build processes.
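To see how the covered share of servers affects the bill, the sketch below computes a blended monthly cost. Every number in it is an assumption for illustration only: the hourly rate, the 730-hour month, and especially the 40% discount, which varies by provider, term, and payment option.

```python
def blended_monthly_cost(servers, on_demand_rate, hours, committed_fraction, discount):
    """Monthly cost when a fraction of servers is billed at a discounted committed rate."""
    committed = servers * committed_fraction
    on_demand = servers - committed
    return hours * on_demand_rate * (committed * (1 - discount) + on_demand)


# Hypothetical fleet: 100 servers at 0.10 per hour, 730 hours per month.
baseline = blended_monthly_cost(100, 0.10, 730, 0.0, 0.40)
partial = blended_monthly_cost(100, 0.10, 730, 0.40, 0.40)
print(f"no commitment: {baseline:.2f}, 40% covered: {partial:.2f}")
```

Under these assumed figures, covering 40% of the fleet lowers the bill from about 7300 to about 6132 per month; the real saving depends entirely on the actual discount.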

Automation Approach & Opportunities

If automation tools are not available, then you should build them in-house. Start small and grow the automation catalogue. Remember, no single tool will solve all cost management problems — build and integrate tools as services. Below are areas in which you can apply automation, along with some tips on how to do it.

Tagging and Labeling

  • Report on tag non-conformance
  • Automatically add certain missing tags such as “created-by” (used to track creators of orphaned resources)
  • Create and maintain virtual tags in the inventory management system for cloud services that don't yet support tags
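The first two items can be sketched with plain dictionaries. The required tag set, the resource layout, and the function names below are hypothetical; only the “created-by” tag name comes from the list above.

```python
# Hypothetical tagging policy; replace with your organization's required tags.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def missing_tags(resource):
    """Return the required tags that the resource lacks, sorted for stable reports."""
    return sorted(REQUIRED_TAGS - set(resource.get("tags", {})))


def add_created_by(resource, creator):
    """Add a 'created-by' tag if absent; never overwrite an existing value."""
    tags = resource.setdefault("tags", {})
    tags.setdefault("created-by", creator)
    return resource


def non_conformance_report(resources):
    """Map resource id to its missing tags, listing only non-conforming resources."""
    return {r["id"]: missing_tags(r) for r in resources if missing_tags(r)}
```

The report can feed the daily stakeholder emails, while `add_created_by` would typically run from an event hook at resource creation time.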

Reporting

  • Send daily reports directly to stakeholders on costs, projections, violations, and non-conformance

Resource Scheduling

  • Detect usage patterns and suggest server start/shutdown schedules so that servers run only during their usage periods
  • Inform stakeholders and automatically implement scheduling for dev/test environments
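Detecting a usage pattern can start from something very simple, such as finding the hours of the day in which a server is busy. The sketch below assumes 24 hourly CPU averages and a hypothetical 10% "busy" threshold; it does not handle workloads that run across midnight.

```python
def suggest_schedule(hourly_cpu_pct, busy_threshold=10.0):
    """Suggest (start_hour, stop_hour) covering all busy hours, or None if never busy.

    hourly_cpu_pct holds 24 average CPU values, where the index is the hour of day.
    """
    busy = [hour for hour, cpu in enumerate(hourly_cpu_pct) if cpu >= busy_threshold]
    if not busy:
        return None
    return busy[0], busy[-1] + 1


# Example: idle overnight, busy from 08:00 until 18:00.
usage = [1] * 8 + [40] * 10 + [1] * 6
print(suggest_schedule(usage))
```

A result of `None` is itself useful, since a server that is never busy is a candidate for cleanup rather than scheduling.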

Resource Cleanup

  • Automatically shut down instances/resources that don't have the required tags
  • Automatically shut down instances/resources that are not being used
  • Recommend and implement auto instance scheduling based on usage patterns
  • Remove unattached volumes and old snapshots (unless tagged)
  • Clean up other resources

Reservation Planning (committed use)

  • Track usage patterns and recommend instances for committed use
  • Track usage commitments and renew automatically (inform stakeholders of reservation expiry)
  • Track total savings and ROI for committed use discounts
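Tracking savings and upcoming expiries needs little more than arithmetic and date comparison. In the sketch below, ROI is taken to mean savings divided by committed spend, which is an assumption, as are the record layout and the 30-day warning window.

```python
from datetime import date, timedelta


def commitment_savings(on_demand_cost, committed_cost):
    """Return (savings, roi), where roi is savings relative to committed spend."""
    savings = on_demand_cost - committed_cost
    roi = savings / committed_cost if committed_cost else 0.0
    return savings, roi


def expiring_soon(reservations, today, within_days=30):
    """Return ids of reservations that expire between today and the cutoff date."""
    cutoff = today + timedelta(days=within_days)
    return [r["id"] for r in reservations if today <= r["expires"] <= cutoff]
```

The list returned by `expiring_soon` is what would drive both the stakeholder notification and any automatic renewal.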

Instance Modernization

  • Recommend instances that can be modernized to new instance types (i.e., cheaper and more efficient)
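A modernization recommendation can begin as a lookup from an older instance family to its successor. The mapping below uses a few AWS EC2 generation pairs purely as an example; the table, the function name, and the assumption that the same size exists in the newer family all need to be checked against the provider's current catalog and pricing.

```python
# Illustrative mapping from older to newer instance families (not exhaustive).
SUCCESSOR_FAMILY = {"m4": "m5", "c4": "c5", "t2": "t3"}


def modernization_candidate(instance_type):
    """Return the suggested newer instance type, or None if there is no suggestion."""
    family, _, size = instance_type.partition(".")
    new_family = SUCCESSOR_FAMILY.get(family)
    if new_family and size:
        return f"{new_family}.{size}"
    return None
```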

Spot/Pre-Emptible Instances

  • Track CPU load patterns for dev/test environments and recommend spot/pre-emptible instances

Tools Reference

The following table shows a representative list of tools that can be used for cost management at the various stages of cloud adoption. This is not an exhaustive list, as there are other tools in the market that fulfill niche requirements.

| Concern | AWS | Azure | GCP | Third Party/Custom |
|---|---|---|---|---|
| Initial Sizing | AWS Cost calculator; AWS TCO Calculator; AWS Application discovery service | Azure Pricing Calculator; Azure TCO Calculator | GCP Pricing Calculator; GCP TCO Calculator | GL’s custom tools; Cloudorado |
| Operational Visibility and Forecasting | Trusted Advisor; AWS Organizations; Tags | Azure Advisor; Azure Cost Management | GCP Labels | Cloudability; Cmpute.io; Cloudcheckr; Cloudyn; Botmetric; RightScale |
| Cost Optimization | Trusted Advisor; AWS Lambda | Azure Automation; Azure Cost Management | Google Cloud Functions | Cloudability; Cmpute.io; Cloudcheckr; Cloudyn; Botmetric; RightScale; Monitoring tools (Nagios, DataDog, Sensu); Skeddly, Ylastic, etc. |

Conclusion

We hope that this blog series has helped you start thinking about cost management holistically. The information given in this blog is not limited to any one cloud, either — these principles can be applied to all public clouds. With private clouds, some of these principles can be used to optimize resource densification, rather than the direct cost itself. If you would like more information about how GlobalLogic can help your business with cloud adoption, please email us at practice-cloud@globallogic.com.
