
Ways to Reduce Your Cloud Costs

Impact
Productivity

26 July, 2020

Andrew First

Co-founder & CTO at Plannery

Andrew First, Co-Founder and Chief Technologist at Leanplum, shares how a focused effort helped his company reduce cloud costs by more than 60 percent in only six months.

Problem

Our cloud computing costs were increasing dramatically over time. We were spending too much, and the only data we had was the invoice from our provider, which kept going up.

Actions taken

First and foremost, we had to understand exactly what the problem was, but we lacked detailed data and visibility into our computing costs.

I started by streaming our billing data into BigQuery and building a dashboard in Data Studio to give the organization visibility into our costs. This allowed us to see how much we were spending in different areas and on different computing products, and how this was changing over time. The dashboard was updated automatically. Every morning, I would look at it and post screenshots in Slack highlighting problems or insights, for example: "The cost went up today. Why do you think that happened?"
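As a rough illustration of the kind of breakdown such a dashboard provides, the sketch below sums cost per day and service and flags services whose cost jumped from one day to the next. It works on in-memory rows rather than querying BigQuery, and the row layout, the sample values, and the 25% threshold are all assumptions made for the example, not details from the original setup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical billing rows: (usage day, service name, cost in USD).
# Real billing exports carry many more fields; these values are made up.
BILLING_ROWS = [
    (date(2020, 1, 1), "Compute", 120.0),
    (date(2020, 1, 1), "Datastore", 80.0),
    (date(2020, 1, 2), "Compute", 125.0),
    (date(2020, 1, 2), "Datastore", 140.0),
]


def daily_cost_by_service(rows):
    """Sum cost per (day, service) pair."""
    totals = defaultdict(float)
    for day, service, cost in rows:
        totals[(day, service)] += cost
    return dict(totals)


def flag_increases(totals, previous_day, current_day, threshold=0.25):
    """Map each service whose cost grew by more than `threshold`
    (a fraction) between the two days to its relative increase."""
    flagged = {}
    services = {service for (_, service) in totals}
    for service in sorted(services):
        before = totals.get((previous_day, service), 0.0)
        after = totals.get((current_day, service), 0.0)
        if before > 0 and (after - before) / before > threshold:
            flagged[service] = (after - before) / before
    return flagged


totals = daily_cost_by_service(BILLING_ROWS)
flagged = flag_increases(totals, date(2020, 1, 1), date(2020, 1, 2))
for service, increase in flagged.items():
    print(f"{service}: +{increase:.0%} day over day")
```

A flagged service is a prompt for a question to the owning team, not a verdict; the threshold would need tuning against normal daily variation.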

I collaborated with others to develop a cost reduction roadmap that would make cost a lasting pillar within our priorities. Within that pillar, we created a prioritized backlog. For each project, we estimated how much we could save. Some projects also had strategic impact around improving performance and reducing technical debt, so we included those factors when prioritizing. Having this backlog in place helped facilitate brainstorming, and ideas on how to further reduce our costs started to pour in.
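One simple way to rank such a backlog is sketched below: estimated savings per week of effort, boosted by a factor for strategic impact. The scoring formula, the field names, and the sample projects are hypothetical; the original prioritization may have weighed these factors quite differently.

```python
from dataclasses import dataclass


@dataclass
class CostProject:
    name: str
    monthly_savings_usd: float    # estimated savings if the project ships
    effort_weeks: float           # estimated engineering effort
    strategic_bonus: float = 0.0  # 0.0 to 1.0, e.g. performance or tech-debt wins


def priority_score(project):
    """Savings per week of effort, boosted by strategic impact (assumed formula)."""
    base = project.monthly_savings_usd / project.effort_weeks
    return base * (1 + project.strategic_bonus)


def prioritize(projects):
    """Return projects ordered from highest to lowest score."""
    return sorted(projects, key=priority_score, reverse=True)


# Hypothetical backlog entries, for illustration only.
backlog = [
    CostProject("Right-size idle instances", 5_000, 2),
    CostProject("Re-architect hot path", 20_000, 10, strategic_bonus=0.5),
    CostProject("Shorten log retention", 1_000, 1),
]

for project in prioritize(backlog):
    print(f"{priority_score(project):>8.0f}  {project.name}")
```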

Within the backlog, there were several small, impactful projects that could be done across the organization. To concentrate our effort, we formed a tiger team of just a few engineers to deliver these projects without disrupting other teams. By focusing only on cost, they also developed tools and best practices that were distributed to the rest of the organization, such as profiling, a canary release process, and performance analysis tools that compare two versions of a service to establish whether there is a significant difference in latency and cost.
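The comparison of two service versions can be sketched with a permutation test, which estimates how likely the observed difference in mean latency would be if both versions behaved the same. This is one possible technique, not necessarily the one the team's tools used, and the latency samples below are invented.

```python
import random
from statistics import mean


def permutation_test(baseline, candidate, rounds=5_000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    Repeatedly shuffles the pooled samples and counts how often a random
    split differs at least as much as the observed split.
    """
    rng = random.Random(seed)
    observed = abs(mean(candidate) - mean(baseline))
    pooled = list(baseline) + list(candidate)
    n = len(baseline)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n:]) - mean(pooled[:n]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (rounds + 1)


# Hypothetical request latencies in milliseconds for two versions.
baseline_ms = [120, 118, 125, 130, 122, 127, 119, 124]
candidate_ms = [101, 99, 105, 110, 98, 103, 107, 100]

p_value = permutation_test(baseline_ms, candidate_ms)
print(f"mean baseline:  {mean(baseline_ms):.1f} ms")
print(f"mean candidate: {mean(candidate_ms):.1f} ms")
print(f"p-value: {p_value:.4f}")
```

The same function works for any per-request metric, such as an estimated cost per request, as long as both samples come from comparable traffic.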

The cost initiative also influenced departments beyond engineering. We had hundreds of customers using our platform, which was built with a multitenant architecture, so it was not easy to identify the amount of resources each customer was using. We created a predictive heuristic model, based on our application logs, to estimate the cost per customer. We turned this into a dashboard, which gave the company visibility into profitability by customer. That led us to redo our pricing model, in collaboration with Product Marketing, Sales, and Finance, by devising a single metric that best correlated with our costs. The Sales and Customer Success teams applied this model to all new customers and to customers approaching renewal, in order to rectify or eliminate unprofitable accounts.
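A very simple stand-in for such a model is proportional allocation: weight each logged operation by how expensive it is assumed to be, then split the total bill across customers by weighted usage. The operation names, weights, and customer IDs below are hypothetical, and the actual model is not described in enough detail to reproduce here.

```python
from collections import defaultdict

# Hypothetical relative cost weights per logged operation type.
OPERATION_WEIGHTS = {
    "api_call": 1.0,
    "datastore_write": 5.0,
}


def estimate_cost_per_customer(log_events, total_cost_usd, weights=OPERATION_WEIGHTS):
    """Split `total_cost_usd` across customers in proportion to weighted usage.

    `log_events` is an iterable of (customer_id, operation) pairs.
    Operations without a known weight are ignored.
    """
    usage = defaultdict(float)
    for customer_id, operation in log_events:
        usage[customer_id] += weights.get(operation, 0.0)
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {}
    return {
        customer_id: total_cost_usd * units / total_usage
        for customer_id, units in usage.items()
    }


# Made-up log events for two made-up customers.
events = [
    ("acme", "api_call"),
    ("acme", "api_call"),
    ("acme", "datastore_write"),
    ("globex", "api_call"),
    ("globex", "api_call"),
    ("globex", "api_call"),
]

costs = estimate_cost_per_customer(events, total_cost_usd=1000.0)
for customer_id, cost in sorted(costs.items()):
    print(f"{customer_id}: ${cost:.2f}")
```

Comparing each estimate with what the customer pays gives a rough profitability figure per customer, which is the kind of view the dashboard described above made visible.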

Lessons learned

  • We had trouble taking action until we thoroughly analyzed and broke down the problem. Unlike leaders who have a broader perspective on problems, most engineers think at the level of the scope they are involved in and struggle to identify problems that transcend that scope. Once we got to the level of labeling each service and each resource by team, engineers on those teams could focus on and understand the problems specific to their team and, consequently, define follow-up tasks. If you can't break down a metric to the appropriate level, it's hard to take any action.
  • Lead by example! If you want the team to focus on a certain goal, look at the metrics you care about every day. I would pull up the cost dashboard every morning. You will most certainly find persuasive data, and your commitment will signal to everyone the importance of the problem. If it's important enough for you to spend your time on it, it's important for everyone else. That can also spark productive conversations and lead to a number of great ideas. Posting some of those ideas will draw further attention and increase overall visibility.
  • There is a correlation between cost and reliability. We observed that many reliability incidents set off a cascading chain of events. For example, a customer’s traffic pattern would cause Datastore to be hit too hard, request latency would increase, more instances would start up, and the surge in instances would add pressure to preload caches, hammering Datastore even more. A single incident like this could cost us thousands of dollars! Therefore, increased reliability likely means decreased costs.
  • There is also a correlation between cost and performance. Our segmentation system was costly due to its inefficient design, which copied the entire user profile rather than sending delta updates. It also was not scaling for our biggest customer. We re-architected this system, which became a win-win situation -- reduced costs for us and better service for our customers.
  • Keeping costs down requires ongoing effort. Earlier this year, we reduced our costs by over 60% in six months, and many team members mistakenly thought we were done. A couple of months later, our costs started to go up again. You have to constantly keep an eye on costs. The problem is entropic in nature -- it’s easy for an engineer to introduce a change that increases costs, or for a new customer to be onboarded with a unique use case that ends up being costly, and things can spiral from there.
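The segmentation lesson above contrasts copying a full user profile with sending delta updates. A minimal sketch of the delta idea, assuming a profile is a flat dictionary of attributes (the field names are invented, and the real system's data model is not described in the article):

```python
_MISSING = object()  # sentinel so that a stored None still compares correctly


def compute_delta(old, new):
    """Return (changed, removed): attributes to set and attribute names to delete."""
    changed = {
        key: value
        for key, value in new.items()
        if old.get(key, _MISSING) != value
    }
    removed = sorted(key for key in old if key not in new)
    return changed, removed


def apply_delta(profile, changed, removed):
    """Return a new profile with the delta applied; the input is not mutated."""
    updated = {key: value for key, value in profile.items() if key not in removed}
    updated.update(changed)
    return updated


# Hypothetical profile before and after a user event.
old_profile = {"country": "US", "plan": "free", "last_seen": "2020-01-01", "trial": True}
new_profile = {"country": "US", "plan": "pro", "last_seen": "2020-01-02"}

changed, removed = compute_delta(old_profile, new_profile)
print("changed:", changed)
print("removed:", removed)
```

Only the changed and removed attributes need to travel to the downstream system, so the payload grows with the size of the change rather than with the size of the profile.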
