Shipping Cloud-Native Software With a Focus on Sustainable Operability

Nikhil Mungel

Engineering Manager at Cribl



I have run multiple service teams across different parts of the company for a few years now, and it always comes down to the cost of operating services. You quickly realize that the job is not only about developing software features but also about operating that software sustainably for years to come. Teams can lean towards near-term benefits at the expense of long-term sustainable operability. And as teams in a growing company come to own multiple services, those services naturally end up being very different from one another.

Services of different vintages can rely on very different technologies and frameworks. The differences in coding frameworks, testing frameworks, ops dashboards, and troubleshooting tools add up over time, and scaling problems become very evident. Our teams also had to reimplement some features across multiple services, and code silos formed: pockets of code that only a few engineers were experts in, with nobody else knowing how they really worked.

Actions taken

The first solution for me was to double down and invest in the nonfunctional engineering requirements that power sustainable operability. I started treating the operation of services as a top-level line item that every team should be responsible for. The teams were no longer accountable only for developing and shipping features the way product and engineering managers had requested. They were also responsible for operating their own services and the infrastructure behind them.

I encouraged my team to hold strong opinions on their tech stacks and coached an approach that favors operating, scaling, and being accountable for services in addition to shipping exciting features. Everyone loves to ship exciting features, but that is not all we have to do as engineers. Hence, if my team felt strongly that a different tool or technology would work better, they were free to choose it as long as they could back the choice with sound reasoning.

Lastly, I fostered a culture of sustainably operating back-end services, which pays rich dividends. We had a bunch of cool technologies behind our products, and we wanted to keep innovating and shaping the future, but we wanted everything to run in a sustainable way.

Lessons learned

  • Be thoughtful about how peripheral systems are designed and architected. Not all of them deliver primary value to users, so be mindful of that.
  • Hire and recognize talent that treats build systems, testing frameworks, SRE best practices, and debugging and troubleshooting tools with the same enthusiasm. Everyone is excited about shipping user-facing features, but the rest is also important.
  • Grow the collective strength of the team in the direction that benefits it over the long term.
  • Your organization’s culture decides whether stakeholder management and positioning nonfunctional engineering work will be easy or challenging, so stay flexible in how you approach both.




© 2024 Plato. All rights reserved
