Bouncing Back From Failure

Embracing Failures
Team Processes

19 August, 2021

Young Kim

Sr Manager, Software Engineering at LinkedIn

Young Kim, Senior Manager, Engineering at LinkedIn, recalls how failing to make the right decision turned into a lesson about the importance of resilience and failure in leadership.

Problem

Only a couple of weeks after I joined my previous company, we went through a big ramp, developing new features and racing toward the deadline. Our operational cadence was rather seasonal -- in fall, we would get a lot of traffic and would onboard a lot of new customers, but we knew that was coming.

Suddenly, we started to see some alarming signals. Traffic was going up, which was expected, but some critical aspects of website performance were disturbing. Pages would slow down, timeouts became frequent, and intermittent failures appeared out of the blue.

Actions taken

We rushed to understand what was happening because we were expecting additional growth and had to stabilize the system first. We started to assess how much headroom we had by looking at past data and comparing it with current load. I was still rather new in the company and was leaning on the team to make recommendations.

Senior engineers did all the analysis, scrutinized the data, and concluded that we should be fine based on the headroom on our database. Nevertheless, we still needed to act. We had to push back on other priorities and narrow our focus to infrastructure. We came up with a mitigation plan, which we presented to the CEO. We felt confident: “Give us a week, and we will handle this.”

The next day, the database crashed. The website went down, and we spent the entire day firefighting. Furthermore, Murphy’s law proved itself once again; I was at the dentist and had to work remotely. Coordinating people on different teams remotely, from Sales to Customer Success, was tremendously difficult. Eventually, we got through that day.

We didn’t have much experience in post-mortems, but we had one of those uncomfortable conversations afterward. People were pointing fingers, others were taking the blame, and the overall tone of the conversation was rather stressful. We did the upgrade overnight, and the rest of the quarter went well. We accelerated some DevOps work, did ruthless prioritization, and went ahead with hiring.

But it was one of those moments. The failure was an inevitable consequence of taking things lightly, relying on less competent people to make decisions, and being confident without grasping the potential severity of the situation. But it’s easy to be smart in hindsight. At that moment, “we didn’t know what we didn’t know.” We looked at the data and misread some of it. For me personally, it was a massive, visible, and critical failure. But I had to move past it and forgive myself. That was a prerequisite to understanding what went wrong and how I could ensure that it would never happen again.

Lessons learned

  • This experience taught me a great deal about the importance of failure and resilience in leadership. From a product angle, I learned not to over-index on building features without thinking about infrastructure and ensuring stability. You should always be able to strike a healthy balance between developing features and scaling infrastructure.
  • Some of the assumptions on which we calculated our headroom were false. We didn’t estimate data growth; when we did load testing, we based it on the present instead of projecting six months into the future, which meant that our performance testing was not accurate.
  • This failure helped us introduce processes that would ensure that this kind of problem wouldn’t happen again. You have to get past the finger-pointing phase, and past feeling sorry for yourself, to be able to build processes that prevent a similar situation from happening.
  • At that time, we didn’t have a solid post-mortem process, so we had to introduce one. We had to detail how it should be run and what it should look like -- actionable and blameless, with no one feeling uncomfortable about their past actions.
  • Our communication was not always clear. There was a lot of confusion about who was responsible for what. Understanding that helped us mature as a team. We had to ask ourselves: How should we handle ourselves in those situations? What would our reactions be in a real emergency? We acknowledged the real possibility of such situations and created guardrails that would keep our reactions efficient but calm and composed.
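
The headroom lesson above can be made concrete with a little arithmetic. The following is a minimal sketch, not the analysis the team actually ran: the capacity, load, and growth figures are made-up illustrations, and the function names are hypothetical. It assumes load compounds at a fixed monthly rate and compares headroom measured today with headroom projected six months out.

```python
def projected_load(current_load: float, monthly_growth: float, months: int) -> float:
    """Compound the current load forward by a fixed monthly growth rate."""
    return current_load * (1.0 + monthly_growth) ** months


def headroom(capacity: float, load: float) -> float:
    """Fraction of capacity still unused; negative means over capacity."""
    return (capacity - load) / capacity


# Illustrative numbers only (assumptions, not figures from the incident).
CAPACITY = 1000.0       # e.g. queries per second the database can sustain
CURRENT_LOAD = 600.0    # peak load observed today
MONTHLY_GROWTH = 0.10   # assumed 10% growth per month
HORIZON_MONTHS = 6      # how far ahead to project

today = headroom(CAPACITY, CURRENT_LOAD)
future_load = projected_load(CURRENT_LOAD, MONTHLY_GROWTH, HORIZON_MONTHS)
future = headroom(CAPACITY, future_load)

print(f"Headroom today: {today:.0%}")
print(f"Projected load in {HORIZON_MONTHS} months: {future_load:.0f}")
print(f"Headroom in {HORIZON_MONTHS} months: {future:.0%}")
```

With these assumed numbers, the system appears to have 40% headroom today, yet six months of 10% monthly growth pushes the projected load to roughly 1,063, which is past capacity. A load test sized only for present traffic would miss that.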
