
Bouncing Back From Failure

Embracing Failures
Team Processes

19 August, 2021

Young Kim

Sr Manager, Software Engineering at LinkedIn

Young Kim, Senior Manager, Engineering at LinkedIn, recalls how a wrong decision turned into a lesson about the role of failure and resilience in leadership.

Problem

Only a couple of weeks after I joined my previous company, we were in the middle of a big ramp-up, building new features and racing toward a deadline. Our business was seasonal: every fall, traffic surged and we onboarded many new customers, so we knew the spike was coming.

Suddenly, we started seeing alarming signals. Traffic was rising, as expected, but some critical aspects of website performance were deeply worrying: pages slowed down, timeouts became frequent, and intermittent failures appeared out of the blue.

Actions taken

We rushed to understand what was happening, because we expected further growth and had to stabilize the system first. We started assessing how much headroom we had by comparing current metrics with past data. I was still new to the company and leaned on the team for recommendations.

The senior engineers did the analysis, scrutinized the data, and concluded that our database had enough headroom for us to be fine. Nevertheless, we still needed to act: we pushed back other priorities and narrowed our focus to infrastructure. We came up with a mitigation plan and presented it to the CEO. We felt confident: "Give us a week, and we will handle this."

The next day, the database crashed. The website went down, and we spent the entire day firefighting. To make things worse, Murphy's law proved itself once again: I was at the dentist and had to work remotely. Coordinating people across different teams, from Sales to Customer Success, without being on site was tremendously difficult. Eventually, we got through that day.

We didn't have much experience with post-mortems, but we had one of those uncomfortable conversations afterward. Some people were pointing fingers, others were taking the blame, and the overall tone was tense. We completed the upgrade overnight, and the rest of the quarter went well. We accelerated some DevOps work, prioritized ruthlessly, and went ahead with hiring.

But it was one of those moments. The failure was an inevitable consequence of taking things lightly, relying on other, less competent people to make decisions, and being confident without grasping how severe the situation could become. Of course, it's easy to be smart in hindsight. In the moment, "we didn't know what we didn't know." We looked at the data and misread some of it. For me personally, it was a massive, visible, and critical failure. But I had to move past it and forgive myself. That was a prerequisite for understanding what went wrong and making sure it would never happen again.

Lessons learned

  • This experience taught me a great deal about the role of failure and resilience in leadership. From a product angle, I learned not to over-index on building features without thinking about infrastructure and stability. You should always strike a healthy balance between developing features and scaling infrastructure.
  • Some of the assumptions behind our headroom calculation were wrong. We hadn't estimated data growth: our load testing was based on current data volumes rather than on volumes projected six months into the future, so the performance tests we ran gave us an inaccurate picture.
  • This failure pushed us to introduce processes to ensure this kind of problem wouldn't happen again. To build processes that prevent a similar situation, you first have to get past the finger-pointing phase and stop feeling sorry for yourself.
  • At that time, we didn't have a solid post-mortem process, so we had to introduce one. We had to define how it should be run and what it should look like: actionable and blameless, with no one feeling uncomfortable about their past actions.
  • Our communication was not always clear, and there was a lot of confusion about who was responsible for what. Recognizing that helped us mature as a team. We had to ask ourselves: How should we conduct ourselves in those situations? How would we react in a genuine emergency? We acknowledged that such situations could really happen and put guardrails in place so that our reactions would be efficient, calm, and composed.
