How to Prevent a Single Point of Failure
28 July, 2020
The prevalent opinion today is that everyone should be constantly on the lookout for career opportunities. This is a huge problem in the Bay Area where people are always considering changing their careers -- a favorite environment for a single point of failure problem to bloom.
A single point of failure is a dependency on one person who, over the years, has accumulated significant institutional and technical knowledge. When that person decides to quit or announces a short- or long-term absence -- for whatever reason -- we should be prepared to carry on without them.
Unfortunately, we often end up unprepared. Throughout my career, I’ve seen this happen on multiple occasions, and I came up with guardrails for preventing a single point of failure problem.
There is no silver bullet that solves this problem, but some processes can help prevent it.
Engineers often think of documentation as something that only serves those reading the code later. But documentation is so much more; it tells a story about what kind of work someone is doing and how it is being done. For example, suppose someone has been with the company for a very long time and has extensive knowledge of the complete database schema. If this person leaves without creating a diagram or a blueprint of the database system that explains how things interact and work together, and who is responsible for what, the code itself won’t be of much help.
This particularly applies to startups that, in their rapid growth, neglect documentation and rely on institutional knowledge resting with one person. The reasons for that can be manifold. When I was working at one startup, I encountered a peculiar situation -- the lack of documentation was not a result of the company's inability to enforce documentation but of one person ‘hijacking’ the institutional knowledge. Keeping all that knowledge to himself became the main reason why he couldn’t be fired, and he had no incentive to share it because that would make him redundant.
Every once in a while, I swap ownership within the team. If one person handles the front-end entirely and someone else the back-end, I swap them after a certain period of time. Ideally, everyone on the team should be familiar with what other people are doing. Though people join the team with diverse skill sets and expertise, I tend to make the team full-stack and have them mingle and expand their knowledge (not knowing is a good thing; it is an opportunity for you to learn!). By doing so, they work together and collaborate more, and a precondition for that is that they understand what each of them is doing. For example, people on my team have either a statistics or a computer science background, but I pair them up on projects so that they can understand what each of them is doing. In general, I practice a more holistic approach to knowledge; while everyone is responsible for delivering one part of a project, no one owns and holds that part forever.
People easily get bored doing the same thing over and over again. Collaborative work allows them to constantly learn new things and expand the horizons of their curiosity. Also, their formal progression, as reflected in climbing the career ladder, affects the amount of knowledge they accumulate. The knowledge held by a senior engineer grows to include many things a junior person has yet to learn.
But one more aspect of motivation affects the single point of failure problem. The problem becomes especially acute when people leave in short intervals or in groups. Keeping your team happy and motivated prevents churn, and knowledge stays in-house.
Make these three steps (documentation, swapping ownership, and keeping the team motivated) part of your plan of action and establish them within the team from day one.
Whenever you encounter a situation in which one person becomes a single point of failure, try to mitigate the risk by extracting as much information as possible. I was once approached by a person who had been with a startup for ten years, had never documented a single thing, and had extensive database knowledge. He complained that he was not able to focus on his work because he was distracted by recurring requests for help with the database. Since I realized that he had an attitude problem, instead of asking him straightforwardly to deliver the schema or a blueprint of the database, I asked him simple questions to fill the gaps and then reverse-engineered most of it without his input. Then, I could ask him about the specific steps he took (why did you go with this database design, for example). Based on his answers, I put together a blueprint without him even understanding why I had asked him those questions.
However, creating a system that prevents this problem is always better than fixing it when things go wrong. So, be cognizant of the problem and make sure that the working process is structured in a way that does not allow it to happen.
Arun Krishnaswamy, Director of Data Science at Workday, elaborates on how he approached a single point of failure problem while sharing three key tips (or guardrails) on how to prevent it.