Debugging process with newer people
Problem
I find that once a team grows past 12 members, it becomes very hard to have one developer assigned to an on-call position where they investigate and assign debugging issues. Usually, if they can't figure out what's going on, they pull in somebody else who can. This works when people understand the system well enough, but what happens when you have newer or less experienced people? They tend not to know the tools, they get lost very easily, and the process is painful for them. Here are a few methods that help your entire team identify and surface problems when debugging, and that improve the quality of your software systems along the way.
Actions taken
- To make the debugging process cleaner, set clear guidelines for prioritizing different errors. Define which errors are minor and which are major, so that your team can recognize each kind and follow a well-outlined course of action.
- If you are proceeding with an on-call individual, rotate frequently enough that people remember and understand what is going on with the systems. Shorter intervals between shifts keep team members accountable and up to date.
- Better tools. Invest in as much tooling as you can. Make it as easy as possible for team members to debug your systems and to understand where a problem actually originated.
- Tracing frameworks. Tracing works by instrumenting multiple systems that talk to each other so that they share identifiers. Following those identifiers, and the information attached to them, lets you view latency more easily. There are products and open source projects that make this easy, including one called Honeycomb.io.
- Good log collection into a centralized location. This lets you see which hosts things are happening on without spending extra time tracking them down.
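To make the last two points concrete, here is a minimal sketch of the idea using only the Python standard library. It is not how Honeycomb.io or any particular tracing product works. The names `trace_id_var`, `JsonFormatter`, `make_logger`, and `handle_request`, as well as the service and host names, are hypothetical, and an in-memory buffer stands in for a real centralized log store.

```python
import io
import json
import logging
import uuid
from contextvars import ContextVar

# Hypothetical: holds the identifier shared by every service touching a request.
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a collector can index it."""

    def __init__(self, service: str, host: str) -> None:
        super().__init__()
        self.service = service
        self.host = host

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": self.service,
            "host": self.host,
            "level": record.levelname,
            "trace_id": trace_id_var.get(),
            "message": record.getMessage(),
        })


def make_logger(service: str, host: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the shared stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, host))
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, incoming_trace_id=None) -> str:
    """Reuse the caller's trace ID, or mint one at the edge of the system."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    trace_id_var.set(trace_id)
    logger.info("request handled")
    return trace_id


# Simulate two services on different hosts writing to one central location.
central = io.StringIO()
frontend = make_logger("frontend", "host-a", central)
backend = make_logger("backend", "host-b", central)

tid = handle_request(frontend)                  # edge service creates the ID
handle_request(backend, incoming_trace_id=tid)  # downstream service reuses it

# One query on the trace ID now shows every host the request touched.
events = [json.loads(line) for line in central.getvalue().splitlines()]
matching = [e for e in events if e["trace_id"] == tid]
hosts = sorted(e["host"] for e in matching)
```

With logs shaped like this, a newer engineer can search for a single identifier and see which services and hosts were involved, instead of having to know in advance where to look. In a real deployment the ID would typically travel in a request header and the records would be shipped to a log aggregator rather than a buffer.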
Lessons learned
- As systems get more complicated, the more tools and capabilities you can provide for your engineers, the easier their job gets and the easier it is for them to take ownership.
- Set up a culture where shipping features and fixing bugs are of equal value. Explain why both matter to you and to the company, and encourage team members to get the bugs under control.
- Allocate time and resources, and establish metrics around whichever plan you choose. This shows that you are serious about correcting the quality issues and holds the team accountable once procedures are in place.
- It is very important that you are able to recover from bugs in your system quickly. Frequency of errors is less important than the time it takes to recover.
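One way to put a number on recovery time is to track mean time to recovery alongside the raw incident count. The sketch below assumes each incident is recorded as a pair of detection and resolution timestamps. The records are made-up illustrative values, and `mean_time_to_recovery` is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta
from statistics import mean

# Illustrative, made-up incident records: (detected_at, resolved_at).
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 16, 0)),
    (datetime(2024, 1, 21, 22, 15), datetime(2024, 1, 21, 22, 45)),
]


def mean_time_to_recovery(records) -> timedelta:
    """Average the time between detection and resolution across incidents."""
    seconds = [
        (resolved - detected).total_seconds()
        for detected, resolved in records
    ]
    return timedelta(seconds=mean(seconds))


mttr = mean_time_to_recovery(incidents)
incident_count = len(incidents)
print(f"{incident_count} incidents, mean time to recovery {mttr}")
```

Reporting both numbers side by side keeps the emphasis where the lesson above puts it: the count says how often things break, while the recovery time says how well the team and its tooling respond when they do.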
Be notified about next articles from Dmitriy Ryaboy