Implementing a Rotation for Operations Toil
2 April, 2020
About a year ago my fairly small team suddenly grew to five people. With this growth came a mix of individuals: junior engineers, senior engineers, and some very experienced people. Moreover, we had people who were excited about the job and happy to be in the engineering field, and one person who had previously been a manager but decided to return to an individual contributor (IC) role.
The team as a whole had two different types of tasks: one that was interesting to everybody, and another that we call operations toil - manual, repetitive work that everyone simply had to deal with because it was part of the job.
Historically we tried to share the operations toil load by relying on individuals to decide for themselves which of the two types of work they wanted to spend their time on. However, this self-delegation led to an uneven distribution of the workload. Some people were spending more than 51% of their time on operations toil, while others were taking on less and less of it. This caused negativity on both sides: those doing more were underwater and no longer saw their work in a positive light, whereas those doing less couldn't assist effectively because the operations toil came at them at random times.
Another issue we initially faced as a team was that too many people were working on the same thing. With more people, communication started breaking down. Both of these issues led to a deterioration of team morale, a decline in how well the work got done, and an overall mood shift from excited to unhappy.
I attempted to tackle both problems at the same time, and thankfully a single solution worked for both. I sat down, opened up Google Sheets, and started to devise a rotation. I took the total number of individuals, determined the optimal number of people I wanted on each project, and created a rotating schedule. On one side, I made it very obvious when someone was to dedicate their time to high-value work, and on the other side I allocated time specifically for operations toil. Furthermore, the rotation clearly outlined how much time they should spend on each. With 20% configured for operations toil, each individual would spend 1 week on this work and then 4 weeks of uninterrupted time on high-value work.
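The schedule itself lived in a spreadsheet, but the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the actual sheet: the team names, the start date, and the helpers `build_rotation` and `toil_share` are all hypothetical. It assumes one person is on operations toil per week in a five-person team, which yields the 1-week-on, 4-weeks-off pattern described above.

```python
from datetime import date, timedelta


def build_rotation(members, start, weeks, on_toil=1):
    """Return a list of (week_start, toil_members, high_value_members).

    Hypothetical helper: each week the toil slot moves on by `on_toil`
    people, wrapping around the team list.
    """
    n = len(members)
    schedule = []
    for week in range(weeks):
        first = (week * on_toil) % n
        toil = [members[(first + i) % n] for i in range(on_toil)]
        high_value = [m for m in members if m not in toil]
        schedule.append((start + timedelta(weeks=week), toil, high_value))
    return schedule


def toil_share(schedule, member):
    """Fraction of the scheduled weeks that `member` spends on toil."""
    toil_weeks = sum(1 for _, toil, _ in schedule if member in toil)
    return toil_weeks / len(schedule)


# Hypothetical team and start date (a Monday), for illustration only.
team = ["Ana", "Ben", "Chloe", "Dev", "Eli"]
schedule = build_rotation(team, date(2020, 4, 6), weeks=5)

for week_start, toil, high_value in schedule:
    print(week_start.isoformat(), "toil:", ", ".join(toil),
          "| high-value:", ", ".join(high_value))

for member in team:
    print(member, "toil share:", toil_share(schedule, member))
```

Over one full five-week cycle, every person lands on toil exactly once, so each share works out to one week in five, or 20%.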
It took a few months for people to get used to the new system. They had to go through the rotation a couple of times to appreciate how it worked. After some time, though, it became so successful that we ended up with some favorable unintended consequences. For example, although the rotation allowed individuals to focus on their own work, it also required them to work with others. People were able to concentrate and learn to work on their own while also learning to work with a group.
Another side effect was greater autonomy for every individual on the team. This came about because the team now had to think about how to schedule their high-value work. For instance, in our 5-week rotation you know that the last of those weeks will be spent on toil, so perhaps a deployment shouldn't be scheduled in week four of the cycle, knowing that there won’t be time to fix it the following week. Maybe another week is more appropriate.
The rotation had a final aftereffect - accountability. The handing off of operations toil required people to share and coordinate their work with another person. The expectation was that you had to communicate what you did so that no one had to come back to you with questions. The next person should be able to continue the work on their own. This meant that if something didn’t get done by the end of the week then you would have to explain yourself and why it happened.
- When I was creating the rotation I focused mostly on the people who absorbed too much of the work: those who like to solve issues, the finishers. But in a large-scale system environment something is always going to be broken, and it shouldn’t be their job to fix everything. It’s not sustainable. So it was important for me to look at the team, identify these people, and set boundaries.
- 20% was our time-bound number for operations toil work. Those who were previously doing less admitted that it logically made sense. Those who were dedicating more than 51% of their bandwidth to this task were affected more by the change. That’s because they needed to change the habit of constantly switching back and forth between tasks as well as trying to decide which work to prioritize. They could now rely on the schedule to tell them what to do. It was an anchor in time and provided them with control. This significantly reduced stress and brought peace of mind to the team.
- As the team grows, the intervals between toil weeks are going to grow too. Interestingly enough, we got to a point where the intervals were too far apart. The team came to me and said that at X-weeks they were losing the habit of doing the work. So we had to keep the intervals under a specific value.
- The backlog of issues and incidents that needed to be resolved significantly improved because suddenly there was a cadence around how to address them. Plus, people no longer thought of the work as mundane and repetitive; they were coming in fresh and invigorated.
- Overall, the team became stronger. They didn’t just develop together, they also did operations together. Together they built long-term thinking, designing, and creating while also conquering a backlog of items that needed to be fixed. The team worked together and went to the battlefield together.
- Special circumstances arise where people are working overtime on operations toil. Those are exceptions to the rule. But the moment you make exceptions the new norm, you lose control of the schedule, the work, and the individuals. Don’t allow people to take on too much of the toil; they’ll end up destroying themselves. They become slow and nervous, and start having mood swings. They won’t enjoy their work anymore, and that’s not the environment you want to create in your org.
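The point above about intervals growing with the team can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not how the team actually sized its rotation: the helper names and the cap of 5 weeks are hypothetical (the real threshold is not stated). It assumes that with `on_toil` people on toil each week, a person comes up for toil roughly every `team_size / on_toil` weeks.

```python
import math


def weeks_between_toil(team_size, on_toil=1):
    """Cycle length in weeks: how often each person comes up for toil."""
    return math.ceil(team_size / on_toil)


def min_on_toil(team_size, max_interval_weeks):
    """Smallest number of people on toil per week that keeps the
    cycle length at or below the given cap."""
    return math.ceil(team_size / max_interval_weeks)


CAP_WEEKS = 5  # hypothetical cap, chosen only for this example

for size in (5, 10, 12):
    needed = min_on_toil(size, CAP_WEEKS)
    print(f"team of {size}: {needed} on toil per week, "
          f"cycle of {weeks_between_toil(size, needed)} weeks, "
          f"toil share {needed / size:.0%}")
```

One trade-off this makes visible: capping the interval on a larger team means putting more people on toil at once, and when the team size is not a multiple of the cap, the per-person toil share drifts above the original 20%.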