How to Approach Service Refactoring
Problem
Tech debt is an inevitable consequence of shipping features at a rapid rate. Ideally, every once in a while you get an opportunity to improve things and remove the bottlenecks left behind by the many corners cut to ship features fast. How do you identify where to spend your time and energy during this opportunity, which is limited in both time and resources?
At a past company, our group was given a fixed percentage of engineering capacity to improve the overall platform. We decided to give each team the freedom to allocate one or two engineers to what they considered their top tech-debt items, most of which involved “Moving a specific domain away from our monolith to its own dedicated service”.
Even with some planning and oversight, we realized that most of the teams either underestimated the effort of such a refactor, or played it safe and shipped a refactor that didn’t have a real impact. After a few months of work, leadership decided that our initiative was no longer worth pursuing and investing in. Essentially, the idea of spreading our efforts to make everyone happy didn’t, in the end, produce the desired outcome.
Actions taken
We identified two approaches. The first one, dispersing the effort, is described above and was our starting point. However, its impact was not satisfactory. Even with a little bandwidth to improve things, most of our engineers had never worked on a service refactoring project, had never done big data migrations, and were afraid to make bold moves. Eventually, most of them chose to stick with small, easy improvements that had an insignificant impact. That led to a vicious circle: low impact, no support from leadership, no opportunity to continue, and therefore never a high-impact refactor.
So, we decided to go another route.
As a group, we decided to identify a few top initiatives across all business units and then narrow our focus to the three to five that were the most important to propel our platform to the next level. That meant pruning our (obviously outdated) list of a thousand tech-debt items we should have been working on down to only a few top objectives per quarter.
We decided to team up using the SWAT team model: we brought in engineers from different teams and paired them with a few engineers from the impacted domain. Teaming up with people from other teams was very valuable, because many of those SWAT engineers had their own “special skill”: infrastructure and Kubernetes, data migration, security, asynchronous programming, etc.
Service refactors are also heavily impacted by engineer departures, although the SWAT strategy proved to be more resilient to them. During our first wave of refactors, engineers would often work alone, which demotivated them, and sometimes their departure caused the refactoring project to be deprioritized. To overcome this challenge and prevent a potential knowledge gap, I would favor a core team that works continuously on the top engineering initiatives, supported by a few more members from the impacted domain.
Lessons learned
- I never felt I had enough different strategies to apply when approaching service or architecture refactoring. Try whatever you think will work for you in the existing circumstances. Bear in mind that it is not only a technology problem but a broader organizational problem. Go with one approach and, if after three or six months it doesn’t bring the desired results, don’t be afraid to try something new. If you wait too long to change strategy, engineers and leadership could become demotivated by the lack of progress.
- Map all dependencies beforehand. Often the developer loop is not fast enough, and you will not have enough velocity to even start working on some initiatives. Creating tools that would allow us to complete refactors in a reasonable amount of time was a prerequisite for any refactoring-related action. Some of those tools could be built “in-flight”, but we found that this was not always the case.
- Team chemistry is beyond important. If you have a core team already working well together, you should avoid breaking it up when starting the next refactor. Instead, go for a stable core team and then augment it with a few more people with domain skills.
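
As an illustration of the "map all dependencies beforehand" lesson, here is a minimal Python sketch of the kind of small tool that could help. It is not the tooling we built. The module names and the `DEPENDS_ON` map are hypothetical, and in practice such a map would have to be derived from your own codebase (imports, service calls, shared tables). Given that map, the sketch lists every module that depends, directly or transitively, on the domain you plan to extract.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the modules in its list.
DEPENDS_ON = {
    "checkout": ["billing", "catalog"],
    "billing": ["users"],
    "catalog": [],
    "reporting": ["billing"],
    "users": [],
}


def dependents_of(target, depends_on):
    """Return every module that depends on `target`, directly or transitively."""
    # Invert the map so we can walk from a module to the modules that use it.
    reverse = {}
    for module, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(module)

    # Breadth-first walk over the inverted map, starting from the target.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Modules that would be affected by moving "billing" out of the monolith.
    print(sorted(dependents_of("billing", DEPENDS_ON)))
```

Running a check like this before committing to an initiative gives a rough idea of how many teams and modules the refactor will touch, which helps with both estimation and deciding who should join the core team.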
Be notified about next articles from Pierre-Alexandre Lacerte