How to Approach Service Refactoring
26 July, 2020
Tech debt is an inevitable consequence of shipping features at a rapid rate. Ideally, every once in a while you get an opportunity to improve things and remove the bottlenecks created by the many corners cut to ship features fast. How do you decide where to spend your time and energy during this opportunity, which is limited in both time and resources?
At a past company, our group was given a fixed percentage of engineering capacity to improve the overall platform. We decided to give each team the freedom to allocate 1-2 engineers to what they considered their top tech-debt items, and most of those items involved "Moving a specific domain away from our monolith to its own dedicated service".
Even with some planning and oversight, we realized that most of the teams either underestimated the effort of such a refactor or played it safe and shipped a refactor that had no real impact. After a few months of work, leadership decided that the initiative was no longer worth pursuing or investing in. In short, spreading our efforts to keep everyone happy did not produce the desired outcome.
We tried two approaches. The first, dispersing the effort across teams as described above, was our starting point, but its impact was not satisfactory. Even with a little bandwidth to improve things, most of our engineers had never worked on a service refactoring project or done large data migrations, and they were afraid to make bold moves. Eventually, most of them stuck with small, easy improvements that had an insignificant impact. That led to a vicious circle: low impact, no support from leadership, no opportunity to continue, and never getting to a high-impact refactor.
So, we decided to go another route.
As a group, we decided to identify a few top initiatives across all business units and then narrow our focus to the three to five that were most important for taking our platform to the next level. That meant pruning our obviously outdated list of a thousand tech debt items we should have been working on down to a few top objectives per quarter.
We decided to team up using the SWAT team model: we brought in engineers from different teams and paired them with a few engineers who were part of the impacted domain. Teaming up with people from other teams was very valuable, because many of those SWAT engineers had their own "special skill": infrastructure & Kubernetes, data migration, security, asynchronous programming, etc.
Service refactors are also heavily affected by engineer departures, although the SWAT strategy proved to be more resilient to them. During our first wave of refactors, engineers often worked alone, which demotivated them, and sometimes their departure caused the refactoring project to be deprioritized. To overcome this challenge and prevent knowledge gaps, I would favor a core team that continuously works on the top engineering initiatives, supported by a few additional members from the impacted domain.
- I never found enough proven strategies to apply when approaching service or architecture refactoring, so try whatever you think will work in your circumstances. Bear in mind that it is not only a technology problem but a broader organizational one. Pick one approach, and if it doesn't bring the desired results after three to six months, don't be afraid to try something new. If you wait too long to change strategy, engineers and leadership may become demotivated by the lack of progress.
- Map all dependencies beforehand. Often the developer loop is not fast enough, and you will lack the velocity to even start working on some initiatives. For us, building tools that let us complete refactors in a reasonable amount of time was a prerequisite for any refactoring work. Some of these tools could be built "in-flight", but we found that this was not always the case.
- Team chemistry is extremely important. If you have a core team that already works well together, avoid breaking it up when starting the next refactor. Instead, keep a stable core team and augment it with a few more people who have domain skills.
Pierre-Alexandre Lacerte, Senior Principal Engineer at Upgrade, Inc., discusses two approaches to service refactoring, explaining why he would choose one over the other.