API Migration: A Story of Confidence, Clear Goal, and Focus
3 August, 2020
Our API services were running on Google App Engine, which caused problems for us in cost, reliability, and architecture. Migrating to Kubernetes would save us over $1M per year in infrastructure costs and would unlock future innovations in our architecture. In addition, our service was a monolith, and we wanted to start rearchitecting it into microservices. However, the migration had been under way for over two years and had faced multiple delays and setbacks. Leadership doubted that we could ever complete it and was considering canceling the project.
When I got involved, I was only loosely familiar with the project. I quickly assessed its state at that time, and that assessment made me confident that we could complete it. I set a time frame for the project and convinced leadership that we could finish it, even with a much smaller team. We ended up being immensely successful, but not without encountering many challenges along the way.
When I became involved in the project, we had several engineers working on it, and given the high development cost, we could not justify continuing that way. Instead of canceling the project, I strongly advocated for two other engineers to remain on it under my close leadership.
To further convince leadership that we would complete the project, I made myself personally accountable. As a co-founder, I stepped up, assumed responsibility, and put my name on the project. I embraced the disadvantageous situation to show the team my personal commitment: I signed up as the executive sponsor for the project, which ramped up both visibility and accountability.
Then I focused entirely on improving the morale and confidence of the team, especially since they were just hit with layoffs. However, I felt the project had been overstaffed and not run efficiently, so I felt confident in the now smaller team.
I set clear and precise milestones to break down the project's complexity, dependencies, and scale. Each milestone was designed to be achievable in a relatively short amount of time and to deliver some value. Together, the milestones traced a gradual migration from where we were at that moment to 100 percent of traffic on the new service and the dismantling of the migration infrastructure.
When I joined, we had migrated less than one percent of traffic to the new service, mostly from low-risk and churned customers. At one percent, it was hard to imagine the end of the project. So I did something risky to instill confidence in the team: I spiked 20 percent of our traffic to the new servers as early proof that we could get to 100 percent.
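One common way to move a fixed share of traffic between an old and a new backend is to hash each customer into a stable bucket and compare it against a rollout percentage. The sketch below illustrates that general technique only; it is not the routing layer used in this migration, and the names `bucket`, `route`, and the customer ID format are hypothetical.

```python
import hashlib


def bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in the range 0..99.

    Hypothetical helper: hashing keeps a customer in the same bucket
    across requests, so a customer never flips between backends.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(customer_id: str, rollout_percent: int) -> str:
    """Return "new" if the customer is inside the rollout, else "legacy".

    Raising rollout_percent only adds customers to the new backend;
    setting it to 0 acts as a full rollback.
    """
    return "new" if bucket(customer_id) < rollout_percent else "legacy"


if __name__ == "__main__":
    # Illustrative IDs only; real traffic is rarely uniform per customer.
    customers = [f"customer-{i}" for i in range(1000)]
    for percent in (1, 20, 100):
        moved = sum(route(c, percent) == "new" for c in customers)
        print(f"{percent}% rollout -> {moved} of {len(customers)} customers on new backend")
```

Because buckets are assigned per customer rather than per request, the share of customers moved only approximates the share of traffic moved when a few customers dominate the request volume.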
I also encouraged engineers to take risks that were measured and well thought out. During each stage of the rollout, we watched the dashboards, and if anything went wrong we could roll back in a few minutes. We could tolerate a few minutes of downtime within our service’s error budget, as defined by the SLAs we provide to our customers. With that mindset, we didn’t incur any significant downtime. There was still much more to be done, but this leap forward established confidence early on.
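The reasoning that a few minutes of rollback fits inside the error budget can be made concrete with a little arithmetic. The sketch below is an illustration under assumed numbers: the article does not state the actual SLA, so the 99.9 percent target, the 30-day window, and the function names are all hypothetical.

```python
def error_budget_minutes(availability_target: float, window_days: int = 30) -> float:
    """Return the downtime allowed by an availability target over a window.

    Assumed example: a 99.9% target over 30 days allows about 43.2 minutes.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - availability_target) * total_minutes


def rollback_fits_budget(
    rollback_minutes: float,
    already_used_minutes: float,
    availability_target: float,
    window_days: int = 30,
) -> bool:
    """Check whether a worst-case rollback still fits the remaining budget."""
    remaining = error_budget_minutes(availability_target, window_days) - already_used_minutes
    return rollback_minutes <= remaining


if __name__ == "__main__":
    target = 0.999  # hypothetical SLA target, not taken from the article
    print(f"Budget: {error_budget_minutes(target):.1f} minutes per 30 days")
    print("5-minute rollback acceptable:", rollback_fits_budget(5, 10, target))
```

A check like this turns "we can tolerate a few minutes of downtime" into a number the team can compare against before each rollout stage.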
Then I went through the backlog in Jira, which was overcrowded with more than 100 different, random tasks. To move forward, I had to clean up the backlog and cut it dramatically. I went through each task with the team, one by one. Often people couldn’t even remember what a task was about from reading its description. I pruned the backlog from more than 100 items down to the five that were relevant to the next milestone, which streamlined the team’s focus. This also built confidence, because many items in Jira were perplexing, unachievable, and, more importantly, not needed.
To track our progress, and also to make it visible and celebrate it, we made a progress bar from a huge sheet of paper and posted it on the wall. Every time we increased the traffic, we took a team photo with a Polaroid and added it to the progress bar. This also added to our confidence. We moved from being swamped with unknowns to a phase where our progress was visible, our milestones precise, and our tasks clear and achievable. It gave us a sense of accomplishment and inspiration, and it showed the rest of the company that this was an important project and that we were making significant progress.
We continued with the measured-risk approach. After migrating 20 percent of our traffic, we decided to migrate our largest customers next. Our largest customer alone accounted for three percent of our API traffic, and I argued that it was far less risky to migrate one large customer than a hundred small ones. If there was an issue, we would have to sort it out with one customer rather than with a hundred.
I continuously communicated our progress and results to the rest of the company. In the executive team meetings, I gave updates on the project, which provided more transparency and, because I vouched for the outcome personally, built confidence in it. I understood that failing to complete this project posed an existential threat to the company, and I had to deliver unconditionally. On top of that, every month we delayed the project would cost us $100,000.
I projected confidence, and that reflected profoundly on others. The two other engineers on the team worked evenings and weekends without anyone asking them to do so. They must have seen that the project was important enough for me to stay late, and they did the same. When I sent an email to the whole company presenting our accomplishments and challenges, I used it as an opportunity to publicly give them credit for their hard work and commitment.
- When the confidence and morale of the team are low, the first thing to do is to rebuild them. Any large project like a migration or a refactoring seems daunting at the beginning, and past unsuccessful attempts erode confidence further. Therefore, it is not enough to only show the value of the project; you have to convince everyone around you that you can do it, and furthermore that you will be successful in doing it.
- Accountability starts with you! A do-whatever-it-takes mindset is also contagious, since your determination and passion instill confidence in people and make them eager not only to believe you but to follow you as well.
- Every big thing can be broken down into smaller parts. Every large project can be broken down into achievable and measurable milestones and tasks that can be followed step by step.
- Have a proof of concept. In this particular case, I joined after the project had been running (unsuccessfully) for two years, so I decided to use our first milestone (migrating 20 percent of our traffic) to demonstrate the feasibility of the whole project. For new projects, proofs of concept would include a skeleton of a service, load testing, and precise benchmarks that prove that you are on the right track.
- Have a clear measure of value. The value of large, long-running projects, especially projects addressing tech debt, is not always easy to measure. However, because confidence had been shaken, we had to translate the value of the project into dollars and calculate the exact benefit to the company. Stating value in numbers makes the strongest case for any project.