Challenges Faced as the Vice President of Engineering
8 July, 2021
I have faced the same challenge in two different organizations. It began with scaling engineering, as the organization grew from a single engineering team to multiple engineering teams. In a typical Agile development process, the default way to work is to run two-week sprints, Scrum style, with a backlog. Quality assurance is done both at the beginning and at the end of the sprint, and then the product is released.
When there is a large codebase and a bigger team, an enormous amount of change lands in that two-week iteration. Therefore, if the right tooling and processes were not in place, there would be a vast regression testing burden. This would spill into the next sprint, and we would most likely find ship-stopping regression bugs.
For example, there were four key roles:
- An individual software engineer, who is responsible for carrying out work within the agile software development process. They are assigned work, and after the work is done, they merge it into the codebase. The merged code goes into a kind of test environment, and this is an ongoing process through the entire sprint.
- There were about 10 software engineers involved in the process, making changes, and pushing them into test environments.
- There was also a test team, a QA team, and some degree of automation. The test team would be notified that a given piece of work was finished and would then run their testing.
- There were managers, tech leads, or software engineering managers responsible for the scope of this. They were the ones to bring it together. In addition, the infrastructure and operations (DevOps) teams were responsible for receiving the code into the production environment and making sure that it was running correctly.
When the team is large enough, functional testing needs to run along the way. The problem is that all the code in scope for a release cannot be verified together until all the engineers have completed their targeted work. The QA lead, along with the team, needed to ensure that the entire integrated product was ready to go to production. The problem with large groups is that, given the number of changes, it was difficult to be confident that the code was integrated and working correctly. At the end of each sprint, we made sure we had a decent regression test and verified all the changes. Then we pushed this into production.
As the days passed, if we found a bug, we would send it back to the engineering team. In the meantime, engineering had moved on and started working on the next set of features, but these had not yet gone to production. At other times, the QA team thought that they had a good build and pushed it into production, and then the data migration failed. We were not able to test that in an environment beforehand, because doing so meant trying the whole process and running another migration.
These were less of a regression testing problem and more of a stability problem. Roughly 40 to 60 percent of the time we released the product, there was an error, and we had to stop everything and look at how we could fix things. The tempting conclusion was that limiting change and releasing less frequently would make things more reliable, but that would have been a bad idea. Being risk-averse by slowing changes down was the exact opposite of what we needed to do. What we needed to do was go faster, which felt counterintuitive. We had to take every individual change and release it to production as quickly as possible.
First, we had to break the team down into smaller groups, each with a focused plan. We also made sure that each of those groups had good automation for the things they owned and could release to production when a change was ready. It was tough to remove products in the middle of the process, which meant something was not done right. Releases had to be done at midnight, in a window when there were no users. Significant changes were scary, and small changes were not.
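One rough way to see why large batches feel scarier than small ones is a back-of-the-envelope model. The sketch below assumes, purely for illustration, that every change has the same small chance of breaking a release and that changes fail independently of each other. The function name and the 5 percent figure are hypothetical, not measurements from either organization.

```python
def batch_failure_probability(per_change_risk: float, changes: int) -> float:
    """Chance that at least one change in a batch breaks the release.

    Assumes every change fails independently with the same probability,
    which is a simplification made for this illustration.
    """
    if not 0.0 <= per_change_risk <= 1.0:
        raise ValueError("per_change_risk must be between 0 and 1")
    if changes < 0:
        raise ValueError("changes must be non-negative")
    return 1.0 - (1.0 - per_change_risk) ** changes


if __name__ == "__main__":
    # Hypothetical 5% risk per change, compared across batch sizes.
    for size in (1, 5, 10, 20):
        risk = batch_failure_probability(0.05, size)
        print(f"{size:>2} changes per release: {risk:.0%} chance of a bad release")
```

Under these assumptions, a release carrying ten changes goes wrong about 40 percent of the time, while a release carrying a single change goes wrong 5 percent of the time, and a single change is also far easier to diagnose and roll back.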
We had to figure out how to start limiting the size of changes as quickly as possible, with the minimum number of changes to our processes and tools. We also needed to break the problem down so that changes could be released incrementally. The deployment pipeline had to have enough automation that any engineer could push a change to production. Here, the pipeline means the steps that take a change from being integrated into your source code, on the main (master) branch, onward to production.
We had to automate deployment to production and have a level of test coverage at the integration level. Once a change had been committed and reviewed, it could be released, and this needed to happen multiple times a day. In my first company, we did this twice a month and sometimes, if we were lucky, five times a day.
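The conditions described here can be sketched as a simple gate that decides whether a single change may go to production on its own. This is a minimal sketch under stated assumptions: the `Change` record, its field names, and the two functions are hypothetical names invented for illustration, not part of any tool the teams actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One small change moving through the pipeline (hypothetical record)."""
    identifier: str
    committed: bool
    reviewed: bool
    integration_tests_passed: bool


def blocking_reasons(change: Change) -> list[str]:
    """Return the unmet conditions; an empty list means the change can ship."""
    reasons = []
    if not change.committed:
        reasons.append("not committed to the main branch")
    if not change.reviewed:
        reasons.append("not reviewed")
    if not change.integration_tests_passed:
        reasons.append("integration tests have not passed")
    return reasons


def ready_for_production(change: Change) -> bool:
    """A change ships on its own as soon as every condition is met."""
    return not blocking_reasons(change)


if __name__ == "__main__":
    change = Change("example-change", committed=True, reviewed=True,
                    integration_tests_passed=False)
    print(ready_for_production(change), blocking_reasons(change))
```

The point of the design is that the gate is evaluated per change rather than per sprint, so one unfinished piece of work does not hold back everything else.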
A distinctive action that I took from a leadership perspective was leading people away from fear. Almost two years later, it had dramatically changed the way we thought about things: we changed our success metrics and our view of what the job was all about. The job was not just shipping features; it was about our velocity, agility, and ability to scale the team.
We did not have all the ideal conditions in place, yet pushing to production as quickly as possible was still something we needed to do. We did not have to have perfect test coverage, automation, architecture, and systems before moving toward continuous integration and deployment. We realized that conditions did not have to be ideal. We needed to have a few things in place, but we could not wait until the automation was perfect and everything else was exactly the way we wanted it to be.
- How you influence your team members as a leader says a lot about you. Changing people and organizations is tougher than changing technologies and patterns. One needs to be careful when dealing with “people.”
- Build credibility with your leadership team, your engineering managers, or senior engineers. Bring them all along with you on the journey. It is important to keep an open mind about how best practices are evolving and what the newest opportunities to optimize are.