Preparing, Transitioning, and Restructuring into a Netflix Model
8 May, 2019
I was leading an organization of 150 people when we decided to transition our engineering teams from traditional software development practices to something closer to a Netflix model. In a Netflix model, engineering teams own their services end-to-end: they write the code and also manage the services in production. Engineers are therefore responsible for operations (uptime, availability, performance) and for other metrics like adoption. The success of this model is demonstrated in its continuous delivery and deployment practices. Instead of deploying once a month or several times a day, as is the case with a more traditional model, the Netflix model allows an organization to ship to production over six thousand times per day. So, how did I prepare and transition my engineering organization for this major change?
There were multiple layers of action and alignment that I had to achieve. To do so, I crafted a manifesto that outlined the transition from a traditional software development model to a Netflix-style model of decentralized operations and self-service, end-to-end ownership by engineering. I specified the advantages of the new model (agility, speed, and innovation) and presented it to various groups in the organization.
- First, I needed executive alignment. I wanted to make organizational changes, including decentralizing operations. But I couldn't decentralize quickly, because operations still maintained the infrastructure and was embedded in the process. By creating the manifesto and presenting it to the executives to get buy-in, I started socializing the idea and the benefits of shifting models, so that the migration wasn't simply a top-down order but a way to empower developers.
- Next, I had to get alignment from the product organization, because any changes I made in engineering were obviously going to affect product. I needed product buy-in so that I could allocate their resources. Additionally, the organizational changes meant that things were going to slow down for product for a period of time. Why? Because the engineers needed time to retool and reskill. There was also a chance that the restructuring would cause some churn, which would slow things down further.
After getting buy-in and alignment from product and the executives, the next step was to figure out how to transition my traditional operations team to support the manifesto and the strategy of empowering engineers. To do so, I formed a platform engineering organization that was chartered with enabling the rest of the engineering organization to get to the new model. I did this by talking to my central operations organization, finding out who the champions were (those who wanted things fixed yesterday), and transferring those software developers, along with devops engineers, into the central platform team. I kept the balance with operations that was needed while creating the platform engineering organization, and chartered the platform team with developing centralized tooling that would empower developers.
I also needed a strong leader for the operations organization who understood the manifesto and was aligned with the shift that was happening. I essentially enabled this person to build a platform organization within engineering.
Now that I had the end-to-end service ownership model defined, buy-in from the executives and product, and a platform team responsible for building out the tooling, I slowly decentralized operations by implementing a product release cycle for deploying the new delivery platform. Accordingly, I defined a new development cycle that aligned with the new model.
The way I did this was by following a product process in which I treated our own engineering organization as the customer. I sent out a survey, analyzed the results, looked for common patterns in the pain points that prevented engineers from shipping higher-quality software at a faster rate, and then presented the findings back to the engineering organization. When summarizing the results, I mapped them to the capabilities of the new software delivery platform and the new model that I was trying to promote. Basically, I said: here are the identified problems and pain points, here is what is happening, and here is the new model and how it addresses these issues. I essentially did customer discovery with feedback, but within the organization. Using this method, I built alignment around the solution, and it became obvious why I wanted to move to the new model, rather than my forcing it upon everybody.
The survey did more than give me a roadmap of what to tackle first and in which sequence. It also identified the cohorts of people within the larger organization, as well as the applications, that were most problematic.
Lastly, after gathering information from the survey and establishing the platform organization, I built a prototype of the new platform that addressed X, Y, and Z. We ran an alpha test of the prototype with one team and one application, and we measured the metrics before and after. Essentially, I was demonstrating to the organization that a team that owned its application end-to-end and shipped it with a platform aligned with how Netflix does things was more efficient, more innovative, and shipped more quickly than anybody else. Consequently, the prototype team got a well-deserved spotlight, and eventually adoption of the new model became organic.
People were naturally drawn to this successful new model and wanted to participate. In the end, 98% of the services at my organization were migrated to the new Netflix model. There are still some legacy services left over from before the transition that we chose to leave alone, because it didn't make sense to fit them into the new model. But otherwise the shift from the traditional model to the new one was a success.
- This transition was a really big shift: a mindset shift, a tooling shift, and a shift in skill sets. As a consequence, you are going to get a lot of resistance, because people don't like change. Focus on alignment so that you can move your organization forward in the process.
- When transitioning, you have to run the central operations organization and the platform engineering organization in parallel, maintaining your existing operations team as you empower engineers. You do not want to come in and rattle the nest or threaten the organization.
- Getting key people, like champions and a strong leader, into the platform organization is very important. They are the pioneers who show others the way.
- You have to ask your team to define the problems. You would be surprised how many organizations fail to ask this question. Everybody thinks they know what the problem is and starts trying to fix it, but they don't really understand the underlying issues. Surveying your team allows you to dig into problems and common patterns that you can then present back to the rest of the engineering organization.
- The key component to driving adoption within an organization is having a lot of transparency, for example through lunch-and-learn or brown-bag sessions. What I did was have the central platform team open up its sprint demos and show the rest of the organization what was going on.