Building a Platform Team
Max Countryman, Senior Director of Engineering at Lob
When we started out, we were only five engineers, all on the same team. We were all full-stack developers, wearing many hats and doing a variety of things. As the company grew and more people joined, the inherent division between product and platform engineering started to appear.
It is natural for teams to specialize at some point, and the first and most fundamental distinction is between product and platform. At the highest level, the product team builds new things and pushes the product forward, while the platform team is not racing toward feature deadlines; it is committed to making other teams faster and more reliable.
First off, you should identify the right time to do the split. Building a platform team starts with finding the inflection point at which one team should become two. There is probably not much need for a separate platform team in the earliest days, and starting one then will likely be premature. In my experience, the moment became obvious as the value of spending engineering cycles on optimizing what we were doing became more apparent. For us, that was also strongly connected to the size of the team, and we wondered whether we could still operate as a single unit with 15 engineers. We were still a fairly small team, but there was enough emphasis on efficiency that it made sense for us to do the split.
The split often coincides with the moment of finding product-market fit. If you still haven’t figured out what you should be doing as a company, you probably don’t need two teams, and you should first find your product-market fit.
Platform teams are built differently in different companies. If there is a group of engineers with a keen interest in platform work, formalizing the team will be straightforward; otherwise, you will have to start hiring people with matching competencies. This is exactly what I am currently doing at my new company; I am hiring people externally to build the platform team from the ground up.
Again, the specific profile will differ from one company to another. At my current company, I first hired a data engineer because we are heavily data-driven, and this was our most immediate need. That being said, you need to identify what specific initiatives you will be working on in terms of efficiency and optimization and hire for those.
In small companies, platform teams are mostly flat, with people wearing many hats. In my current company, all platform engineers are titled software engineers, and there is no hierarchy. However, as the mandate becomes more specialized, subteams will naturally emerge. Larger platform teams typically apply more rigid structures with subteams, managers, and specific domains like observability or site reliability. I personally prefer to organize each team around a North Star that maps directly to metrics. That is perhaps not a model to start with, but in the longer run, I would like to structure the team in a way that lets it quickly quantify the progress it is making toward the key metrics. For example, site reliability is a domain we care a lot about, and I would organize a team around it. We would then look at site uptime and set a specific SLA or SLO that helps the team track its progress.
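As a rough illustration of tracking uptime against an SLO, here is a minimal Python sketch. The function name `uptime_report`, the 30-day window, and the 99.9% target are assumptions made for the example, not figures from our teams; the point is only that a reliability team's North Star can be reduced to a number that is checked on a regular basis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UptimeReport:
    availability: float          # fraction of the window the site was up
    allowed_downtime_min: float  # downtime budget implied by the SLO target
    budget_remaining_min: float  # negative once the budget is overspent
    slo_met: bool


def uptime_report(window_minutes: float,
                  downtime_minutes: float,
                  slo_target: float) -> UptimeReport:
    """Compare observed downtime with an uptime SLO (hypothetical helper)."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if not 0 < slo_target <= 1:
        raise ValueError("slo_target must be in (0, 1]")
    if not 0 <= downtime_minutes <= window_minutes:
        raise ValueError("downtime_minutes must lie within the window")

    availability = 1 - downtime_minutes / window_minutes
    allowed = window_minutes * (1 - slo_target)
    return UptimeReport(
        availability=availability,
        allowed_downtime_min=allowed,
        budget_remaining_min=allowed - downtime_minutes,
        slo_met=downtime_minutes <= allowed,
    )


if __name__ == "__main__":
    # Assumed numbers: a 30-day window, 30 minutes of downtime, 99.9% target.
    print(uptime_report(30 * 24 * 60, 30, 0.999))
```

Expressing the target as remaining downtime budget, rather than only as a percentage, gives the team a figure it can watch shrink during the month.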
Another significant difference that impacts the structure is that the product engineering team works on a shorter time horizon. In the early days, startups tend to iterate fast and have a short time horizon (unless they have a well-defined product and longer-term product goals). Platform engineering by design works on a longer time horizon and focuses on what it could do irrespective of near-term deadlines. That often entails a systemic change or a large refactor that will pay off in velocity and efficiency, but not immediately.
Finally, there is a specific set of metrics, the DORA metrics, that originated in DevOps but can serve as a useful guideline for a broad set of engineering teams. DORA includes four key metrics: Deployment Frequency (DF), Mean Lead Time for changes (MLT), Mean Time To Recover (MTTR), and Change Failure Rate (CFR). If, for example, you can establish that there is a bottleneck between the moment a feature is finished and the moment it ships to production, then fixing only that step will make the overall end-to-end process more efficient.
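To make the four metrics concrete, here is a minimal Python sketch that computes them from a list of deployment records. The `Deployment` record, its field names, and the `dora_metrics` function are hypothetical and chosen for the example; in practice the timestamps would come from whatever your version control, CI/CD, and incident tooling record, and teams define "finished" and "recovered" differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime                   # when the change was finished
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored


def _mean_hours(deltas: Sequence[timedelta]) -> float:
    return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)


def dora_metrics(deployments: Sequence[Deployment], window_days: float) -> dict:
    """Compute the four DORA metrics over one observation window."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved contribute to MTTR.
    recoveries = [d.recovered_at - d.deployed_at
                  for d in failures if d.recovered_at is not None]

    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "mean_lead_time_hours": _mean_hours(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recover_hours":
            _mean_hours(recoveries) if recoveries else None,
    }
```

A mean lead time that is large compared with the time engineers actually spend on a change points at the gap between finishing a feature and shipping it, which is the bottleneck described above.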
Don’t split the team too soon, but don’t wait too long either. I still believe that in my previous job, we should have done it a bit sooner. One of the things I am genuinely excited about at my current job is that we are splitting the team just before I think we need to, and that is going to end up being quite advantageous. If you can time it, try to do it just before you need to. You want to err on the side of doing it a bit sooner rather than later.
Be notified about next articles from Max Countryman