We've just launched plato for individuals

🔥

login


Google Sign inLinkedIn Sign in

Don't have an account? 

Using Tiger Teams to Solve Problems and Create an Environment for Learning

Team reaction
Collaboration
Reorganization
Impact

24 June, 2020

Arjun Rao is the Director of Engineering at Place Exchange. Here, he talks about how he assembled a tiger team, which is a specialized, cross-functional team brought together to fix or investigate a critical issue, and how that team helped resolve a specific problem within a short period of time.

Problem

Place Exchange is making Out of Home advertising truly programmatic. As a result, a lot of our bread and butter is processing transactions and serving ads. It is critical in nature and therefore expects very high up-times, so that we can continuously receive and process requests to serve ads on out-of-home screens.
 

Over the back half of last year, there was a heavy increase in traffic coming our way. We noticed that one of our services critical to ad serving began to feel the load. This didn’t cause any SLA-related issues, but were severe enough for me to decide it was a problem worth solving, so as to not run into it in the future.
 

Actions taken

Investigation
In the time between the two traffic spikes in September, before the tiger team was put together, we poked around to see whether or not there were small steps we could take to solve the problem or if it required a more focused effort. This exercise helped us understand that the root cause of the problem was increased strain on the database causing performance issues.
 

Critical Next Steps
Tuning database performance, either by optimizing SQL queries or tinkering with database parameters, is a hard problem and not something everyone is familiar with. To tackle that, I assembled a tiger team composed of four senior members of the staff. By the beginning of August, these engineers knew what the stack looked like and put their heads together to figure out how to solve the problem.
 

When you want to solve such a complex problem within a short period of time in order to gain actionable wins, you need to ensure certain key aspects are in place. One of them was to create clear objectives. For a three week window to solve this issue, we came up with three objectives -
 

  • How do we improve instrumentation in our stack so that we can have some metrics in place to better understand what causes these issues to surface?
  • Are there short term fixes we can implement to alleviate the situation?
  • What long term proposals should we present? Once the three week period is over, this would help the teams to work on stabilizing and scaling out the system.
     

Maintaining transparent communications was another key ingredient. We used Slack for asynchronous conversations, and had a formal, once-a-week reconvene by video conference.
 

In conjunction with our RCA (root cause analysis) observations after the two high traffic spikes, we had few action items to take away. During the tiger team kickoff, in order to come up with an actionable list of items the team could work on -
 

  • We stated everything we knew; clustering patterns together and seeing where we could get some quick wins.
  • We came up with a list of metrics to be created/updated, to provide both early warning indications, as well as better insight into system behavior.
  • We identified several other tasks to be tackled by this tiger team, with the end goal of seeing if our changes would result in observable differences.
     

Since that three week period, we haven’t seen any notable spikes correlated with our initial observations, and our scale since then has seen a consistent increase. So far so good, but we have also been implementing some of those long term proposals to keep the ball rolling forward.
 

Lessons learned

  • You want the people who are solving the problem to be focused on what they are doing without distractions. Being able to free them up and have them direct their attention on a new project that came out of nowhere, can be complicated.
  • You need to make sure you have the right people to solve the project in an urgent manner. As you run your hiring process, you should ask yourself, ‘Do we have enough bench strength across the team, in the event we come across that we might not be able to foresee?’ It’s helpful to have organizational slack for the flexibility to pull people to solve critical problems. If the teams are being run too thin, then you can’t switch your team’s focus, as it will be very disruptive and will result in being counterproductive to the business goals. Your team needs to understand why it’s an important problem to solve and that there is enough breadth in the team to handle this, in addition to scheduled work. It was very cool to see our own team come together like that.
  • At Place Exchange Engineering, we follow a “pull policy", where people pull whatever work they would do next. Pushing work down to people, in a way where they feel like they have to do it, does not create the right incentives.
  • If it’s a timeboxed effort, which has a deadline, it is good to set very clear expectations and objectives from the get-go. Set some lightweight processes such as a Slack channel or some other medium, to post and share updates, both within the team and to other stakeholders, all through the process.
  • In general, we make an effort to distribute findings throughout the engineering organization, so that the same people don’t get tapped when similar situations come up the next time. We create an environment of learning so that our teams are able to keep iterating and improving their systems. Disseminating context about why we created the tiger team and what outcomes it created for everybody else, was a useful exercise in teamwide awareness as well as preparing ourselves for the future.

Related stories

Migrating a Legacy Service: A Lesson in Eng/Ops Collaboration
18 November

Jackson Dowell, Engineering Manager at Asana, discusses how he approached legacy service migration by stabilizing the existing stack and getting Engineering and Operations to work together to address the underlying problems.

Collaboration
Jackson Dowell

Jackson Dowell

Engineering Manager at Asana

How Skip-Level Conversations Can Help You Create Strong Relationships with Your Engineers
31 October

Toby Delamore, Product Manager at Xero, explains how skip-level conversations and maintaining an attentive relationship with engineers enabled him to keep a finger on the pulse of the team.

Internal Communication
Feedback
Meetings
Collaboration
Toby Delamore

Toby Delamore

Product Manager at Xero

Are We Solving the “Right Problems”?
13 November

Michael Vanhoutte, Chief Architect at Aprimo, discusses why it is important to establish that a problem you are trying to solve is the right one and how the engineering obsession with ‘fixing the problems’ can lead us astray.

Impact
Productivity
Prioritization
Michael Vanhoutte

Michael Vanhoutte

Chief Architect at Aprimo

How to Improve Collaboration Between Different Stakeholders
30 October

Patrick Chen, Lead Product Manager at Next Insurance, explains how he managed to improve collaboration between Product and other stakeholders by applying a holistic and inclusive approach.

Collaboration
Patrick Chen

Patrick Chen

Lead PM at Next Insurance

Why AB Testing Is Good for Your Company Culture
28 October

Graham McNicoll, CEO of Growth Book and former VP of Product and Engineering at Education.com, recalls how he introduced data-driven decision-making and AB testing to drive growth, and the unexpected effects it had on development speed, team alignment, intellectual humility, and collaboration.

Team processes
Managing Expectations
Agile / Scrum
Collaboration
Graham McNicoll

Graham McNicoll

VP Product and Engineering at Education.com

You're a great engineer.
Become a great engineering leader.

Plato (platohq.com) is the world's biggest mentorship platform for engineering managers & product managers. We've curated a community of mentors who are the tech industry's best engineering & product leaders from companies like Facebook, Lyft, Slack, Airbnb, Gusto, and more.