Plato Elevate Winter Summit has been announced (Dec 7th-8th)

🔥

Back to resources

Building an Efficient Data Science Team While Still Being Agile

Data Team
Agile / Scrum

28 July, 2020

Arun Krishnaswamy
Arun Krishnaswamy

Director at Workday

Arun Krishnaswamy, Director of Data Science at Workday, describes how to build a data science team emphasizing the difference between software development lifecycle and data science methodology.

Problem

When the company as a whole follows a uniform process based on the software development lifecycle (SDLC), it interferes with and hinders the workflow process of implementing data science projects. An efficient data science team operates on its own methodology more conducive to the exploratory nature of data science.

I identified three key problems stemming from the efforts to enforce the SDLC to data science teams.

  • Timeboxing.
    Software development can be easily timeboxed -- with clearly defined deliverables you can work on one piece of software for two weeks as planned, but data science is all about unknowns and resists being time-bound.
  • Reviews.
    Conducting reviews in data science fundamentally differ from code reviews. If there is a problem with the software, a code review should detect it. The evaluation process for data science is entirely different since it includes not only data but the results and analysis of the process. Code review is just a small part of an overall evaluation process.
  • Incorporating SDLC best practice.
    While SDLC is not a perfect match for data science projects, it doesn’t mean that data science can’t benefit from many good SDCL practices.

Actions taken

Each of those problems requires a specific approach:

  • Timeboxing
    Machine learning data science has multiple components: the engineering (data exploration and interpretation), experimental (obtaining the training data and building the training model), and operational component(deployment into production). The engineering and operational components are highly deterministic and could use the Jira board. However, Jira is not suitable for the experimental component but combining Jira with the Kanban board is very practical because it is still Agile, but instead of timeboxing, it does boxing by projects. For example, I was exploring some data and I wanted to train the model. To track my progress I would use Kanban because of its time flexibility. It would still use the same terminology and is a part of the Jira ecosystem, but it naturally follows the way in which data science is done.

  • Code reviews
    Instead of code reviews, data science relies on analytic reviews. If someone is working on a model, their goal is not to produce a piece of code but to do a presentation in front of the team. An analytical review outlines data that have been used, features that have been developed, model parameters, and details of model evaluation. In data science, every problem could be divided into the supervised, semi-supervised, and unsupervised type and a template for each type along with evaluation metrics has to be created.
    For example, if I have a classification problem, I would be looking at confusion metrics, but if I have a regulation problem I would use accuracy metrics. Analytical reviews differ based on the type of problem. The code reviews are included as a part of the implementation of an algorithm.
    To motivate my team, I came up with the Champion Challenge Problem (aka Kaggle like ) that reflects the nature -- and difference to software development -- of data science. For any given problem in data science, you could try multiple approaches to solve it. Unlike software development where two people would be working on two different parts of the problem, due to the experimental nature of data science, two people would be paired up to work on the same problem but aiming to come up with different solutions. Analytical reviews are used to assess which one is better. This particular challenge encourages creativity and a competitive spirit.

  • Best practices
    Many good practices from SDLC could be integrated into data science. My favorite is peer programming when two people are working together to solve a problem. When translated to data science, it would undergo some modifications: a. you pair up a senior and junior data scientist and their relationship should resemble a mentoring relationship; b. a geographically distributed data team would allow for continuation and complementation due to the cascading nature of multiple steps data science projects consist of.

Lessons learned

  • Timeboxing data science projects and experiments is not a smart idea as it doesn’t follow the natural cycle of data science projects.
  • Do analytical instead of code reviews.
  • Some data scientists come from software development and are more cognizant of and willing to introduce best practices. Nevertheless, be open-minded and ready to learn from SDLC.

Discover Plato

Scale your coaching effort for your engineering and product teams
Develop yourself to become a stronger engineering / product leader


Related stories

Preparing Your Team for the Remote Workplace

29 November

Vadim Antonov, Engineering Manager at Meta, dictates how he brought a brand new team into the remote learning process by ramping up onboarding and creating a mentor system.

Alignment
Remote
Internal Communication
Coaching / Training / Mentorship
Data Team
Cross-Functional Collaboration
Vadim Antonov

Vadim Antonov

Engineering Manager at Facebook

How Data-Driven Products Help Customers and Increase Sales

11 November

Richard Maraschi, VP of Data Products & Insights at WarnerMedia, shares his insight on incorporating data science, AI, and product management to overcome slowing growth of the company.

Product
Conflict Solving
Users
Data Team
Performance
Richard Maraschi

Richard Maraschi

VP Data Product Management at WarnerMedia

How to Increase Engagement Using the Right Metrics

11 November

Richard Maraschi, VP of Data Products and Insights at WarnerMedia, speaks on how he increased user engagement using measurements and experimentation.

Alignment
Mission / Vision / Charter
Conflict Solving
Convincing
Team Processes
Data Team
Richard Maraschi

Richard Maraschi

VP Data Product Management at WarnerMedia

How to Build a Software Team from the Ground Up

12 November

Deepesh Makkar, Sr Director of Engineering at SunPower Corporation, shares his experience transitioning his organization from contractors to a 50/50 split of full-time employees and international vendors.

Hiring
Motivation
Cross-Functional Collaboration
Agile / Scrum
Deepesh Makkar

Deepesh Makkar

Sr Director of Engineering at SunPower Corporation

Choosing the Perfect Implementation of Agile

25 October

Shubhro Roy, Engineering Manager at Box, stresses the importance of the holistic nature of Agile methodology; picking and choosing Ă  la carte may cause more problems than it solves.

Goal Setting
New Manager
Agile / Scrum
Shubhro Roy

Shubhro Roy

Engineering Manager at box

You're a great engineer.
Become a great engineering leader.

Plato (platohq.com) is the world's biggest mentorship platform for engineering managers & product managers. We've curated a community of mentors who are the tech industry's best engineering & product leaders from companies like Facebook, Lyft, Slack, Airbnb, Gusto, and more.