Building an Efficient Data Science Team While Still Being Agile

Agile / Scrum
Data team

28 July, 2020

Arun Krishnaswamy, Director of Data Science at Workday, describes how to build a data science team emphasizing the difference between software development lifecycle and data science methodology.

Problem

When the company as a whole follows a uniform process based on the software development lifecycle (SDLC), that process hinders the workflow of data science projects. An efficient data science team operates on its own methodology, one more conducive to the exploratory nature of data science.
 

I identified three key problems stemming from efforts to impose the SDLC on data science teams.
 

  • Timeboxing.
    Software development can be easily timeboxed: with clearly defined deliverables, you can work on one piece of software for two weeks as planned. Data science, by contrast, is all about unknowns and resists being time-bound.
  • Reviews.
    Reviews in data science differ fundamentally from code reviews. If there is a problem with the software, a code review should detect it. The evaluation process for data science is entirely different, since it covers not only the data but also the results and the analysis of the process. Code review is just a small part of the overall evaluation.
  • Incorporating SDLC best practices.
    While the SDLC is not a perfect match for data science projects, that doesn’t mean data science can’t benefit from many good SDLC practices.
     

Actions taken

Each of those problems requires a specific approach:

  • Timeboxing
    Machine learning data science has multiple components: the engineering component (data exploration and interpretation), the experimental component (obtaining the training data and building the training model), and the operational component (deployment into production). The engineering and operational components are highly deterministic and can be tracked on the Jira board. The Jira board alone is not suitable for the experimental component, but combining Jira with a Kanban board is very practical: it is still Agile, but instead of timeboxing, it boxes work by project. For example, when I was exploring some data and wanted to train a model, I would track my progress with Kanban because of its time flexibility. It still uses the same terminology and is part of the Jira ecosystem, but it naturally follows the way data science is done.

  • Code reviews
    Instead of code reviews, data science relies on analytical reviews. If someone is working on a model, their goal is not to produce a piece of code but to give a presentation in front of the team. An analytical review outlines the data that were used, the features that were developed, the model parameters, and the details of the model evaluation. In data science, every problem can be classified as supervised, semi-supervised, or unsupervised, and a template with evaluation metrics has to be created for each type.
    For example, if I have a classification problem, I would look at the confusion matrix, but if I have a regression problem, I would use accuracy metrics. Analytical reviews differ based on the type of problem. Code reviews are included as part of reviewing the implementation of an algorithm.
    To motivate my team, I came up with the Champion Challenge Problem (a Kaggle-like competition) that reflects the nature of data science and how it differs from software development. For any given problem in data science, you can try multiple approaches. In software development, two people would work on two different parts of a problem. Because data science is experimental, two people are instead paired up to work on the same problem while aiming to come up with different solutions. Analytical reviews are used to assess which solution is better. This challenge encourages creativity and a competitive spirit.
     

  • Best practices
    Many good practices from the SDLC can be integrated into data science. My favorite is pair programming, in which two people work together to solve a problem. Translated to data science, it undergoes some modifications: (a) you pair up a senior and a junior data scientist, and their relationship should resemble a mentoring relationship; (b) a geographically distributed data team allows work to be continued and complemented across locations, because data science projects consist of multiple steps that cascade into one another.
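
The analytical review and the Champion Challenge described under "Code reviews" can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the author's actual template: the names AnalyticalReview, evaluate, and pick_champion, the sample labels, and the choice of RMSE as the regression metric are all hypothetical. The sketch records what a review covers (data, features, model parameters), picks evaluation metrics by problem type, and compares two solutions to the same problem.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a classification problem."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (an assumed choice of regression metric)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


@dataclass
class AnalyticalReview:
    """Hypothetical review record: data, features, parameters, evaluation."""
    author: str
    problem_type: str  # "classification" or "regression"
    data_used: str
    features: list
    model_params: dict
    metrics: dict = field(default_factory=dict)

    def evaluate(self, actual, predicted):
        # The metrics in the review depend on the type of problem.
        if self.problem_type == "classification":
            self.metrics = {
                "accuracy": accuracy(actual, predicted),
                "confusion_matrix": confusion_matrix(actual, predicted),
            }
        elif self.problem_type == "regression":
            self.metrics = {"rmse": rmse(actual, predicted)}
        else:
            raise ValueError(f"no template for {self.problem_type!r}")
        return self.metrics


def pick_champion(reviews, metric, higher_is_better=True):
    """Compare competing solutions to the same problem on one metric."""
    choose = max if higher_is_better else min
    return choose(reviews, key=lambda review: review.metrics[metric])


# Two people solve the same classification problem in different ways.
actual = [1, 0, 1, 1, 0]
review_a = AnalyticalReview("scientist_a", "classification", "sample_v1",
                            ["f1", "f2"], {"model": "tree", "depth": 3})
review_b = AnalyticalReview("scientist_b", "classification", "sample_v1",
                            ["f1", "f3"], {"model": "logistic"})
review_a.evaluate(actual, [1, 0, 0, 1, 0])
review_b.evaluate(actual, [1, 1, 0, 1, 0])

champion = pick_champion([review_a, review_b], metric="accuracy")
print(champion.author)  # scientist_a
```

In practice a team would likely compare more than one metric and present the full review to the group; the single-metric comparison here only keeps the example short.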
     

Lessons learned

  • Timeboxing data science projects and experiments is not a smart idea, because it doesn’t follow the natural cycle of data science projects.
  • Do analytical reviews instead of code reviews.
  • Some data scientists come from software development and are more cognizant of, and more willing to introduce, its best practices. Whatever your background, be open-minded and ready to learn from the SDLC.
