Streamlining a Data Science Team
CTO and cofounder at Datamaran
Data science is much less predictable and less structured than software development. A team can spend months analyzing data or running experiments without producing any tangible result that contributes to the product it is building. Large companies can afford to hire many data scientists to explore the problem space and determine what is feasible; small startups have to be more careful with their resources.
Data science is a new career, and people come to it from different professional backgrounds, many of them with no experience in software development. They tend to build their models without understanding how those models will make it into the product. To make our data science team successful, I had to help the team understand the business problems they were trying to solve and how their work contributes to solving them.
I decided to reframe the whole process by asking our data science team to build the simplest possible model. By starting small and building a simple visualization of the results, we would be able to quickly produce a first version, put it in front of users, and collect feedback. It would be the quickest way for us to learn what users think about the data we are providing.
Applying Agile and iterating quickly on users’ feedback enabled us to refine the model rapidly. Developing an initial model, interface, and workflow and integrating them with the engineering side of the product should be the first step of the process. From there, you can continue to work and iterate on the model itself. If you come up with a better version of the model, you can simply replace that part of the code.
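One way to make "replace that part of the code" cheap is to have the product depend on a small interface rather than on a specific model. The following is a minimal sketch under that assumption, not a description of Datamaran's actual code: the `Model` interface, both model classes, the labels, and `label_documents` are hypothetical names invented for illustration.

```python
from typing import List, Protocol, Set


class Model(Protocol):
    """Hypothetical interface: the only thing the product code relies on."""

    def predict(self, text: str) -> str:
        ...


class KeywordBaseline:
    """First version: a deliberately simple rule, quick to ship and test."""

    def predict(self, text: str) -> str:
        return "relevant" if "risk" in text.lower() else "not_relevant"


class KeywordCountModel:
    """Stand-in for a later iteration; same interface, different logic."""

    def __init__(self, keywords: Set[str], threshold: int = 1) -> None:
        self.keywords = keywords
        self.threshold = threshold

    def predict(self, text: str) -> str:
        # Count how many words (punctuation stripped) match a keyword.
        words = [w.strip(".,;:!?") for w in text.lower().split()]
        hits = sum(1 for w in words if w in self.keywords)
        return "relevant" if hits >= self.threshold else "not_relevant"


def label_documents(model: Model, documents: List[str]) -> List[str]:
    """Product-side code: works with any object that has predict()."""
    return [model.predict(doc) for doc in documents]
```

With this shape, moving from the baseline to an improved model is a one-line change at the call site, for example `label_documents(KeywordCountModel({"risk", "climate"}, threshold=2), docs)` instead of `label_documents(KeywordBaseline(), docs)`, while the interface and workflow around it stay untouched.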
For this to work, data scientists should be trained to think about how their models will be embedded into the software. They have to understand that a data science project consists not only of a model, but also of the people who will use it and the data management workflow built around it.
The model itself won’t fit the business model from the start, so the team should deliver not only a model but also a workflow that includes a human in the loop to monitor and correct the decisions the model cannot (yet) make correctly. This first iteration then becomes the baseline for further improvements, each of which reduces the supervision required from the human operator.
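A human-in-the-loop workflow of this kind can be sketched as a routing rule: accept the model's decision when it is confident, and send everything else to an operator. This is only an illustration under stated assumptions. The confidence score, the `min_confidence` threshold, and the names `Decision`, `decide`, and `human_share` are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Decision:
    item: str
    label: str
    source: str  # "model" or "human"


def decide(
    items: List[str],
    model: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str, str], str],
    min_confidence: float = 0.8,
) -> List[Decision]:
    """Accept confident model outputs; route the rest to a human operator."""
    decisions = []
    for item in items:
        label, confidence = model(item)
        if confidence >= min_confidence:
            decisions.append(Decision(item, label, "model"))
        else:
            # The operator sees the model's suggestion and can correct it.
            corrected = ask_human(item, label)
            decisions.append(Decision(item, corrected, "human"))
    return decisions


def human_share(decisions: List[Decision]) -> float:
    """Fraction of decisions that needed the human operator."""
    if not decisions:
        return 0.0
    return sum(d.source == "human" for d in decisions) / len(decisions)
```

Tracking `human_share` from one iteration of the model to the next gives a simple measure of whether the supervision required from the operator is actually going down.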
- Help data scientists think like software developers and users at the same time. They should step outside the comfort of building an analytical engine. Only once they start to think like developers and users will they be able to build better models.
- Start small and get quick feedback. You will learn a lot by interacting with users, collecting their feedback, and iterating on it. To better understand what users want, start with something simple, not something equivalent to Alexa. For example, build a simple chatbot, applying Lean Startup logic to get something out fast.
- Coach people, especially juniors, to get out of their comfort zone. Ask them not only to think like data scientists but also to try approaches more common in software development.
Written by Jerome Basdevant