Important Lessons In Building Machine Learning
Head of Software Application at Groq
At Target, I was leading a team of data scientists and engineers in a company that was not historically amazing with technology, and we were tasked with bringing machine learning into many areas of the business, most of them for the first time.
Even though I had earned a Ph.D. in Machine Learning in 2008 and had many years of business experience before that, I wasn’t 100% prepared to manage a team that was building and deploying machine learning technology in a Fortune 50 retail company. The field of machine learning was moving fast and the company, well, it was moving a little slower sometimes despite its great history in retail and really great senior management.
Our data scientists, super smart people, often with PhDs in machine learning, always wanted to do perfect work. They wanted to show up and display their intelligence and hard-earned knowledge. A typical example of this urge appeared in the form of a slide showing the latest neural network architecture. This turns out to be a bad move. It doesn’t build the confidence that everyone needs. Worse, it skips many important and difficult tasks that require building a strong partnership with the business.
Some of our internal business partners were familiar with machine learning; others were not. Still others thought they knew everything they needed to know and were just interested in how many heads they could get assigned to their projects. All of these partners were smart, and all of them needed our help, even if they didn't agree on exactly how that help should arrive.
Most of our business partners were under pressure to “use data better” and make “data-driven decisions”. In more than one case, this led them to call project kickoff meetings with 40 people, and follow-up meetings that felt like the movie Groundhog Day. I had team members who begged to be taken off projects, simply because there was not any “real work” coming out of the meetings. Awkward.
These situations, and others, made me urgently seek ways to get our team to deliver more valuable outcomes for the company without sending our business partners to a Ph.D. program or our data scientists off to get an MBA.
Here’s how we did it. We broke the planning discussions down into different lanes. For any project, four different questions need to be answered. For each question, there’s a Machine Learning version and a Data Engineering version. Eight lanes now, with small overlap. Already some simplification. Curious what they are? The Machine Learning side of things was critical to get right first, so I will just talk about those. Here goes:
Question One: "What are we building?"
Of course, there are more specific and technical ways to ask that, but in a general sense, this is the core question. A finished model is just software. Data goes in, the answer comes out. What does it do after all the data has been crunched and it’s in the users’ hands? What exactly?
Everybody can understand the general basis of that question. But getting everybody to agree on the answer takes real time, and that effort adds clarity to the project.
At an early stage of developing this process, I put this question to a senior business partner, a senior data science leader, and a junior data scientist, and I got three very different answers. And it wasn’t easy to get any answer at all. For some, the question was hard even to understand, much less answer. I had to get everybody on the same page.
On to Question Two: "What data sources are we going to use to build the model?"
For example, when building a search engine, there are many data sources you could pull from to create the search engine. The ideal outcome relies on the quality of the data and its relevance to building good models. Acquiring that data is a partnership between business and the modelers. That discussion needs to start on Day 0.
Partners often have great intuition about what data is relevant and why. They are also a key source of institutional knowledge and prior knowledge for data scientists. Not all bad model outputs are from “bad models”. Often there’s an opportunity to improve upstream data quality. When partners understand this, another barrier to success has fallen.
Now, we are ready for Question Three: "How do we improve the model?"
There are several aspects to this one:
- How do we know that the new model is better than the old one?
- When do we say our model is out of date and decide to update it?
- What are the downstream impacts for other models that may depend on this one?
Those who work in machine learning won’t be surprised to learn that the easy answers to these questions come only when you have good model deployment technology, you can do A/B testing easily, and your code base is not a plate of spaghetti that has hardwired the model into place. All of this adds up to establishing modern software engineering practices around building and deploying models. It’s hard, but doing it right builds trust. It’s actually more important than having a perfect model right away. It also helps your engineering partners focus on the rest of the system, knowing that models will improve over time and that we are working on that together.
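To make the first bullet concrete, here is a minimal sketch of an offline champion/challenger check, the kind of comparison that good deployment practice makes routine. It assumes scikit-learn, synthetic data, and an illustrative promotion threshold; none of these specifics come from the project described above.

```python
# Champion/challenger sketch: compare a new model against the current one
# on a held-out set before considering promotion. All names and the 0.01
# margin are illustrative assumptions, not a prescription.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real business data.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=0)

champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)
challenger = RandomForestClassifier(random_state=0).fit(X_train, y_train)

champ_auc = roc_auc_score(y_holdout, champion.predict_proba(X_holdout)[:, 1])
chall_auc = roc_auc_score(y_holdout, challenger.predict_proba(X_holdout)[:, 1])

# Promote only on a meaningful margin, not noise; a real team would tune
# this threshold or replace it with a statistical test and an online A/B test.
promote = chall_auc > champ_auc + 0.01
print(f"champion AUC={champ_auc:.3f}, challenger AUC={chall_auc:.3f}, promote={promote}")
```

The same comparison logic extends naturally to the second bullet: run it on fresh data on a schedule, and a drop in the deployed model's score is your signal that the model is out of date.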
Finally, we can talk about our favorite, Question Four: "What is the model architecture going to be?"
When we engage proactively on the first three questions, we can buy the freedom to build the model the best way possible and change it when we discover a better way. And we will. Another good reason not to show our partners that architecture slide up front!
If you nail the first three questions, then you have the freedom to use whatever model architecture you would like to solve the problem. The business realizes the strengths and challenges and transparently sees you addressing them without having to prescribe the solution. You are partners on data quality. You are much more ready to collaborate for a successful outcome.
Bridging the gap between data scientists and their business partners turned out to require action from me, and I responded by organizing how we communicate when collaborating with other parts of the company. I used this framework as an antidote to the confusion and noise that exist in many companies seeking “Transformation”. It helped us communicate with all the different parties involved in building machine learning technology across many parts of the business. In my current role, I have seen this approach help other organizations, and I have friends and colleagues who have adapted these four questions to projects in organizations large and small.
Nowadays I even give speeches covering this topic. I call it “Product Thinking for Machine Learning in the Enterprise”. It’s an easy talk to give because I am really just repeating the coaching I gave dozens of times on the job.
Fair warning: even though it seemed obvious and necessary to me, that doesn’t mean I got automatic buy-in from everyone else. So I tried to build habits with the team. If I was doing a product review and going over OKRs, asking these four questions in the context of each OKR made things immensely more productive. If the team is prepared, it really speeds things up. Two-hour “spin sessions” can become 20-minute updates. When you have 25 products to review, everyone is happier when it can be done more quickly.
When you break the problem down from a big ball of mud into two groups of four questions, you are not just dividing by eight. You are also making each question easier to answer by reducing the conflation and cross-chatter that happen in big groups. And you are establishing a consistency you can carry across many products, making it easier to scale your scope. Most importantly, by attacking those sub-problems, you speed up development tremendously.