When AI Features Are Market-Ready

Director, System Software, AI for Gaming at NVIDIA

Problem

AI and deep learning are new fields surrounded by a lot of confusion and misunderstanding. The problems range from common ones, such as how to do roadmapping and product planning or how to make engineering estimates, to ones specific to AI and deep learning: validating and verifying a model that is scheduled to go to production, and then supporting and maintaining it over the long term.

One of the biggest challenges in the AI/deep learning space -- given the very nature of its products -- is determining whether a feature is ready to go to market and whether the model’s performance meets real-time constraints. Readiness depends on performance and image quality constraints, and also on identifying reliable partners with whom we can launch.

Actions taken

Our guiding principle is to measure ourselves against the speed of light in terms of performance. We have a mathematical representation of how fast a feature can run on a given piece of hardware, and we measure ourselves against that speed of light to see how close we are to the optimum. Depending on how close we are, we decide whether we are ready to release.
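As an illustration only, the speed-of-light comparison can be sketched as a ratio between a theoretical lower bound and a measured runtime. The article does not say how the bound is modeled, so the sketch below assumes a simple roofline-style bound (the slower of compute time and memory time). The function names, the hardware peak parameters, and the 0.8 release threshold are all hypothetical.

```python
def speed_of_light_ms(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Theoretical lower bound on runtime in milliseconds.

    Assumption: a roofline-style model where the feature can run no faster
    than the slower of its compute time and its memory-traffic time.
    """
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / peak_bytes_per_s
    return max(compute_s, memory_s) * 1e3


def efficiency(measured_ms, sol_ms):
    """Fraction of the theoretical optimum achieved (1.0 means at the bound)."""
    if measured_ms <= 0:
        raise ValueError("measured_ms must be positive")
    return sol_ms / measured_ms


def ready_for_release(measured_ms, sol_ms, min_efficiency=0.8):
    """Hypothetical release gate: close enough to the speed of light?"""
    return efficiency(measured_ms, sol_ms) >= min_efficiency


if __name__ == "__main__":
    # Made-up workload and hardware numbers, for illustration only.
    sol = speed_of_light_ms(
        flops=2e9, bytes_moved=1e8,
        peak_flops_per_s=1e12, peak_bytes_per_s=1e11,
    )
    print(f"speed of light: {sol:.2f} ms")
    print(f"efficiency at 2.5 ms measured: {efficiency(2.5, sol):.0%}")
```

The useful property of this kind of gate is that it is relative to the hardware: the same threshold can be applied across devices even though the absolute runtimes differ.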

When we assess image quality, we look at different image quality measurements -- what an image looks like, how sharp or accurate it is, etc. Some of the image quality metrics we used in the past relied on what we call a golden eyeball: people would simply look at an image and judge how good it looked. That wasn’t standardized or rigorous, and we would end up with regressions. As a result, some of our models could look great in one product but fail to meet expectations in another. Back then, we went with that approach because we didn’t have the scope of testing to catch those differences, but we have since developed metrics and testing that help us improve continuously over time.
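To make the contrast with a golden eyeball concrete, here is a minimal sketch of an objective, repeatable image check. The article does not name the metrics that were developed; PSNR is used here only as a familiar stand-in, and the helper names and the 30 dB threshold are assumptions.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def failing_content(scores_db, min_db=30.0):
    """Names of test content whose score falls below a hypothetical quality bar."""
    return sorted(name for name, score in scores_db.items() if score < min_db)


if __name__ == "__main__":
    # Tiny made-up "images" as flat lists of 8-bit pixel values.
    reference = [10, 20, 30, 40]
    candidate = [12, 18, 30, 41]
    print(f"PSNR: {psnr(reference, candidate):.1f} dB")
    # Made-up per-title scores: the same model can pass in one title and fail in another.
    print(failing_content({"title_a": 34.2, "title_b": 27.9}))
```

Running the same check over content captured from every integration is what surfaces the case where a model looks great in one product but not in another.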

Partners are a critical component to consider when determining whether a feature is market-ready. In our field, one cannot complete a feature, announce it, and hope that someone will use it. We need to have customers or partners on board before launching. We target partners who are relevant to the feature and who have industry sway and wide acceptance. When we look at our feature being integrated into their product -- whether it is a game, software, or an app -- we ask whether it is going to reflect well on us. It’s a two-way street -- will they get the quality and performance they expect, and will we get the recognition we deserve?

However, the hardest part of AI and deep learning is getting the data and building the test environment needed to validate a model. Capturing content that is representative enough and collected from different sources is difficult and time-consuming. Yet many people do not take data and testing as seriously as they take research and software engineering.

In traditional software development, all one needs to do is hire a few QA engineers to write a test plan. They click over here and touch over there; their main concern is to establish whether typing x here and y there produces the expected value. In AI, the expected value is all grey. Most of our models are not clearly better than our previous models. We think of our models as better on average -- better in some areas and worse in others.
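The idea of a model being better on average, yet worse in some areas, can be sketched as a per-category comparison rather than a single pass/fail assertion. This is an illustrative sketch, not the process described in the article: the category names, the scores, and the assumption that a higher score is better are all hypothetical.

```python
from statistics import mean


def compare_models(old_scores, new_scores):
    """Summarize per-category score changes between two model versions.

    Assumes higher scores are better and compares only shared categories.
    """
    shared = sorted(set(old_scores) & set(new_scores))
    if not shared:
        raise ValueError("no categories in common")
    deltas = {name: new_scores[name] - old_scores[name] for name in shared}
    return {
        "mean_delta": mean(deltas.values()),
        "improved": [name for name in shared if deltas[name] > 0],
        "regressed": [name for name in shared if deltas[name] < 0],
    }


if __name__ == "__main__":
    # Made-up quality scores per content category.
    old = {"foliage": 30.0, "text": 32.0, "motion": 28.0}
    new = {"foliage": 33.0, "text": 31.0, "motion": 29.0}
    print(compare_models(old, new))
```

Reporting the regressed categories alongside the average keeps the grey areas visible, so the release decision is made on the specific trade-offs rather than on one number.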

Lessons learned

As hundreds of companies look to develop the next generation of AI and deep learning features, determining the quality and market readiness of those features becomes more critical. There is a lot of grey around acceptable quality because this is a new field with many unknowns around every corner. The acceptable error rate changes from project to project as new developments constantly push for new standards.

Validation and verification of models are often the most difficult part of the process, more difficult than creating the models or the weights. This is especially true as your features become more popular and are increasingly integrated into different areas and types of devices. The type of AI and deep learning that I work on will be integrated into hundreds of applications, in the midst of real-world content that is constantly changing. Therefore, having a rigorous, reproducible, and agreed-upon methodology for validating features across a wide spectrum is exceedingly important.

It takes a lot of thinking, effort, and infrastructure -- from writing software to capturing data for validation -- to transition features from research to production. That includes dedicated staff, because traditional QA testers are not of much use here: their onboarding and training take a significant amount of time. For example, training them to distinguish a game-side issue from a model issue takes so long that it is not worth the effort in a constantly changing environment.


Jason Mawdsley
