
Today, machine learning and artificial intelligence systems, trained on data, have become so effective that many of the largest and most well-respected companies in the world rely on them almost exclusively to make mission-critical business decisions. The outcome of a loan, insurance, or job application, or the detection of fraudulent activity, is now determined by processes with no human involvement whatsoever.

In a past life, I worked on machine learning infrastructure at Uber. From estimating ETAs to dynamic pricing and even matching riders with drivers, Uber relies on machine learning and artificial intelligence to enhance customer happiness and increase driver…


Originally published at https://towardsdatascience.com on March 4, 2021

As machine learning infrastructure has matured, the need for model monitoring has surged. Unfortunately, this growing demand has not produced a foolproof playbook that explains to teams how to measure their models’ performance.

Performance analysis of production models can be complex, and every situation comes with its own set of challenges. Unlike the toy problems taught in school, not every model application has an obvious path to measuring performance.

In this piece we will cover a number of challenges connected to the availability of ground truth and discuss…


Staples Button, But Make It Machine Learning, Image by Author

Originally published at https://towardsdatascience.com on February 22, 2021.

In our last post we took a broad look at model observability and the role it serves in the machine learning workflow. In particular, we discussed the promise of model observability and model monitoring tools in detecting, diagnosing, and explaining regressions in models that have been deployed to production.

This leads us to a natural question: what should I monitor in production? The answer, of course, depends on what can go wrong.

In this article we will be providing some more concrete examples of potential failure modes along with the most common…


The ML Observability platform allows teams to analyze model degradation and to find the root cause of any issues that arise. This ability to diagnose the root cause of a model’s issues, by connecting points across validation and production, is what differentiates model observability from traditional model monitoring. While model monitoring consists of setting up alerts on key model performance metrics such as accuracy or drift, model observability implies a higher objective: getting to the bottom of any regressions in performance or anomalous behavior. We are interested in the why. Monitoring is interested only in aggregates and alerts. …


Statistical distances quantify the distance between two distributions and are extremely useful in ML observability. This blog post will go into statistical distance measures and how they are used to detect common machine learning model failure modes.

Data problems in machine learning come in a wide variety of forms, ranging from sudden data pipeline failures to long-term drift in feature inputs. Statistical distance measures give teams an indication of changes in the data affecting a model and insights for troubleshooting. …
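As a rough illustration of what a statistical distance check might look like, the sketch below computes the Population Stability Index (PSI) between a training sample and a production sample of one feature. The excerpt above does not name a specific measure, so PSI is an assumption chosen for illustration; the function names, bin edges, and sample values are all hypothetical.

```python
import math


def bin_fractions(values, edges):
    """Fraction of values in each bin; edges must be sorted ascending."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open [lo, hi), except the last, which includes hi.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    score = 0.0
    for b, c in zip(baseline, current):
        b, c = max(b, eps), max(c, eps)  # avoid log(0) for empty bins
        score += (c - b) * math.log(c / b)
    return score


edges = [0, 2, 4, 6]                       # hypothetical bin edges
training = [1, 2, 2, 3, 3, 3, 4, 4, 5]     # hypothetical feature values
production = [3, 4, 4, 4, 5, 5, 5, 5, 6]   # hypothetical, shifted upward

drift = psi(bin_fractions(training, edges), bin_fractions(production, edges))
print(f"PSI = {drift:.3f}")
```

A PSI of zero means the two binned distributions are identical, and larger values indicate a bigger shift. Any alerting threshold would need to be tuned per feature and is not specified here.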


Originally published at https://towardsdatascience.com on September 17, 2020.

Businesses in almost every industry are adopting Machine Learning (ML) technology. Businesses look towards ML Infrastructure platforms to help them best leverage artificial intelligence (AI).

Understanding the various platforms and offerings can be a challenge. The ML Infrastructure space is crowded, confusing, and complex. There are a number of platforms and tools, each with a variety of functions across the model building workflow.

To understand the ML infrastructure ecosystem, we can broadly segment the machine learning workflow into three stages — data preparation, model building, and production. Data preparation refers to…


Machine Learning (ML) is being adopted by businesses in almost every industry. Many businesses are looking towards ML Infrastructure platforms to accelerate their adoption of AI. Understanding the various platforms and offerings can be a challenge. The ML Infrastructure space is crowded, confusing, and complex. There are a number of platforms and tools spanning a variety of functions across the model building workflow.

To understand the ecosystem, we broadly segment the machine learning workflow into three stages — data preparation, model building, and production. …


Originally published in Towards Data Science on May 9, 2020

Artificial Intelligence (AI) and Machine Learning (ML) are being adopted by businesses in almost every industry. Many businesses are looking towards ML Infrastructure platforms to accelerate their adoption of AI. Understanding the various platforms and offerings can be a challenge. The ML Infrastructure space is crowded, confusing, and complex. There are a number of platforms and tools spanning a variety of functions across the model building workflow.

To understand the ecosystem, we broadly break up the machine learning workflow into three stages — data preparation, model…


Originally published on Towards Data Science

How to build Resilience in Production AI/ML during Outlier Events & Extreme Environments

Coronavirus is the black swan of 2020. Not only was the initial onset of the virus an unexpected, extreme outlier event, but the human reaction to contain it is also creating massive ripples through the systems that run the world: health, business, finance, gig-economy, credit, commerce, auto-traffic, and travel, to name a few.

Black Swan events pose particular challenges for machine learning (ML) models. ML models are trained on previously seen observations to predict future scenarios. However, today these models are seeing events that are drastically different from what they were ever trained…


Artificial Intelligence (AI) and Machine Learning (ML) are being adopted by businesses in almost every industry. Many businesses are looking towards ML Infrastructure platforms to accelerate their adoption of AI. Understanding the various platforms and offerings can be a challenge. The ML Infrastructure space is crowded, confusing, and complex. There are a number of platforms and tools spanning a variety of functions across the model building workflow.

ML Workflow Stages Diagram by Author

To understand the ecosystem, we broadly break up the machine learning workflow into three stages — data preparation, model building, and production. Understanding…

Arize AI

Arize AI is an ML Observability Platform. We monitor, explain, troubleshoot, and improve machine learning models.
