Traditional monitoring falls short for AI in production — what are you using instead?

erinlm31 · July 24, 2025, 1:03am

Hey folks — Curious how others here are handling observability for ML systems in production. We’ve been hearing a lot of stories lately about teams getting surprised by things like data drift, hidden bias, or weird model behavior — not because no one was watching, but because the usual tools just didn’t show the full picture.

From what we’ve seen, traditional monitoring (like Prometheus, Datadog, etc.) is great for infra, but doesn’t go deep enough into what the model is actually doing.

We’re thinking a lot about stuff like:

How do you track model behavior over time (beyond just output metrics)?
Are you monitoring inputs, drift, feature attribution, etc.?
What have you tried that worked — or didn’t?
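
To make "monitoring inputs and drift" concrete, here is a minimal sketch of the kind of check we have in mind: a Population Stability Index (PSI) comparison between a reference sample and recent production values of one numeric feature. The `psi` helper, the bin count, and the synthetic data are illustrative assumptions, not any particular tool's API.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the log below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic stand-ins for training-time and production feature values.
rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]

print("no shift:", psi(train, same))
print("mean shift:", psi(train, shifted))
```

A common rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but those cutoffs are conventions and would need tuning per feature.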

Are you building your own tools, using open-source or vendor tools (e.g. Evidently, TruEra), or just hacking things together?

We put together some thoughts on where the gaps are — happy to share if helpful, but mostly just want to hear how other folks are approaching this.

What does your current setup look like?
