Traditional monitoring falls short for AI in production — what are you using instead?

Hey folks, curious how others here are handling observability for ML systems in production. We’ve been hearing a lot of stories lately about teams getting surprised by things like data drift, hidden bias, or weird model behavior. That wasn’t because no one was watching, but because the usual tools just didn’t show the full picture.

From what we’ve seen, traditional monitoring (Prometheus, Datadog, etc.) is great for infra signals like latency, error rates, and resource usage, but doesn’t tell you much about what the model is actually doing.

We’re thinking a lot about stuff like:

  • How do you track model behavior over time (beyond just output metrics)?
  • Are you monitoring inputs, drift, feature attribution, etc.?
  • What have you tried that worked — or didn’t?

Are you building your own tools, using open source or vendor tools (e.g. Evidently, TruEra), or just hacking things together?

We put together some thoughts on where the gaps are — happy to share if helpful, but mostly just want to hear how other folks are approaching this.

What does your current setup look like?
