Byte-Sized Design

How Datadog taught an AI to investigate high-severity incidents

Byte-Sized Design
Jan 20, 2026

Most incident tools are good at collecting evidence.

They’re bad at thinking with it.

If you’ve ever been on call, you know the feeling:

  • 12 dashboards open

  • Logs screaming

  • Traces half-useful

  • And one suspicious metric you can’t ignore

The hard part isn’t access to data.
It’s deciding what to look at next.

That’s the problem Bits AI SRE, Datadog’s incident-investigation agent, is trying to solve.


This isn’t an AI summarizer (and that matters)

The early wave of “AI for ops” tools made a quiet assumption:

If we gather enough telemetry, the model can summarize its way to the root cause.

That turns out to be wrong.

More data doesn’t make incidents clearer.
It makes them noisier.

Bits AI SRE does something different.
It investigates like a team of human SREs:

  • Form a hypothesis

  • Pull targeted evidence

  • Validate or reject

  • Go deeper only when the signal earns it

That sounds obvious.
It isn’t.

Most tools still dump everything into context and hope the model figures it out.
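
To make the loop above concrete, here is a minimal sketch in Python. It is not Datadog's implementation: the `Hypothesis` class, the `fetch_evidence` callables, and the numeric thresholds are all hypothetical stand-ins. The point is only the shape of the control flow: evidence is fetched per hypothesis, and a rejected hypothesis never triggers the queries beneath it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hypothesis:
    claim: str
    # Hypothetical: a narrow query that fetches only the evidence this claim needs.
    fetch_evidence: Callable[[], float]
    # Hypothetical: minimum signal strength for the claim to count as supported.
    threshold: float
    # Narrower follow-ups, explored only if this hypothesis is supported.
    children: list["Hypothesis"] = field(default_factory=list)


def investigate(h: Hypothesis, depth: int = 0, max_depth: int = 3) -> list[str]:
    """Return the chain of supported claims, broadest first."""
    if depth > max_depth:
        return []
    signal = h.fetch_evidence()      # pull targeted evidence
    if signal < h.threshold:         # reject: do not look any deeper
        return []
    chain = [h.claim]
    for child in h.children:         # go deeper only because the signal earned it
        sub = investigate(child, depth + 1, max_depth)
        if sub:
            return chain + sub
    return chain


# Toy data (made up): record which evidence queries actually run.
calls: list[str] = []


def probe(name: str, value: float) -> Callable[[], float]:
    def run() -> float:
        calls.append(name)
        return value
    return run


root = Hypothesis("consumer lag is elevated", probe("lag", 0.9), 0.5, children=[
    Hypothesis("upstream errors caused it", probe("upstream", 0.1), 0.5, children=[
        Hypothesis("an upstream deploy broke it", probe("deploy", 0.9), 0.5),
    ]),
    Hypothesis("commit latency caused it", probe("commit", 0.8), 0.5),
])

result = investigate(root)
print(result)  # ['consumer lag is elevated', 'commit latency caused it']
print(calls)   # ['lag', 'upstream', 'commit']
```

Notice that the "deploy" query never runs. The upstream hypothesis was rejected, so nothing below it was fetched. That is the difference from dumping everything into context.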


The key shift: causality over correlation

Here’s the most important design decision in this system:

The agent only looks at data that is causally related to a hypothesis.

Not “everything nearby.”
Not “everything noisy.”
Not “everything interesting.”

Just:

Does this explain why the alert fired?

In one real incident:

  • Kafka lag spiked

  • Commit latency spiked

  • Unrelated upstream errors were present

Earlier versions of the agent saw all of it
…and picked the wrong root cause.

The newer version ignored the noise and followed the causal chain:
commit latency → consumer lag → alert

That’s not an LLM trick.
That’s system design discipline.
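
One way to picture that discipline is as a filter over a dependency graph. The sketch below is an assumption about how such a filter could look, not a description of how Bits AI SRE decides what is causally related. The `CAUSES` graph and the signal and alert names are invented for illustration, loosely following the incident above.

```python
from collections import deque

# Hypothetical causal graph: an edge A -> B means "A can cause B".
CAUSES = {
    "commit_latency": ["consumer_lag"],
    "consumer_lag": ["kafka_lag_alert"],
    # A real anomaly, but it has no path to the alert under investigation.
    "upstream_errors": ["upstream_error_rate_alert"],
}


def causal_ancestors(alert: str, causes: dict[str, list[str]]) -> set[str]:
    """Return every signal that has a causal path leading to `alert`."""
    # Invert the graph so we can walk from effect back to cause.
    parents: dict[str, list[str]] = {}
    for cause, effects in causes.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)

    seen: set[str] = set()
    queue = deque([alert])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def relevant_signals(alert: str, anomalous: list[str],
                     causes: dict[str, list[str]]) -> list[str]:
    """Keep only the anomalous signals that could explain why `alert` fired."""
    allowed = causal_ancestors(alert, causes)
    return [s for s in anomalous if s in allowed]


anomalous = ["upstream_errors", "commit_latency", "consumer_lag"]
kept = relevant_signals("kafka_lag_alert", anomalous, CAUSES)
print(kept)  # ['commit_latency', 'consumer_lag']
```

All three signals are anomalous, but only two can explain this alert. The upstream errors are dropped before any model sees them, not because they are boring but because they have no path to the alert.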


Why benchmarking on real incidents is the quiet superpower
