MongoDB Taught Its Database to See the Future
How Atlas built a predictive auto-scaler that scales before your servers break a sweat
TLDR
Reactive auto-scaling has a fundamental problem: by the time it kicks in, you’re already overloaded. The scaling operation itself takes several minutes. You’ve spent real money on degraded performance and user experience while your scaler caught up.
MongoDB’s engineering team decided to fix this. Not by scaling faster. By scaling earlier.
The result was a production predictive auto-scaler for MongoDB Atlas that saves customers an average of 9 cents per hour, per replica set. At Atlas’s scale, that’s millions of dollars a year.
Here’s how they built it.
The Problem with Reactive Scaling
Atlas’s old auto-scaler worked like most do. Wait for CPU utilization to exceed a threshold for a few minutes. Scale up one tier. Wait again. Scale up again if needed.
Two problems with this:
Problem 1: It’s slow by design. It doesn’t react instantly because scaling too frequently is expensive. But even a fast reaction still means minutes of overload before relief arrives.
Problem 2: It scales one tier at a time. If you’re on M40 and suddenly need M80 capacity, you go M40 → M50 → M60 → M70 → M80. Each step requires sustained underload or overload to trigger. Dramatic demand shifts can leave servers in the wrong state for a long time.
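To make the cost of one-tier-at-a-time scaling concrete, here is a minimal sketch of the path a reactive scaler walks. The tier list comes from the example above; the `trigger_minutes` and `scale_minutes` parameters are hypothetical placeholders, not Atlas’s real timings.

```python
# Tiers from the example above, in ascending size.
TIERS = ["M40", "M50", "M60", "M70", "M80"]


def reactive_path(current: str, needed: str) -> list[str]:
    """Tiers a one-step-at-a-time scaler passes through, endpoints included."""
    start, end = TIERS.index(current), TIERS.index(needed)
    step = 1 if end >= start else -1
    return [TIERS[i] for i in range(start, end + step, step)]


def minutes_in_wrong_tier(current: str, needed: str,
                          trigger_minutes: int, scale_minutes: int) -> int:
    """Rough lower bound on time spent mis-sized.

    Each hop waits for the threshold to trip (trigger_minutes) and then
    for the scaling operation to finish (scale_minutes). Both values are
    assumptions supplied by the caller.
    """
    hops = len(reactive_path(current, needed)) - 1
    return hops * (trigger_minutes + scale_minutes)
```

With an assumed 5-minute trigger and 10-minute scaling operation, `minutes_in_wrong_tier("M40", "M80", 5, 10)` comes to 60 minutes across four hops.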
The only way to fundamentally improve this is to see the load spike before it arrives.
The Three-Component System
MongoDB’s predictive scaler has three pieces working together.
The Forecaster predicts future workload. But here’s a subtle design decision that matters: it doesn’t forecast CPU directly. It forecasts “customer-driven metrics”: queries per second, client connections, and scanned objects per second. These are metrics that aren’t affected by the scaling decisions themselves.
Why? If you forecast CPU and then scale to reduce it, you’ve eliminated the spike you were predicting. Your forecast is now wrong. Your model trains on incorrect data. The whole thing falls apart. By forecasting demand instead of utilization, the model stays honest.
The Estimator takes a forecasted demand level and any instance size, and outputs projected CPU utilization. It’s trained on 25 million data points sampled from across Atlas. Given “this replica set will receive X queries per second,” it can tell you whether an M40 will hit 90% CPU or an M50 will stay at 60%.
The Planner takes the forecast and the Estimator’s projections and makes the tier decision. Cheapest instance that keeps CPU under 75% for the next 15 minutes. That’s it.
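The Planner rule is simple enough to sketch. Everything below is illustrative: the `Tier` fields, the prices, the capacities, and the linear `estimate_cpu` function are hypothetical stand-ins. The real Estimator is a model trained on Atlas data, not a linear formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    hourly_cost: float    # hypothetical price, for ordering only
    capacity_qps: float   # hypothetical QPS that would saturate the CPU


def estimate_cpu(qps: float, tier: Tier) -> float:
    """Toy stand-in for the Estimator: CPU % grows linearly with QPS."""
    return min(100.0, 100.0 * qps / tier.capacity_qps)


def plan(forecast_qps: list[float], tiers: list[Tier],
         cpu_ceiling: float = 75.0) -> Tier:
    """Pick the cheapest tier whose projected CPU stays under the ceiling
    for every point in the forecast window (e.g. the next 15 minutes)."""
    for tier in sorted(tiers, key=lambda t: t.hourly_cost):
        if all(estimate_cpu(q, tier) < cpu_ceiling for q in forecast_qps):
            return tier
    # Nothing fits: fall back to the largest tier available.
    return max(tiers, key=lambda t: t.capacity_qps)
```

Note that the planner can jump straight to the right tier instead of stepping through every tier in between, because it evaluates each candidate size against the forecast.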
The Hard Part: Not Every Workload Is Predictable
25% of MongoDB Atlas replica sets have weekly seasonality. Around 56% have daily seasonality. Hourly? Rare, and not useful anyway — you can’t finish a scaling operation in under 15 minutes.
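One rough way to check whether your own workload has daily or weekly seasonality is autocorrelation at the matching lag. This is a simpler technique than whatever Atlas uses to classify replica sets, and the 0.5 threshold is an arbitrary assumption.

```python
import math


def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    if variance == 0:
        return 0.0  # a flat series has no pattern to correlate
    covariance = sum((series[i] - mean) * (series[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


def has_seasonality(series: list[float], period: int,
                    threshold: float = 0.5) -> bool:
    """True if the series correlates strongly with itself one period back.

    For hourly samples, period=24 tests daily and period=168 tests weekly
    seasonality. The threshold is a hypothetical cutoff, not Atlas's.
    """
    return autocorrelation(series, period) >= threshold
```

You need several full periods of history for the score to mean much, which matches the point about training on several weeks of data.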
The Long-Term Forecaster uses MSTL (Multiple Seasonal-Trend decomposition using LOESS) to separate each replica set’s history into trend, daily cycle, weekly cycle, and residuals. It’s trained on several weeks of data and predicts a few hours ahead. For seasonal workloads, it’s accurate.
For non-seasonal workloads, it’s useless. And MongoDB built in a “self-censoring” mechanism to handle this: the model continuously scores its own recent accuracy. If its predictions have been wrong, it stops trusting them. It knows when to shut up.
For those cases, there’s a Short-Term Forecaster. No seasonality required. Just look at the last hour or two of data and extrapolate the current trend. Simple. It beat the naive baseline (assume future = current) 68% of the time, with a 29% reduction in error.
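Here is a minimal sketch of trend extrapolation next to the naive baseline. It assumes a least-squares straight line over the recent window; the source only says the Short-Term Forecaster extrapolates the current trend, so the fitting method is a guess.

```python
def extrapolate_trend(history: list[float], steps_ahead: int) -> float:
    """Fit a least-squares line to the recent window and project it forward."""
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return history[0]  # no trend to fit from a single point
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def naive_forecast(history: list[float]) -> float:
    """The baseline: assume the future equals the latest observation."""
    return history[-1]
```

For a window of `[10, 20, 30, 40]`, the trend forecast two steps ahead is 60 while the naive baseline stays at 40. On a rising workload, that gap is the headroom the reactive approach never sees.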
Two models. One knows when to hand off to the other.
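The handoff can be sketched as a gate on recent accuracy. The error metric (mean absolute percentage error) and the 20% cutoff are assumptions for illustration; the source doesn’t say how the Long-Term Forecaster scores itself.

```python
def mean_abs_pct_error(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute percentage error, skipping points where actual is 0."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    if not pairs:
        return float("inf")  # no evidence, so don't trust the model
    return sum(abs(p - a) / abs(a) for p, a in pairs) / len(pairs)


def choose_forecast(long_term_recent: list[float],
                    actual_recent: list[float],
                    long_term_next: float,
                    short_term_next: float,
                    max_error: float = 0.2) -> tuple[str, float]:
    """Use the long-term forecast only if it has been accurate lately.

    long_term_recent holds what the long-term model predicted for the
    recent window; actual_recent holds what really happened. max_error
    is a hypothetical cutoff.
    """
    if mean_abs_pct_error(long_term_recent, actual_recent) <= max_error:
        return "long_term", long_term_next
    return "short_term", short_term_next
```

The important design choice is that the gate runs continuously on recent data, so a workload that stops being seasonal gets handed to the short-term model without anyone intervening.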
The Estimator’s Circular Dependency Problem
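This one’s worth sitting with. The Estimator predicts CPU utilization for a given demand level on a given instance size. But MongoDB can’t see customer queries or data. They only have the aggregate metrics, so the same number of queries per second can cost very different amounts of CPU from one replica set to the next, and the Estimator’s accuracy varies accordingly.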
For 45% of replica sets, the Estimator achieves under 7% error. Good enough for precise scaling decisions. For another 42%, it’s less accurate but still useful for extreme cases, catching a replica set heading toward catastrophe. The remaining 13% get excluded from predictive scaling entirely. The system knows its own blind spots.
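The three-way split can be sketched as a classifier over measured Estimator error. The 7% boundary comes from the figures above; the `usable_below` boundary and the 90% “extreme” CPU level are hypothetical, since the source doesn’t give them.

```python
def classify(error: float, precise_below: float = 0.07,
             usable_below: float = 0.25) -> str:
    """Bucket a replica set by its Estimator error (as a fraction).

    precise_below matches the 7% figure above; usable_below is an
    assumed cutoff for illustration only.
    """
    if error < precise_below:
        return "precise"
    if error < usable_below:
        return "extremes_only"
    return "excluded"


def should_scale_predictively(segment: str, projected_cpu: float,
                              ceiling: float = 75.0,
                              extreme: float = 90.0) -> bool:
    """Decide whether to act on a projection, given how much we trust it."""
    if segment == "precise":
        return projected_cpu >= ceiling
    if segment == "extremes_only":
        # Less accurate: act only when the projection is far past the line.
        return projected_cpu >= extreme
    return False  # excluded: leave it to the reactive scaler
```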
What It Actually Delivered
In the prototype experiment (tested against 10,000 replica sets, comparing a simulated predictive scaler against the reactive scaler that was running at the time):
Stayed closer to the 50-75% CPU target range
Reduced both over- and under-utilization
Saved 9 cents per replica set per hour on average
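Nine cents sounds small, so here is the arithmetic behind “millions of dollars a year.” This sketch assumes the prototype’s average holds for every hour of the year, which the experiment doesn’t guarantee. Only the 9-cent figure comes from the source.

```python
import math

CENTS_SAVED_PER_HOUR = 9      # average from the prototype experiment
HOURS_PER_YEAR = 24 * 365


def yearly_savings_cents(replica_sets: int) -> int:
    """Yearly savings in cents if the hourly average held all year."""
    return replica_sets * CENTS_SAVED_PER_HOUR * HOURS_PER_YEAR


def replica_sets_for(target_dollars: int) -> int:
    """Smallest number of replica sets whose savings reach the target."""
    return math.ceil(target_dollars * 100 / yearly_savings_cents(1))
```

Under that assumption, one replica set saves $788.40 a year, and it takes 1,269 replica sets to reach the first million.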
The production version launched in November 2025. Conservative first release: it only scales up predictively. Scale-down still uses the reactive algorithm. They’ll extend it after the system proves itself in production.
The Engineering Principle Here
Reactive systems have a floor. They’re bounded by detection latency plus response time. You can tune detection, you can speed up provisioning, but you can’t get to zero lag without predicting the future.
The MongoDB team’s insight was that for many workloads, you can predict the future, not because the future is deterministic, but because most databases run on human schedules. Business hours. Batch jobs. Weekly reports. End-of-month processing.
We covered a similar pattern in OpenAI’s Postgres scaling: the biggest wins came from understanding the shape of the workload, not just throwing capacity at it. MongoDB’s approach is the same idea applied to provisioning itself.
Most systems wait for the signal. The better ones learn to anticipate it.
What to Take Back to Your System
The principles here apply beyond Atlas:
Forecast demand, not utilization. Utilization is downstream of your scaling decisions. If you forecast it directly, you create a feedback loop that invalidates your model. Forecast the thing your users control, not the thing you control.
Build self-censorship into predictive models. A model that knows when it’s wrong and hands off gracefully is far more useful than one that’s confidently incorrect. The Long-Term Forecaster’s accuracy-based confidence scoring is a pattern worth stealing.
Know your error rate by segment. MongoDB excluded the 13% of replica sets where the Estimator was too inaccurate. Not every system is predictable. Admitting that — and falling back cleanly — is better than applying a model everywhere and getting inconsistent results.
And if you want to think about how you’d measure whether your own infrastructure is predictable, the observability fundamentals we covered earlier are a reasonable starting point, specifically the section on metrics cardinality and what you’re actually measuring.
The reactive scaler isn’t going away. But for workloads with patterns, waiting for the alarm to fire is leaving performance and money on the table.
MongoDB proved you can do better. The math isn’t complicated. The hard part is being honest about when your predictions are trustworthy — and building a system that knows the difference.



