Introduction
In hyperscale data centers, even a tiny performance slip can waste enormous amounts of electricity. Meta’s Capacity Efficiency Program tackles this with an AI agent platform that automates both the detection and the resolution of performance issues across its infrastructure. By encoding the hard-won expertise of senior efficiency engineers into reusable, composable skills, the platform’s agents now recover hundreds of megawatts of power and shrink investigations that used to take hours of manual work into minutes. This frees engineers from firefighting and lets them focus on innovation.

The Challenge of Hyperscale Efficiency
When your applications serve more than 3 billion people, a 0.1% performance regression can cascade into a massive energy drain. Meta’s capacity team has long run two complementary efforts to keep power consumption under control:
- Offense: Proactively hunting for code changes that can make existing systems more efficient, then deploying those optimizations across the fleet.
- Defense: Continuously monitoring production resource usage to catch regressions, trace them back to a particular pull request, and roll out mitigations.
These processes worked, but they created a new bottleneck: human engineering time. The more issues surfaced, the more engineers were needed to investigate and fix them.
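To make the 0.1% figure concrete, here is a back-of-the-envelope calculation. The fleet power below is a hypothetical placeholder, not a number Meta has published, and the sketch assumes power scales in proportion to the regression. The point is only that a fixed fractional regression grows linearly with fleet size.

```python
# Back-of-the-envelope: extra power drawn by a fractional performance regression.
# ASSUMPTION: FLEET_POWER_MW is a hypothetical placeholder, not a published Meta figure.

FLEET_POWER_MW = 1_000.0  # hypothetical fleet-wide power draw, in megawatts


def regression_cost_mw(fleet_power_mw: float, regression_fraction: float) -> float:
    """Extra power needed to serve the same load after a regression.

    Simplifying assumption: resource usage (and therefore power) grows in
    proportion to the regression, e.g. 0.1% more CPU -> 0.1% more power.
    """
    return fleet_power_mw * regression_fraction


if __name__ == "__main__":
    extra = regression_cost_mw(FLEET_POWER_MW, 0.001)  # a 0.1% regression
    print(f"0.1% regression on a {FLEET_POWER_MW:.0f} MW fleet ~ {extra:.1f} MW")
```

On this hypothetical fleet, a 0.1% regression costs about 1 MW, drawn continuously until someone fixes it.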
Enter the Unified AI Agent Platform
Meta’s answer was to build a unified AI agent platform that standardizes tool interfaces and bakes domain expertise into automated skills. Instead of relying on human intuition for each step, the platform can:
- Automate the entire investigation pipeline from symptom to root cause.
- Generate ready-to-review pull requests from identified efficiencies.
- Reduce a typical 10-hour manual regression analysis to about 30 minutes.
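The 10-hour to 30-minute figure is a 20x speedup per investigation. The sketch below turns it into engineer-hours; the weekly investigation count is a hypothetical input, since the text does not say how many investigations take the full 10 hours.

```python
# Rough engineer-time arithmetic for the 10-hour -> ~30-minute claim.
# ASSUMPTION: the weekly investigation count below is a hypothetical input;
# the source does not say how many investigations take the full 10 hours.

MANUAL_HOURS = 10.0  # typical manual regression analysis (from the text)
AGENT_HOURS = 0.5    # about 30 minutes with the agent platform (from the text)


def speedup(manual_hours: float, agent_hours: float) -> float:
    """How many times faster the agent-assisted investigation is."""
    return manual_hours / agent_hours


def hours_saved(investigations: int, manual_hours: float, agent_hours: float) -> float:
    """Engineer-hours freed across a batch of investigations."""
    return investigations * (manual_hours - agent_hours)


if __name__ == "__main__":
    weekly = 100  # hypothetical number of full investigations per week
    print(f"speedup: {speedup(MANUAL_HOURS, AGENT_HOURS):.0f}x")
    print(f"hours saved per week: {hours_saved(weekly, MANUAL_HOURS, AGENT_HOURS):.0f}")
```

Each investigation frees about 9.5 engineer-hours, so the savings scale directly with how many investigations the agents absorb.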
Standardized Tool Interfaces and Encoded Expertise
The key is combining standardized APIs with encoded domain knowledge. Every efficiency agent uses the same underlying data sources and actions, so skills can easily be composed for different scenarios. Senior engineers contribute their investigation patterns, and the platform turns them into reusable modules that any agent can call.
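One way to picture "standardized tools plus reusable skills" is a registry of small functions that all read from and write to one shared context, so an agent is just an ordered list of skill names. This is a minimal sketch under that assumption; every name in it (ToolContext, the skill decorator, the example skills and thresholds) is hypothetical, not Meta's actual API.

```python
# Minimal sketch of "standardized tools + reusable skills".
# ASSUMPTION: every name here (ToolContext, skill, REGISTRY, the example skills)
# is hypothetical; Meta has not published this API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolContext:
    """Shared, standardized view of data that every skill can read and write."""
    metrics: Dict[str, float]  # e.g. resource-usage deltas for a service, in percent
    findings: List[str] = field(default_factory=list)


Skill = Callable[[ToolContext], None]
REGISTRY: Dict[str, Skill] = {}


def skill(name: str) -> Callable[[Skill], Skill]:
    """Decorator that registers an encoded investigation pattern under a name."""
    def register(fn: Skill) -> Skill:
        REGISTRY[name] = fn
        return fn
    return register


@skill("check_cpu_growth")
def check_cpu_growth(ctx: ToolContext) -> None:
    # Hypothetical expert heuristic: flag CPU growth above 0.1%.
    if ctx.metrics.get("cpu_delta_pct", 0.0) > 0.1:
        ctx.findings.append("cpu regression")


@skill("check_memory_growth")
def check_memory_growth(ctx: ToolContext) -> None:
    # Hypothetical expert heuristic: flag memory growth above 0.1%.
    if ctx.metrics.get("mem_delta_pct", 0.0) > 0.1:
        ctx.findings.append("memory regression")


def run_agent(skill_names: List[str], ctx: ToolContext) -> List[str]:
    """Compose registered skills into one investigation, run in order."""
    for name in skill_names:
        REGISTRY[name](ctx)
    return ctx.findings


if __name__ == "__main__":
    ctx = ToolContext(metrics={"cpu_delta_pct": 0.4, "mem_delta_pct": 0.0})
    print(run_agent(["check_cpu_growth", "check_memory_growth"], ctx))
```

Because every skill shares one signature, adding an engineer's new investigation pattern means registering one more function rather than changing any agent.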
Defense in Action: FBDetect and Automated Regression Response
On the defense side, Meta’s in-house tool FBDetect catches thousands of regressions every week. Previously, each one demanded human attention to diagnose and revert. Now, AI agents can automatically:

- Ingest the regression alert from FBDetect.
- Run encoded diagnostic checks to pinpoint the offending code change.
- Create a mitigation plan or revert pull request for engineer review.
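The three steps above can be sketched as a tiny pipeline. The data model and the diagnostic heuristic here are hypothetical stand-ins, not FBDetect's real alert format or Meta's encoded checks: the sketch simply picks the most recent change that landed shortly before detection and touched the regressed function, then drafts a revert for an engineer to review.

```python
# Sketch of the defense loop: alert -> diagnosis -> proposed mitigation.
# ASSUMPTION: RegressionAlert, CodeChange and the matching heuristic are hypothetical
# stand-ins; they are not FBDetect's real data model or Meta's diagnostic checks.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegressionAlert:
    service: str
    function: str     # function whose cost went up
    detected_at: int  # timestamp, e.g. epoch seconds


@dataclass
class CodeChange:
    change_id: str
    landed_at: int
    touched_functions: List[str]


def find_suspect(alert: RegressionAlert, changes: List[CodeChange],
                 window: int = 86_400) -> Optional[CodeChange]:
    """Return the most recent change that landed within `window` seconds
    before detection and touched the regressed function, or None."""
    candidates = [
        c for c in changes
        if alert.detected_at - window <= c.landed_at <= alert.detected_at
        and alert.function in c.touched_functions
    ]
    return max(candidates, key=lambda c: c.landed_at, default=None)


def propose_mitigation(alert: RegressionAlert, suspect: Optional[CodeChange]) -> str:
    """Draft a plan for an engineer to review; nothing is applied automatically."""
    if suspect is None:
        return f"[{alert.service}] no suspect found; escalate to an engineer"
    return f"[{alert.service}] draft revert of {suspect.change_id} for review"


if __name__ == "__main__":
    alert = RegressionAlert("feed", "rank_items", detected_at=1_000_000)
    changes = [
        CodeChange("D1", 990_000, ["parse_request"]),
        CodeChange("D2", 995_000, ["rank_items"]),
    ]
    print(propose_mitigation(alert, find_suspect(alert, changes)))
```

Keeping the final step as a draft for review, rather than an automatic revert, mirrors the text's emphasis on engineer review.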
The faster these regressions are resolved, the less energy is wasted while a problem compounds across the fleet. Shortening this detect, diagnose, and mitigate loop is key to keeping power delivery on track.
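The relationship between resolution time and waste is simple: a regression that adds a constant P megawatts wastes P × t megawatt-hours over t hours, so cutting time-to-mitigation cuts waste proportionally. The sketch assumes a constant extra draw (a regression that keeps spreading would waste more) and uses a hypothetical 1 MW regression.

```python
# Energy wasted while a regression stays unmitigated: E = P * t.
# ASSUMPTION: the 1 MW regression size and the durations are hypothetical inputs,
# and the extra power draw is treated as constant over time.


def wasted_energy_mwh(regression_mw: float, hours_unresolved: float) -> float:
    """Energy (MWh) burned by a constant extra power draw over a time span."""
    return regression_mw * hours_unresolved


if __name__ == "__main__":
    for label, hours in [("one day", 24.0), ("one week", 24.0 * 7)]:
        print(f"1 MW regression left for {label}: {wasted_energy_mwh(1.0, hours):.0f} MWh")
```

A 1 MW regression left for a week wastes seven times the energy of one fixed within a day.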
Offense: Proactive Optimization at Scale
On the offense side, AI-assisted opportunity resolution is expanding to more product areas every half-year. The platform scans codebases and runtime data for efficiency improvements that manual processes would never have time to chase. Each automated win delivers measurable power savings, and the platform stacks these wins across hundreds of product teams.
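A minimal sketch of how scanned opportunities might be triaged and "stacked": filter by estimated savings, order the largest first for pull-request generation, and sum the results. The Opportunity fields, the estimates, and the threshold are hypothetical; the text does not describe how Meta scores opportunities, and simple addition assumes the savings are independent.

```python
# Sketch: triaging proactive optimization opportunities and stacking the wins.
# ASSUMPTION: Opportunity fields, the estimates, and the threshold are
# hypothetical; the source does not describe how Meta scores opportunities.
from dataclasses import dataclass
from typing import List


@dataclass
class Opportunity:
    team: str
    description: str
    est_savings_mw: float  # estimated power saved if deployed fleet-wide


def select_for_pr(opps: List[Opportunity], min_mw: float) -> List[Opportunity]:
    """Keep opportunities worth an auto-generated PR, largest savings first."""
    worthwhile = [o for o in opps if o.est_savings_mw >= min_mw]
    return sorted(worthwhile, key=lambda o: o.est_savings_mw, reverse=True)


def stacked_savings_mw(opps: List[Opportunity]) -> float:
    """Total estimated savings if every selected change ships
    (assumes the savings are independent and simply add up)."""
    return sum(o.est_savings_mw for o in opps)


if __name__ == "__main__":
    opps = [
        Opportunity("ads", "cache a hot lookup", 0.8),
        Opportunity("feed", "avoid redundant serialization", 1.5),
        Opportunity("search", "tiny loop tweak", 0.01),
    ]
    picked = select_for_pr(opps, min_mw=0.1)
    print([o.team for o in picked], f"{stacked_savings_mw(picked):.1f} MW")
```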
Measurable Results: Power Savings and Engineer Productivity
Together, offense and defense have already recovered hundreds of megawatts of power; sustained over a year, that is enough energy to supply hundreds of thousands of American homes. More importantly, the program can keep growing its megawatt delivery without proportionally growing its headcount. Engineers spend less time on repetitive investigations and more time designing next-generation hardware and software.
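The homes comparison follows from converting power to annual energy: 1 MW sustained for a year is 8,760 MWh. The sketch assumes an average U.S. home uses roughly 10,500 kWh per year (an approximate figure in line with U.S. EIA averages; check current data), and the MW inputs are illustrative rather than Meta's exact totals.

```python
# Converting recovered power (MW) into "homes supplied for a year".
# ASSUMPTION: ~10,500 kWh/year per average U.S. home (approximate, in line with
# EIA averages). The MW inputs below are illustrative, not Meta's exact figures.

HOURS_PER_YEAR = 8_760
KWH_PER_HOME_PER_YEAR = 10_500  # approximate U.S. average


def homes_supplied(recovered_mw: float) -> int:
    """Homes whose annual use equals `recovered_mw` sustained for one year."""
    kwh_per_year = recovered_mw * 1_000 * HOURS_PER_YEAR  # MW -> kW, times hours
    return int(kwh_per_year // KWH_PER_HOME_PER_YEAR)


if __name__ == "__main__":
    for mw in (100, 300):
        print(f"{mw} MW ~ {homes_supplied(mw):,} homes")
```

That works out to roughly 830 homes per megawatt, so "hundreds of megawatts" and "hundreds of thousands of homes" are consistent.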
The Road Ahead: Toward a Self-Sustaining Efficiency Engine
Meta’s ultimate vision is a self-sustaining efficiency engine where AI handles the long tail of performance issues autonomously. Future work includes training agents on even richer datasets, expanding to more infrastructure layers (storage, networking), and enabling cross-team collaboration via shared skill libraries. The goal is to make efficiency a built-in property of every system, not a manual afterthought.