DeepMind puts controls around AI agents

Google DeepMind is turning agent safety into a monitored control system, not just a model alignment claim.

Google DeepMind published an agent security framework, the AI Control Roadmap, on June 18, 2026. Its central claim is specific: the lab says it is treating advanced internal AI agents as potentially imperfect systems, even when they have been trained to be aligned, and is adding an operational control layer around their actions. This is not a new model launch. It is a deployment doctrine for agents that can carry out longer tasks in code, cybersecurity, research and product workflows.

The distinction matters because AI safety is often framed mainly as a model problem: make the system more helpful, less dangerous and better aligned. DeepMind is adding a layer that looks closer to traditional computer security. Its framework starts from a cautious assumption: if a highly capable agent behaved like an insider threat, how would the organization limit the damage? The company says the framework builds on MITRE ATT&CK, a cybersecurity knowledge base that catalogs attacker tactics and techniques, and adapts that logic to agents. The goal is not to predict every possible failure, but to break risks into observable behaviors.

The roadmap describes three broad responses. The first is detection: other, more trusted AI systems can supervise a working agent’s reasoning, plans and actions. The second is prevention and response: if a supervisor sees a risky action, it can block it before execution or escalate it for human review. The third is measurement: DeepMind says it wants to track monitoring coverage, recall of problematic behavior and time to response. Those metrics matter because they turn a general promise of caution into something closer to an engineering target, even if the source does not publish every internal threshold.
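To make the measurement idea concrete, here is a minimal sketch of how the three metrics named above could be computed from a log of agent actions. It is an illustration under stated assumptions, not DeepMind's method: the `Action` record, its fields and the `control_metrics` function are hypothetical, and the source does not publish its internal definitions or thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One logged agent action (hypothetical schema, not from the roadmap)."""
    monitored: bool    # did a supervisor observe this action?
    problematic: bool  # label assigned by later review (assumed to exist)
    flagged: bool      # did the supervisor flag the action?
    response_seconds: Optional[float] = None  # flag-to-response delay, if any


def control_metrics(actions):
    """Compute coverage, recall and mean time to response over a log."""
    total = len(actions)
    # Coverage: share of all actions that a supervisor actually observed.
    coverage = sum(a.monitored for a in actions) / total if total else 0.0

    # Recall: share of problematic actions that were flagged.
    bad = [a for a in actions if a.problematic]
    caught = [a for a in bad if a.flagged]
    recall = len(caught) / len(bad) if bad else None

    # Time to response: mean delay across flagged actions with a recorded delay.
    times = [a.response_seconds for a in actions
             if a.flagged and a.response_seconds is not None]
    mean_response = sum(times) / len(times) if times else None

    return {
        "coverage": coverage,
        "recall": recall,
        "mean_response_seconds": mean_response,
    }


log = [
    Action(monitored=True, problematic=True, flagged=True, response_seconds=2.0),
    Action(monitored=True, problematic=False, flagged=False),
    Action(monitored=False, problematic=True, flagged=False),  # missed: unmonitored
    Action(monitored=True, problematic=True, flagged=True, response_seconds=4.0),
]
metrics = control_metrics(log)
```

In this toy log the unmonitored action lowers both coverage and recall, which is why the two numbers are worth tracking together: a monitor cannot catch what it never sees.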

The most concrete number is the one million coding-agent trajectories already analyzed by an internal prototype for asynchronous monitoring. DeepMind says this data is helping refine its protocols and build a live monitor for the Gemini Spark agent, including protection against incidents such as unintentional data deletion. The company also notes that most flagged events do not come from adversarial intent, but from agent misinterpretation or overeagerness to complete a user’s goal. That is the useful signal: as agents receive more permissions inside enterprise tools, their safety will depend as much on access rights, logs and real-time brakes as on the underlying model itself.
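A "real-time brake" of the kind described above can be pictured as a check that runs before an agent's command executes. The sketch below is a simplified assumption for illustration: the `review` function, the `Verdict` values and the marker strings are hypothetical, and plain string matching stands in for the more trusted AI supervisors the roadmap describes.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # hand off to human review


# Hypothetical markers of destructive commands; a real monitor would rely on
# a supervising model rather than substring rules.
DESTRUCTIVE_MARKERS = ("rm -rf", "drop table", "delete from")


def review(command, protected=("prod",)):
    """Decide what to do with a command before the agent runs it."""
    # Lowercase and collapse whitespace so "rm   -rf" still matches.
    normalized = " ".join(command.lower().split())
    destructive = any(marker in normalized for marker in DESTRUCTIVE_MARKERS)
    if not destructive:
        return Verdict.ALLOW
    # Destructive and touching a protected name: stop it outright.
    if any(name in normalized for name in protected):
        return Verdict.BLOCK
    # Destructive but not obviously protected: ask a human.
    return Verdict.ESCALATE


verdicts = [
    review("ls -la"),
    review("rm  -rf ./build"),
    review("DROP TABLE prod_users"),
]
```

The design choice worth noting is the middle outcome: escalation lets an overeager but well-meaning agent continue after a human confirms, which fits the observation that most flagged events are misinterpretations rather than attacks.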