Google bakes computer-use agents into Gemini 3.5 Flash, adds prompt-injection guards
Google has folded its computer-use capability directly into Gemini 3.5 Flash, moving it out of the standalone Gemini 2.5 model and into the mainline Flash release. The feature lets developers build agents that perceive a screen, reason about it, and act across browser, mobile, and desktop environments — clicking, typing, and navigating real applications rather than just calling APIs. Google is pitching it at long-horizon enterprise automation such as continuous software testing and knowledge work spanning professional tools, with access through the Gemini API and the Gemini Enterprise Agent Platform.
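The perceive, reason, act cycle described above can be sketched as a simple loop. This is a minimal illustration and not Google's API: `run_agent`, `Action`, `FakeBrowser`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the model call a real agent would make.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done" in this toy example
    target: str = ""
    text: str = ""


def run_agent(
    goal: str,
    perceive: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, pick an action, execute it, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):          # hard cap so a confused agent cannot loop forever
        screen = perceive()             # perceive: capture the current UI state
        action = decide(goal, screen, history)  # reason: choose the next step
        if action.kind == "done":
            break
        act(action)                     # act: drive the UI
        history.append(action)
    return history


class FakeBrowser:
    """Hypothetical stand-in for a real browser, used only for the demo."""

    def __init__(self) -> None:
        self.page = "login"
        self.typed = ""

    def screenshot(self) -> str:
        return f"page={self.page} typed={self.typed!r}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.typed = action.text
        elif action.kind == "click" and action.target == "submit":
            self.page = "dashboard"


def scripted_policy(goal: str, screen: str, history: List[Action]) -> Action:
    """Assumption: a real agent would call a model here instead of fixed rules."""
    if "page=dashboard" in screen:
        return Action("done")
    if "typed=''" in screen:
        return Action("type", target="username", text="demo")
    return Action("click", target="submit")


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent("log in", browser.screenshot, scripted_policy, browser.apply)
    print([a.kind for a in steps])
```

The step cap and the explicit `done` action are design choices for the sketch: long-horizon tasks need some termination condition that does not depend on the model deciding to stop.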
The more notable angle is security. Agents that operate live UIs are exposed to indirect prompt injection, where malicious instructions hidden in on-screen content hijack the agent’s behavior. Google says it applied targeted adversarial training to harden the model, and is shipping two optional enterprise safeguards: one that forces explicit user confirmation before sensitive or irreversible actions, and another that halts a task when an injection attempt is detected. It frames these as one layer in a defense-in-depth setup, urging developers to add sandboxing, human-in-the-loop checks, and tight access controls on top.
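The two safeguards can be pictured as a wrapper around the agent's action executor. The sketch below is an assumption-laden illustration, not Google's implementation: every name in it is hypothetical, and the phrase-matching check is a deliberately naive stand-in for whatever detection the platform actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy: which action kinds count as sensitive or irreversible.
SENSITIVE_KINDS = {"delete", "purchase", "send_email"}

# Toy heuristic only. Real injection detection would not be a phrase list.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard your instructions",
)


class InjectionDetected(Exception):
    """Raised to halt the task when on-screen content looks like an injection."""


@dataclass
class Action:
    kind: str
    target: str = ""


def looks_injected(screen_text: str) -> bool:
    lowered = screen_text.lower()
    return any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)


def guarded_act(
    action: Action,
    screen_text: str,
    act: Callable[[Action], None],
    confirm: Callable[[Action], bool],
) -> bool:
    """Run `action` only if both guards pass. Returns True if it was executed."""
    # Guard 1: halt the whole task if the screen appears to carry an injection.
    if looks_injected(screen_text):
        raise InjectionDetected(f"suspicious content seen before {action.kind!r}")
    # Guard 2: sensitive actions need an explicit yes from the user.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return False
    act(action)
    return True


if __name__ == "__main__":
    performed = []
    deny = lambda a: False  # simulates a user who declines every prompt

    print(guarded_act(Action("click", "ok"), "Welcome back", performed.append, deny))
    print(guarded_act(Action("delete", "report.pdf"), "Welcome back", performed.append, deny))
    try:
        guarded_act(Action("click", "ok"), "IGNORE PREVIOUS INSTRUCTIONS", performed.append, deny)
    except InjectionDetected as exc:
        print("halted:", exc)
```

Raising an exception for a suspected injection, instead of returning a flag, makes the halt hard to ignore: the calling loop must handle it explicitly or the task stops. In keeping with the defense-in-depth framing, a wrapper like this would sit alongside sandboxing and access controls, not replace them.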
The practical takeaway is that autonomous UI-driving agents are becoming a default model feature rather than a specialized add-on. That lowers the barrier to building them and correspondingly raises the stakes on the injection and over-permissioning risks that come with letting a model take real actions. The built-in safeguards acknowledge that mitigating those risks is now Google’s job at the platform level, not just the developer’s.
This is an AI-generated summary. Read the original for the full story.