Chrome's Prompt API: Gemini Nano Runs Locally in the Browser
Chrome ships a built-in Prompt API that exposes Gemini Nano to web pages and extensions, letting developers send natural language requests without round-tripping to a cloud endpoint. The model is downloaded on first use per origin, with availability gated behind hefty hardware requirements: 22 GB free disk, more than 4 GB VRAM or 16 GB RAM with 4+ cores, and desktop-only support across Windows 10/11, macOS 13+, Linux, and Chromebook Plus. Audio input demands a GPU outright.
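As a rough TypeScript sketch of that gating flow (assuming the experimental LanguageModel global, which has no standard typings yet), a page can feature-detect the API, check availability, and let create() drive the model download:

```ts
// Sketch only: LanguageModel is Chrome's experimental global and has no TS typings yet.
declare const LanguageModel: any;

async function getLocalSession() {
  if (!('LanguageModel' in self)) return null;      // API not exposed in this browser
  const availability = await LanguageModel.availability();
  if (availability === 'unavailable') return null;  // hardware or OS does not qualify
  // 'downloadable' | 'downloading' | 'available': create() triggers or waits on the download.
  return await LanguageModel.create({
    monitor(m: any) {
      m.addEventListener('downloadprogress', (e: any) => {
        // e.loaded is documented as a 0-to-1 fraction of the download
        console.log(`Gemini Nano download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
}
```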
The API surface centers on LanguageModel.availability() and create(), with sessions configurable via temperature, topK, AbortSignal cancellation, and initialPrompts that seed system/user/assistant turns. An assistant-role message with prefix: true lets developers force a specific output shape, useful for steering the model into TOML, JSON, or other structured formats. Multimodal inputs cover text, image, and audio; outputs remain text-only, with English, Japanese, and Spanish currently supported.
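Those pieces compose roughly as follows; the book-metadata prompts are invented for illustration, and the prefix: true shape follows the API's response-prefix behavior as currently described, so details may shift while the API is experimental:

```ts
// Sketch only: LanguageModel is Chrome's experimental global and has no TS typings yet.
declare const LanguageModel: any;

const controller = new AbortController();

const session = await LanguageModel.create({
  temperature: 0.2,
  topK: 3,                    // temperature and topK are specified together
  signal: controller.signal,  // lets the page abort session creation
  initialPrompts: [
    { role: 'system', content: 'Extract book metadata as compact JSON.' },
    // One seeded user/assistant exchange as a few-shot example (invented for illustration).
    { role: 'user', content: 'Dune, by Frank Herbert, published 1965.' },
    { role: 'assistant', content: '{"title":"Dune","author":"Frank Herbert","year":1965}' },
  ],
});

// A trailing assistant turn with prefix: true makes the model continue from that text,
// steering the output into the desired structured shape.
const reply = await session.prompt([
  { role: 'user', content: 'Neuromancer, by William Gibson, published 1984.' },
  { role: 'assistant', content: '{"title":', prefix: true },
]);
```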
The significance is that generative inference moves into the browser runtime as a first-class platform primitive. On-device execution sidesteps API costs, network latency, and data egress, but the storage and VRAM floors mean any production feature relying on it needs a server fallback path. Extensions also get extra knobs (params() exposing model defaults) that page contexts do not.
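A minimal sketch of that server-fallback path, with /api/summarize standing in as a hypothetical endpoint:

```ts
// Sketch only: LanguageModel is Chrome's experimental global and has no TS typings yet.
declare const LanguageModel: any;

// Prefer on-device inference when Gemini Nano is ready; otherwise call a server.
// The /api/summarize endpoint is a placeholder, not part of the Prompt API.
async function summarize(text: string): Promise<string> {
  if ('LanguageModel' in self && (await LanguageModel.availability()) === 'available') {
    const session = await LanguageModel.create();
    try {
      return await session.prompt(`Summarize in two sentences:\n\n${text}`);
    } finally {
      session.destroy();  // release on-device model resources
    }
  }
  const res = await fetch('/api/summarize', { method: 'POST', body: text });
  return await res.text();
}
```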