Google’s 1,302 GenAI Case Studies Are a Map, Not a Mandate
Google just expanded its public catalogue of real-world generative AI deployments to 1,302 entries, featuring names like Accenture, Deloitte, BMW, Mercedes-Benz, Bayer, and dozens of Fortune 500 operators. On the surface this looks like validation that GenAI has crossed the chasm from experiment to infrastructure. The honest read is more complicated. What you are actually looking at is a curated list of vendor-friendly wins, most of them narrow, many of them still operating with significant human supervision underneath the marketing copy.
The number itself tells you something useful. A year ago the same catalogue sat at around 100 production references. The growth is real, and the spread of use cases - internal copilots, document summarisation, customer service triage, code assistance, marketing content generation, supply chain forecasting - reflects where GenAI genuinely earns its keep. But the catalogue is also a sales artefact. Google publishes it to drive Vertex AI and Gemini adoption. Every entry has been through legal, comms, and partner marketing before it shows up. None of them are going to lead with the failure rate, the human review hours, or the prompt regression incidents.
For anyone building or sponsoring AI work, the right way to use this list is as a pattern library, not a permission slip. The companies winning here are not winning because they picked the right model. They are winning because they treated GenAI as a system to engineer around, with constraints, validation, and clear ownership. The companies that will fail in the next eighteen months are the ones that read this catalogue, see BMW shipping a voice assistant, and assume their organisation can do the same by next quarter with a workshop and a Vertex subscription.
What’s Actually Going On
Strip away the press release language and the patterns inside the 1,302 entries are remarkably consistent. The vast majority cluster into four operational shapes: retrieval-augmented question answering over internal documents, structured extraction from unstructured inputs, drafting assistance with human approval gates, and customer-facing conversational interfaces with tight scope. Almost nothing in the catalogue is a fully autonomous agent making consequential decisions without a human in the loop. That is not an accident. It reflects what actually works in production right now.
The successful deployments share an architecture, not a model choice. They wrap a probabilistic component - the LLM - inside a deterministic envelope. Inputs are validated and shaped before they reach the model. Outputs are constrained through schema, function calling, or grounding against a controlled corpus. Every meaningful response is either reviewed, scored, or auditable. The companies named in the catalogue have invested heavily in the boring parts: evaluation harnesses, prompt versioning, retrieval pipelines, observability for token usage and latency, and rollback mechanisms when a model update changes behaviour. The model is the smallest part of the stack.
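To make that envelope concrete, here is a minimal sketch of the shape in Python. Everything in it is illustrative - call_model stands in for whichever client a team actually uses, and the response schema is invented - but the placement of the checks is the point: validation before the model, schema and grounding checks after it, an audit trail around every accepted answer, and a fail-closed path when the output cannot be trusted.

```python
# Minimal sketch of the "deterministic envelope" pattern. call_model is a
# placeholder for a real client (Vertex AI, etc.); the schema is invented.
import json
import logging

REQUIRED_FIELDS = {"answer", "sources", "confidence"}

def call_model(prompt: str) -> str:
    # Stand-in that returns canned JSON so the sketch runs end to end.
    return json.dumps(
        {"answer": "36 months", "sources": ["warranty.pdf"], "confidence": 0.93}
    )

def validate_input(user_query: str) -> str:
    # Shape and bound the input before it ever reaches the model.
    query = user_query.strip()
    if not query or len(query) > 2000:
        raise ValueError("query empty or too long")
    return query

def validate_output(raw: str) -> dict:
    # Constrain the probabilistic output: must parse, must match the schema,
    # must be grounded in at least one source.
    data = json.loads(raw)  # raises on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not data["sources"]:
        raise ValueError("ungrounded answer: no sources cited")
    return data

def answer(user_query: str) -> dict:
    query = validate_input(user_query)
    prompt = f"Answer from the provided corpus only. Respond as JSON.\n\n{query}"
    for attempt in range(2):  # one bounded retry, then fail closed
        raw = call_model(prompt)
        try:
            result = validate_output(raw)
            # Every accepted response leaves an audit trail.
            logging.info("audit: query=%r sources=%s", query, result["sources"])
            return result
        except (json.JSONDecodeError, ValueError) as exc:
            logging.warning("rejected output (attempt %d): %s", attempt + 1, exc)
    # Fail closed: route to a human rather than return unvalidated text.
    return {"answer": None, "sources": [], "confidence": 0.0, "escalate": True}

print(answer("How long is the warranty?"))
```

Note that the fallback does not hand the user the model's unvalidated text; it escalates. That one decision is a large part of the difference between a demo and a production system.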
The other thing happening underneath the catalogue is a quiet shift in what GenAI is actually being used for. Two years ago the conversation was about replacement - agents that would do the work of analysts, support reps, developers. The deployments that survived contact with reality are almost all augmentation: tools that compress the time a human spends on a task by 30 to 70 percent while keeping the human accountable. BMW’s voice assistant is not replacing service advisors. Deloitte’s audit tooling is not replacing auditors. Accenture’s internal copilots are not replacing consultants. They are removing friction from specific steps inside workflows that still belong to people. That distinction matters because it determines what you build, who owns it, and how you measure whether it worked.
Where People Get It Wrong
The most common mistake leaders make when they read a catalogue like this is treating the case study as a recipe. The summary says BMW deployed a Gemini-powered voice assistant and customer satisfaction improved. What it does not say is that the project took eighteen months, ran through three architectural rewrites, required a custom evaluation framework for automotive-domain hallucinations, and depended on a data engineering investment that predated the GenAI work by years. Copying the outcome without copying the foundation produces demos that never reach production, or worse, production systems that fail loudly the first time a customer asks something the team did not anticipate.
The second mistake is overreaching on agent architectures. Reading about enterprise deployments tends to push teams toward complex multi-agent systems with planners, executors, critics, and tool-use loops. In practice almost every reliable production system in the Google catalogue is a pipeline, not an agent swarm. A pipeline has defined stages, predictable cost, debuggable failure modes, and clear ownership. An agent system has emergent behaviour, unbounded token consumption, and a debugging surface that grows with every tool you add. If a deterministic pipeline solves the problem, building an agent on top of it is not sophistication; it is technical debt with better marketing. The teams shipping useful things are starting with the simplest possible structure and only adding agency when they can prove the simpler approach cannot do the job.
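The pipeline shape is almost boring to write down, which is the argument for it. A rough sketch, with hypothetical stage functions standing in for retrieval, prompt assembly, the model call, and the grounding check:

```python
# Sketch of the pipeline shape: a fixed sequence of stages, each independently
# testable, with one model call in the middle. All stage functions here are
# illustrative placeholders, not a real API.

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector or keyword search over a controlled corpus.
    return ["Warranty coverage lasts 36 months from date of purchase."]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(chunks)
    return f"Using only this context:\n{context}\n\nAnswer: {query}"

def call_model(prompt: str) -> str:
    return "Coverage lasts 36 months."  # stand-in for the LLM call

def check_answer(answer: str, chunks: list[str]) -> bool:
    # Stand-in for real grounding and format checks against the chunks.
    return bool(answer.strip())

def run_pipeline(query: str) -> str:
    # Each stage is a named boundary: when something breaks, you know where.
    chunks = retrieve(query)
    if not chunks:
        return "No supporting documents found."  # fail before spending tokens
    answer = call_model(build_prompt(query, chunks))
    if not check_answer(answer, chunks):
        return "Escalated to a human reviewer."  # fail closed
    return answer

print(run_pipeline("How long is the warranty?"))
```

Every stage has a defined input, a defined output, and a place to hang a metric. The agent version of the same system has none of those boundaries, which is exactly why it is harder to debug, cost, and own.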
The third mistake is underinvesting in evaluation and treating GenAI features as one-off launches. Models change. Prompts drift. Retrieval corpora go stale. A system that worked at 92 percent accuracy in March can degrade to 78 percent by September without a single line of code changing, because the upstream model was updated or because the underlying data shifted. The companies in the catalogue that are still in production a year later all have continuous evaluation, regression suites tied to representative inputs, and a process for catching drift before users do. The companies that treat GenAI like a website launch - ship it, hand it to operations, move on - are the ones quietly pulling features six months later and not writing case studies about it. Production GenAI is closer to running a search engine than shipping a feature: it requires ongoing tuning, not a finish line.
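A continuous evaluation setup does not need to be elaborate to catch that drift. A minimal sketch, assuming an invented golden set, a made-up accuracy floor, and a stand-in answer() for the production system:

```python
# Sketch of a regression gate: a fixed suite of representative inputs with
# expected properties, run on every model, prompt, or corpus change and on a
# schedule. The golden set, threshold, and answer() are all illustrative.

GOLDEN_SET = [
    {"query": "How long is the warranty?", "must_contain": "36 months"},
    {"query": "Can I return an opened item?", "must_contain": "14 days"},
]

ACCURACY_FLOOR = 0.90  # below this, block the rollout and page someone

def answer(query: str) -> str:
    return "Coverage lasts 36 months."  # stand-in for the production system

def run_regression() -> float:
    passed = 0
    for case in GOLDEN_SET:
        if case["must_contain"].lower() in answer(case["query"]).lower():
            passed += 1
        else:
            print(f"REGRESSION: {case['query']!r}")
    score = passed / len(GOLDEN_SET)
    print(f"accuracy {score:.0%} (floor {ACCURACY_FLOOR:.0%})")
    return score

if __name__ == "__main__":
    score = run_regression()
    raise SystemExit(0 if score >= ACCURACY_FLOOR else 1)  # CI-friendly exit
```

Wire that exit code into CI and run the suite on a schedule against the live corpus, and the September degradation gets caught in September, not in a user complaint thread.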