Liquid AI ships LFM2.5-8B-A1B, an edge MoE built for on-device agents

Liquid AI has released LFM2.5-8B-A1B, a mixture-of-experts (MoE) model with 8B total and 1B active parameters, aimed at running agentic workloads locally on laptops and phones. The update extends the context window from 32K to 128K tokens, doubles the vocabulary to 128K for better non-Latin tokenization, and raises the pretraining budget from 12T to 38T tokens. Unlike its predecessor, it is reasoning-only: it emits an explicit chain of thought before each answer. The team argues this tradeoff is cheap on MoE architectures because each token activates only a small fraction of the parameters.
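A rough back-of-the-envelope sketch of why long reasoning traces may be affordable on this design. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token and ignores attention cost, routing overhead, and memory bandwidth, which often dominate on-device. The dense 8B baseline and the function names are hypothetical, not taken from Liquid AI.

```python
# Back-of-the-envelope estimate only; not a benchmark and not Liquid AI's analysis.
TOTAL_PARAMS = 8e9   # total parameters, from the announcement
ACTIVE_PARAMS = 1e9  # parameters activated per token, from the announcement


def forward_flops_per_token(active_params: float) -> float:
    """Assumed rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2.0 * active_params


def dense_equivalent_tokens(moe_tokens: float) -> float:
    """Tokens a hypothetical dense 8B model could generate for the same FLOPs."""
    moe_cost = moe_tokens * forward_flops_per_token(ACTIVE_PARAMS)
    return moe_cost / forward_flops_per_token(TOTAL_PARAMS)
```

Under these assumptions, `dense_equivalent_tokens(800)` returns `100.0`: an 800-token chain of thought on the MoE costs about as much compute as 100 tokens on a dense model of the same total size.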

Notable training details include in-place tokenizer expansion (avoiding a full retrain), staged RoPE-based context extension, and targeted RL passes to suppress two known failure modes: doom loops in long reasoning traces and confident hallucination stemming from limited parametric knowledge. For the latter, an avg@k reward is used to sharpen the model's abstention boundary on questions outside its reliable knowledge.
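The announcement does not define the reward in detail. As a minimal sketch of one plausible reading, avg@k averages a per-sample score over k sampled answers to the same question, so a question the model gets right only occasionally scores lower than one it answers reliably. The score values, the abstention string, and the exact-match grading below are all assumptions for illustration, not Liquid AI's recipe.

```python
from statistics import mean

# Hypothetical abstention marker; a real pipeline would detect abstention differently.
ABSTAIN = "I don't know"


def score(answer: str, gold: str, wrong_penalty: float = -1.0) -> float:
    """Assumed scoring: +1 if correct, 0 if abstaining, a penalty if wrong."""
    if answer == ABSTAIN:
        return 0.0
    return 1.0 if answer == gold else wrong_penalty


def avg_at_k(samples: list[str], gold: str) -> float:
    """Mean score over k sampled answers to one question."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean([score(s, gold) for s in samples])
```

For example, `avg_at_k(["Paris", "Paris", "Lyon", ABSTAIN], "Paris")` evaluates to `0.25`. With these assumed values, guessing has an expected score of 2p - 1 for per-sample accuracy p, so abstaining (score 0) wins whenever p is below 0.5, which is one way a reward of this shape could push the model to abstain on shaky questions.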

Liquid claims best-in-class throughput for the size, citing 253 tokens/s on an M5 Max, ~30 tokens/s on a phone under 6GB RAM, and 18.5K output tokens/s on an H100 via SGLang. Base and post-trained weights are available on Hugging Face, with day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX. The pitch is fully local agents: Liquid's LocalCowork demo drives 67 tools across 13 MCP servers on one laptop with no cloud calls.
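To put the quoted throughput in terms of wait time, here is a simple sketch: generation time is roughly the number of output tokens divided by tokens per second. Prompt processing time is ignored, the trace length is whatever the caller supplies, and the H100 figure is left out because the announcement summary does not say whether it is single-stream or aggregate across concurrent requests.

```python
# Decode rates quoted in the announcement, in output tokens per second.
QUOTED_TOKENS_PER_S = {"M5 Max": 253.0, "phone": 30.0}


def seconds_to_generate(num_tokens: int, tokens_per_s: float) -> float:
    """Approximate decode time, ignoring prompt processing and tool latency."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return num_tokens / tokens_per_s


def wait_times(num_tokens: int) -> dict[str, float]:
    """Estimated decode time in seconds on each quoted device."""
    return {
        device: seconds_to_generate(num_tokens, rate)
        for device, rate in QUOTED_TOKENS_PER_S.items()
    }
```

For a hypothetical 2,000-token reasoning trace, `wait_times(2000)` gives roughly 8 seconds on the M5 Max and roughly 67 seconds on the phone, which shows why a reasoning-only model leans heavily on decode speed.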