PrismML's Bonsai Image 4B Squeezes Diffusion Models to Run on iPhones

PrismML has released Bonsai Image 4B, a pair of heavily quantized diffusion models derived from FLUX.2 Klein 4B that fit within the memory budgets of phones and laptops. The 1-bit variant stores transformer weights as binary {-1, +1} values with an FP16 scale shared by each group of weights, for an effective 1.125 bits per weight and a 0.93 GB transformer. The ternary variant adds a zero state ({-1, 0, +1}), which raises the cost to 1.71 effective bits per weight and the transformer to 1.21 GB, trading some compression for better prompt fidelity and visual quality.
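The bits-per-weight figures follow from simple arithmetic: each weight pays for its own code, plus a share of the 16-bit scale stored once per group. The sketch below assumes a group size of 128 (the article does not state one, but that value reproduces both quoted figures) and assumes the ternary cost is the information-theoretic log2(3) ≈ 1.585 bits per code; how Bonsai actually packs ternary values is not described. The quantizers are hypothetical illustrations, not PrismML's method: the binary one uses the sign of each weight with the group's mean absolute value as the scale (the least-squares optimal scale for fixed sign codes), and the ternary one uses a BitNet b1.58-style absmean round-and-clip rule applied per group.

```python
import math

# Assumption: group size is not stated in the article; 128 reproduces the quoted figures.
GROUP_SIZE = 128
SCALE_BITS = 16  # one FP16 scale stored per group


def effective_bits(code_bits: float, group_size: int = GROUP_SIZE) -> float:
    """Bits for each weight's code plus the group scale amortized over the group."""
    return code_bits + SCALE_BITS / group_size


def absmean(group: list[float]) -> float:
    """Mean absolute value of a weight group, used here as the shared scale."""
    return sum(abs(w) for w in group) / len(group)


def quantize_binary(group: list[float]) -> tuple[list[int], float]:
    """Hypothetical binary quantizer: sign of each weight, one shared absmean scale."""
    scale = absmean(group)
    return [1 if w >= 0 else -1 for w in group], scale


def quantize_ternary(group: list[float]) -> tuple[list[int], float]:
    """Hypothetical ternary quantizer (BitNet b1.58-style): w / scale, rounded, clipped to [-1, 1].

    Small weights round to 0, a value the binary scheme cannot represent.
    """
    scale = absmean(group)
    if scale == 0.0:
        return [0] * len(group), 0.0
    return [max(-1, min(1, round(w / scale))) for w in group], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Reconstruct approximate weights as code * scale (kept as Python floats here)."""
    return [c * scale for c in codes]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared reconstruction error between two equal-length weight lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    print(effective_bits(1))                       # 1.125 (binary)
    print(round(effective_bits(math.log2(3)), 2))  # 1.71 (ternary, assuming log2(3) bits per code)

    # Toy 4-weight group for readability; real groups would hold GROUP_SIZE weights.
    group = [0.4, -0.2, 0.05, -0.6]
    b_codes, b_scale = quantize_binary(group)
    t_codes, t_scale = quantize_ternary(group)
    print(b_codes, t_codes)  # [1, -1, 1, -1] [1, -1, 0, -1]
    # The zero state lets the small 0.05 weight reconstruct closer to its true value.
    print(mse(group, dequantize(t_codes, t_scale)) < mse(group, dequantize(b_codes, b_scale)))  # True
```

On this toy group, the ternary codes reconstruct the weights with lower error than the binary codes, which is the kind of trade the extra 0.585 bits per weight buys; whether that maps directly onto the prompt-fidelity gains PrismML reports is not something this sketch shows.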

The practical payoff is deployment reach. Full-precision FLUX.2 Klein 4B requires roughly 16 GB and won’t fit on an iPhone 17 Pro Max, while Bonsai’s total payloads of 3.42 GB (1-bit) and 3.88 GB (ternary) run on-device, generating a 512x512 image in about 9.4 seconds on the iPhone and 6 seconds on an M4 Pro Mac. On GenEval, HPSv3, and DPG-Bench, the ternary model retains 95% of FLUX.2 Klein 4B’s benchmark performance and the 1-bit model retains 88%, and both substantially outperform other models in their memory class.

Both variants ship with open weights under Apache 2.0, alongside an iOS app called Bonsai Studio. The release matters less as a benchmark win than as a shift in where iterative image generation can live: locally, without per-prompt server costs, round-trip latency, or sending user prompts to a remote API.