Ars Technica: Google has introduced Multi-Token Prediction (MTP) drafters for its Gemma 4 open models, promising responses up to twice as fast for locally run AI. The experimental feature uses speculative decoding: a lightweight draft model guesses several future tokens during otherwise idle processing cycles, and the main model then checks those guesses. Built on the same architecture as Gemini, Gemma 4 can run on a single high‑power accelerator or, when quantized, on consumer‑grade GPUs. A shift to the Apache 2.0 license also makes the terms of use more permissive, which could encourage broader adoption of edge AI.
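To make the idea of speculative decoding concrete, here is a minimal sketch of its greedy variant in plain Python. It is not Google's implementation and does not use any Gemma or MTP API: `target_next` and `draft_next` are hypothetical toy functions standing in for the large model and the drafter, and the acceptance rule shown (keep drafted tokens until the first disagreement) is the simple greedy form rather than the sampling-based one.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    # Hypothetical stand-in for the large model: a deterministic toy rule.
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_next(tokens: List[int]) -> int:
    # Hypothetical stand-in for the drafter: agrees with the target
    # except when the context length is a multiple of 4.
    guess = target_next(tokens)
    return (guess + 1) % 50 if len(tokens) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(tokens) < goal:
        # 1. The drafter proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position. A real system would
        #    score all k positions in a single batched forward pass; this toy
        #    loop calls the target per position and counts it as one pass.
        verify_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess
                break
            accepted.append(tok)
        else:
            # Every guess matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal], verify_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 12)
fast, passes = speculative_decode(target_next, draft_next, prompt, 12)
```

In this sketch `fast` matches `baseline` token for token, while `passes` is smaller than the 12 target calls the baseline needs. That is where the speedup comes from: the output is unchanged, but the expensive model runs fewer sequential steps whenever the drafter guesses well.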