What is new on Article Factory and latest in generative AI world

May 6, 2026

Google's Gemma 4 gains speed boost with Multi-Token Prediction drafters

Ars Technica2

Google has introduced Multi-Token Prediction (MTP) drafters for its Gemma 4 open models, promising up to a two‑fold reduction in response time for locally run AI. The experimental feature uses speculative decoding to guess future tokens, allowing a lightweight draft model to fill idle processing cycles. Built on the same architecture as Gemini, Gemma 4 can run on a single high‑power accelerator or, when quantized, on consumer‑grade GPUs. A shift to an Apache 2.0 license also makes the models more permissive, encouraging broader adoption of edge AI. Read more →