Ars Technica: OpenAI has teamed with Cerebras to run its Codex-Spark coding model on the Wafer Scale Engine 3, a chip the size of a dinner plate. The partnership aims to improve inference speed, delivering roughly 1,000 tokens per second, with higher rates reported for other models. The move reflects OpenAI’s broader strategy of reducing its reliance on Nvidia by striking deals with AMD and Amazon and by developing its own custom silicon. The faster coding assistant arrives amid fierce competition from Anthropic, Google, and other AI firms, underscoring the importance of latency for developers building software.