OpenAI has introduced GDPval, a new benchmark that pits AI models against human experts across dozens of occupations. In the initial results, GPT-5‑high was judged better than or on par with professionals in 40.6% of tasks, while Anthropic’s Claude Opus 4.1 reached roughly 49% on the same measure. The test covered 44 occupations spanning key sectors such as healthcare, finance, and manufacturing. OpenAI says the results show AI can start offloading routine work in many jobs, though it acknowledges the current scope is limited and plans to expand the benchmark’s coverage.