
OpenAI Claims GPT-5 Nears Human Performance on New GDPval Benchmark

TechCrunch

OpenAI Launches GDPval Benchmark to Measure AI Against Human Professionals

OpenAI announced a new benchmark named GDPval, designed to compare the output of its AI models with that of seasoned professionals across a wide range of industries and occupations. The benchmark focuses on sectors that contribute heavily to the U.S. economy, including healthcare, finance, manufacturing, and government, and evaluates performance in forty‑four distinct jobs.

For the first version, dubbed GDPval‑v0, OpenAI asked experienced workers to review AI‑generated reports alongside human‑generated ones and choose the better piece. The model’s “win rate” represents the percentage of times its work is judged equal to or superior to the human baseline across all occupations.
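The win-rate arithmetic described above can be illustrated with a minimal sketch. The label names (`model_better`, `tie`, `human_better`) and the sample data are hypothetical, chosen only for illustration; this is not OpenAI's grading code, and the article does not describe how judgments are actually recorded.

```python
from collections import Counter


def win_rate(judgments):
    """Return the percentage of comparisons the model won or tied.

    `judgments` is a list of hypothetical labels, one per graded
    comparison: "model_better", "tie", or "human_better".
    """
    if not judgments:
        raise ValueError("no judgments to score")
    counts = Counter(judgments)
    # Ties count in the model's favor, matching "equal to or superior to".
    favorable = counts["model_better"] + counts["tie"]
    return 100.0 * favorable / len(judgments)


# Made-up sample: one win, one tie, three losses out of five comparisons.
sample = ["model_better", "human_better", "tie", "human_better", "human_better"]
print(f"{win_rate(sample):.1f}%")  # 40.0%
```

Because ties are counted alongside wins, a win rate near 50% would mean graders found the model's work at least comparable to the expert's about half the time, not that the model was strictly better half the time.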

Results Show GPT‑5‑high and Claude Opus Making Strides

In the initial run, OpenAI’s GPT‑5‑high model, a more powerful variant of GPT‑5, was judged better than or on par with experts in about 40.6% of the tasks. Anthropic’s Claude Opus 4.1 scored higher, with a win rate near 49%. By contrast, OpenAI’s earlier GPT‑4o model scored 13.7%.

OpenAI noted that Claude’s strong showing may stem from its ability to produce pleasing graphics rather than pure performance, but both models demonstrate notable progress compared to earlier releases.

Implications for the Workforce

The company frames the benchmark as evidence that AI systems are becoming capable enough to assist professionals in routine aspects of their work, potentially freeing up time for higher‑value activities. OpenAI’s chief economist highlighted that as models improve, workers can offload more tasks to AI, enhancing productivity across sectors.

Nevertheless, OpenAI cautions that GDPval‑v0 tests a limited set of tasks and does not capture the full complexity of many jobs. The firm plans to broaden the benchmark to cover more interactive workflows and a wider array of occupations.

Industry Perspective

Analysts see the GDPval results as a step toward more realistic assessments of AI’s economic impact. While the benchmark’s current scope is narrow, it offers a concrete way to gauge progress toward artificial general intelligence, a core goal of OpenAI’s mission.

Future iterations of GDPval are expected to incorporate additional industries and more comprehensive task sets, providing deeper insight into how AI can complement, rather than replace, human expertise.


Source: TechCrunch
