Recent benchmark testing of AI agents on professional tasks shows a notable jump in performance, especially after Anthropic released Opus 4.6. The new model pushed scores from the low‑20s to just under 30 percent on one‑shot trials and reached an average of 45 percent with multiple attempts. While still far from full competence, the improvement signals rapid progress in foundation models and suggests that legal professionals may need to reconsider the timeline for AI displacement.