OpenAI introduces EVMbench to test AI agents on smart-contract security
EVMbench evaluates whether AI agents can find, patch, and exploit smart-contract bugs in a controlled blockchain sandbox with reproducible grading.
Quick answer
EVMbench is a standardized test for "AI security auditors." The model reads smart-contract code to find vulnerabilities, patches them without breaking existing tests, or demonstrates a working exploit against a contract deployed on a local blockchain.
What happened
On February 18, 2026, OpenAI (with Paradigm) released EVMbench, a benchmark that scores agents in detect, patch, and exploit modes on smart-contract vulnerability tasks.
Why it matters
Smart contracts move real money. Stronger AI agents could help defenders audit code, but they could also lower the barrier for attackers. Benchmarks like EVMbench help measure both progress and risk.
Key points
- Runs three modes: detect vulnerabilities, patch code while preserving behavior, and exploit a deployed contract in a sandbox.
- Uses automated tests and chain-state checks to grade outcomes programmatically.
- Publishes tasks and tooling to make results reproducible across models.
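The grading flow in the points above can be sketched with a toy harness. This is a minimal illustration, not EVMbench's actual tooling: the in-memory "chain," the addresses, and the grading functions are all hypothetical stand-ins for a real local EVM node and its state checks.

```python
from dataclasses import dataclass, field

@dataclass
class ToyChain:
    """Hypothetical in-memory 'chain': contract address -> balance in wei.
    EVMbench runs a real sandboxed blockchain; this stands in for its
    chain-state checks."""
    balances: dict = field(default_factory=dict)

    def deploy(self, address: str, funded_wei: int) -> None:
        self.balances[address] = funded_wei

    def balance(self, address: str) -> int:
        return self.balances.get(address, 0)


def grade_exploit(chain: ToyChain, target: str, attacker: str) -> bool:
    """Exploit mode passes if the agent drained the target contract's funds."""
    return chain.balance(target) == 0 and chain.balance(attacker) > 0


def grade_patch(unit_tests_pass: bool, exploit_still_works: bool) -> bool:
    """Patch mode passes if behavior is preserved AND the exploit is closed."""
    return unit_tests_pass and not exploit_still_works


# Example run: an agent's exploit moves the target's funds to its own address.
chain = ToyChain()
chain.deploy("0xTarget", funded_wei=10**18)
chain.deploy("0xAttacker", funded_wei=0)

# ...agent acts on the sandbox; here we simulate a successful drain...
chain.balances["0xAttacker"] += chain.balances.pop("0xTarget")

print(grade_exploit(chain, "0xTarget", "0xAttacker"))  # True
print(grade_patch(unit_tests_pass=True, exploit_still_works=False))  # True
```

The point of the sketch is the grading style: outcomes are judged from observable state (balances, test results) rather than by reading the agent's reasoning, which is what makes results reproducible across models.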
What to watch
Watch whether gains in “exploit” capability are matched by reliable detect-and-patch coverage that makes AI meaningfully safer for day-to-day audits.
Key terms
- Smart contract: a program that runs on a blockchain and can hold or move assets automatically.
- EVM: the Ethereum Virtual Machine, the standard execution environment for many smart contracts.
Sources
- Introducing EVMbench · OpenAI · Primary announcement · Feb 18, 2026
- EVMbench: Evaluating AI Agents on Smart Contract Security · OpenAI · Paper · Feb 18, 2026