OpenAI introduces EVMbench to test AI agents on smart-contract security
EVMbench evaluates whether AI agents can find, patch, and exploit smart-contract bugs in a controlled blockchain sandbox with reproducible grading.
Quick answer
EVMbench is a standardized test for "AI security auditors." The model reads smart-contract code to find vulnerabilities, patches them without breaking existing tests, or demonstrates a working exploit against a contract deployed on a local blockchain.
What happened
On February 18, 2026, OpenAI (with Paradigm) released EVMbench, a benchmark that scores agents in detect, patch, and exploit modes on smart-contract vulnerability tasks.
Why it matters
Smart contracts move real money. Stronger AI agents could help defenders audit code, but they could also lower the barrier for attackers. Benchmarks like EVMbench help measure both progress and risk.
Key points
- Runs three modes: detect vulnerabilities, patch code while preserving behavior, and exploit a deployed contract in a sandbox.
- Uses automated tests and chain-state checks to grade outcomes programmatically.
- Publishes tasks and tooling to make results reproducible across models.
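The grading flow in the points above can be sketched with a toy harness. This is a minimal illustration, not EVMbench's actual tooling: the in-memory "chain," the addresses, and the grading functions are all hypothetical stand-ins for a real local EVM node and its state checks.

```python
from dataclasses import dataclass, field

@dataclass
class ToyChain:
    """Hypothetical in-memory 'chain': contract address -> balance in wei.
    EVMbench runs a real sandboxed blockchain; this stands in for its
    chain-state checks."""
    balances: dict = field(default_factory=dict)

    def deploy(self, address: str, funded_wei: int) -> None:
        self.balances[address] = funded_wei

    def balance(self, address: str) -> int:
        return self.balances.get(address, 0)


def grade_exploit(chain: ToyChain, target: str, attacker: str) -> bool:
    """Exploit mode passes if the agent drained the target contract's funds."""
    return chain.balance(target) == 0 and chain.balance(attacker) > 0


def grade_patch(unit_tests_pass: bool, exploit_still_works: bool) -> bool:
    """Patch mode passes if behavior is preserved AND the exploit is closed."""
    return unit_tests_pass and not exploit_still_works


# Example run: an agent's exploit moves the target's funds to its own address.
chain = ToyChain()
chain.deploy("0xTarget", funded_wei=10**18)
chain.deploy("0xAttacker", funded_wei=0)

# ...agent acts on the sandbox; here we simulate a successful drain...
chain.balances["0xAttacker"] += chain.balances.pop("0xTarget")

print(grade_exploit(chain, "0xTarget", "0xAttacker"))  # True
print(grade_patch(unit_tests_pass=True, exploit_still_works=False))  # True
```

The point of the sketch is the grading style: outcomes are judged from observable state (balances, test results) rather than by reading the agent's reasoning, which is what makes results reproducible across models.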
What to watch
Watch whether gains in “exploit” capability are matched by reliable detect-and-patch coverage that makes AI meaningfully safer for day-to-day audits.
Key terms
- Smart contract: a program that runs on a blockchain and can hold or move assets automatically.
- EVM: the Ethereum Virtual Machine, the standard execution environment for many smart contracts.
Sources
- Introducing EVMbench · OpenAI · Primary announcement · Feb 18, 2026
- EVMbench: Evaluating AI Agents on Smart Contract Security · OpenAI · Paper · Feb 18, 2026