May 3, 2026

Security agent benchmarks, faster video segmentation, and responsible AI practice

Today’s updates cover a new benchmark for AI security agents, a speed boost for video object tracking, and Google’s latest responsible AI progress report.

AI Research · Medium

OpenAI introduces EVMbench to test AI agents on smart-contract security

EVMbench evaluates whether AI agents can find, patch, and exploit smart-contract bugs in a controlled blockchain sandbox with reproducible grading.

Why it matters

Smart contracts move real money. Stronger AI agents could help defenders audit code, but they could also lower the barrier for attackers. Benchmarks like EVMbench help measure both progress and risk.

1 min read · 2 sources · Published 8:30 AM

Quick view

It is a standardized test for “AI security auditors”: the agent reads contract code and flags vulnerabilities, fixes them without breaking tests, or demonstrates an exploit on a local blockchain setup.

  • Runs three modes: detect vulnerabilities, patch code while preserving behavior, and exploit a deployed contract in a sandbox.
  • Uses automated tests and chain-state checks to grade outcomes programmatically.
  • Publishes tasks and tooling to make results reproducible across models.
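The programmatic grading described above can be sketched as a small harness. Everything here is illustrative: the class, field names, and pass/fail rules are hypothetical stand-ins, not EVMbench's actual tooling.

```python
from dataclasses import dataclass

# Hypothetical grading harness for the three EVMbench-style modes.
# All names and fields are illustrative, not EVMbench's real API.

@dataclass
class Submission:
    mode: str                                  # "detect", "patch", or "exploit"
    flagged_lines: frozenset = frozenset()     # detect: lines the agent flagged
    tests_pass: bool = False                   # patch: behavior-preserving test suite
    vuln_reachable: bool = True                # patch: does the exploit still work?
    attacker_balance_delta: int = 0            # exploit: funds gained in the sandbox

def grade(sub: Submission, ground_truth_lines=frozenset()) -> bool:
    """Programmatic pass/fail, mirroring automated tests and chain-state checks."""
    if sub.mode == "detect":
        # Credit only if the flagged lines overlap the known vulnerability.
        return bool(sub.flagged_lines & ground_truth_lines)
    if sub.mode == "patch":
        # A patch must keep the test suite green AND close the exploit path.
        return sub.tests_pass and not sub.vuln_reachable
    if sub.mode == "exploit":
        # An exploit is graded by chain state: did the attacker gain funds?
        return sub.attacker_balance_delta > 0
    raise ValueError(f"unknown mode: {sub.mode}")

# A patch that passes tests but leaves the bug reachable still fails.
print(grade(Submission(mode="patch", tests_pass=True, vuln_reachable=True)))   # False
print(grade(Submission(mode="exploit", attacker_balance_delta=1)))             # True
```

The design point this sketch captures is that every mode is graded by machine-checkable evidence (test results, chain state) rather than by a human reading the agent's explanation, which is what makes results reproducible across models.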
AI Tools · Easy

Meta says SAM 3.1 makes real-time video object tracking cheaper to run

Meta says SAM 3.1 uses object multiplexing to track up to 16 objects per pass, doubling throughput on an H100 GPU and cutting redundant compute.

Why it matters

Segmentation and tracking models power video editing, robotics, and monitoring. If the same accuracy can be delivered with fewer GPU passes, more teams can run these workflows in real time on smaller hardware.

1 min read · 1 source · Published 1:00 PM

Quick view

Instead of re-watching a video separately for each object you care about, the model tracks many objects at once, which saves time and memory.

  • Object multiplexing lets the model track up to 16 objects in a single forward pass.
  • Meta reports throughput rising from 16 to 32 frames per second on a single H100 for medium-object videos.
  • Positioned as a drop-in replacement for SAM 3 to reduce redundant computation.
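A toy cost model (hypothetical; not Meta's code) shows why multiplexing helps: it counts forward passes per frame with and without sharing. Note that the pass count falls faster than wall-clock time, because a multiplexed pass does more work per pass; Meta's reported end-to-end speedup is about 2x, not 16x.

```python
import math

# Toy cost model for object multiplexing (illustrative only).
# Baseline: one forward pass per tracked object per frame.
# Multiplexed: up to CAP objects share a single pass, amortizing the
# backbone computation that is identical across objects.

CAP = 16  # objects per multiplexed pass, per the SAM 3.1 description

def passes_per_frame(num_objects: int, multiplexed: bool) -> int:
    if not multiplexed:
        return num_objects                   # re-run the model once per object
    return math.ceil(num_objects / CAP)      # share one pass across objects

# Tracking 16 objects: 16 passes/frame baseline vs. 1 multiplexed.
print(passes_per_frame(16, multiplexed=False))  # 16
print(passes_per_frame(16, multiplexed=True))   # 1
```

In other words, the redundant work being cut is the per-object repetition of computation that does not depend on which object is tracked; the object-specific work still scales with the number of objects.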
AI Safety · Easy

Google updates its 2026 Responsible AI Progress Report

Google’s updated Responsible AI Progress Report describes how the company says it tests, governs, and monitors AI systems across the product lifecycle.

Why it matters

As AI systems become more capable and widely deployed, the most important safety work is often operational: how models are tested, how risks are reviewed, and what happens when failures are found after launch.

1 min read · 1 source · Published 7:30 PM

Quick view

This is Google’s public “how we try to build AI responsibly” document: it summarizes the processes the company says it uses to review risk, test systems, and respond to issues.

  • Frames responsible AI as lifecycle work, from early research through post-launch monitoring and remediation.
  • Emphasizes governance tied to Google’s AI Principles and human expert review supported by automation.
  • Positions the report as a recurring transparency update rather than a one-time policy statement.