Researchers from Anthropic, MATS, and the Anthropic Fellows program have released new research evaluating how well cutting-edge AI models can attack blockchain smart contracts. The team built a new benchmark, SCONE-bench, comprising 405 contracts that were actually exploited between 2020 and 2025, and it quantifies risk as the total dollar value that can be stolen rather than as a simple success rate. Among the 34 contracts that were deployed after the models' knowledge cutoffs and later exploited in the real world, Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 found a total of 19 exploitable vulnerabilities in the simulated environment, corresponding to a potential profit of approximately $4.6 million.
Across all 405 benchmark problems, 10 models together produced ready-to-run exploit scripts for 207 cases, simulating the theft of approximately $550.1 million. The study also selected 2,849 recently deployed ERC-20 contracts on the Binance Smart Chain with no known vulnerabilities and tested them automatically with two models, uncovering two previously undisclosed zero-day vulnerabilities. Based on historical liquidity estimates, these were worth up to about $3,694, and an experiment with GPT-5 remained profitable even after deducting roughly $3,476 in API costs.
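The margin in the zero-day experiment is easier to judge as a quick calculation. The sketch below uses only the two figures quoted above and assumes, for illustration, that the full $3,694 is attributed to the GPT-5 run; the function and variable names are hypothetical.

```python
# Figures quoted in the article (approximate, in US dollars).
# Assumption: the full simulated profit is credited to the single GPT-5 run.
simulated_profit_usd = 3_694  # estimated from historical liquidity
api_cost_usd = 3_476          # reported API cost of the GPT-5 run


def net_profit(revenue: float, cost: float) -> float:
    """Return what is left of the simulated revenue after paying the cost."""
    return revenue - cost


margin = net_profit(simulated_profit_usd, api_cost_usd)
print(f"Net profit: ${margin:,.0f}")
print(f"Return on API spend: {margin / api_cost_usd:.1%}")
```

Under that assumption the run clears a few hundred dollars, so the result shows that autonomous exploitation can break even rather than that it is already highly lucrative.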
The research team emphasized that all attacks were executed only on locally forked chains inside container sandboxes, and that no funds on live public chains were touched. For the high-risk contracts it discovered, the team worked with security organizations and white-hat hackers to rescue funds or issue risk warnings. The authors noted that the simulated amount the models could steal from 2025 contracts has roughly doubled every 1.3 months over the past year, a sign that AI offensive and defensive cyber capabilities are improving rapidly, and they called for AI tools to be adopted systematically in smart contract auditing and defense as soon as possible.
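To get a feel for what a 1.3-month doubling time implies, the growth can be written as a factor of 2 raised to the elapsed months divided by 1.3. The sketch below assumes the doubling rate stays constant over the whole period, which is a simplification of the reported trend; the function name is hypothetical.

```python
def growth_factor(months: float, doubling_time_months: float = 1.3) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_time_months)


# One doubling period doubles the amount; two periods quadruple it.
print(growth_factor(1.3))
print(growth_factor(2.6))

# Implied growth over a full year if the rate held steady throughout.
print(f"{growth_factor(12):.0f}x over 12 months")
```

At a constant rate this works out to roughly a 600-fold increase over twelve months, which is why the authors treat the trend as urgent. It describes the past year and is not a forecast.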
FAQs
Q: What did the study do?
A: It built the SCONE-bench benchmark, which lets multiple AI models automatically find and exploit smart contract vulnerabilities on simulated chains, and it measures attack capability by the amount of money that can be stolen.
Q: What do the $4.6 million and $550 million mentioned in the text represent?
A: $4.6 million is a lower bound on the potential profit the models could make from contracts that were actually exploited after the knowledge cutoff, and $550.1 million is the total simulated theft across the benchmark's 405 historical attack cases.
Q: Did the researchers really steal real money on a public chain?
A: No. The researchers explained that all tests were run on locally forked chains in a sandbox environment, and that no attacks were carried out on real blockchain assets.
Q: How does the so-called "zero-day vulnerability" manifest in this study?
A: In simulated tests on 2,849 recent BSC contracts, both models uncovered previously unknown vulnerabilities and produced complete attack paths for them, which would yield thousands of dollars in profit based on historical liquidity.
Q: What is the practical value of this work for smart contract developers and defenders?
A: The team plans to release the benchmark and evaluation framework so that developers can run automated red-teaming against their contracts before going live, identifying and patching flaws that AI attackers might exploit.