Anthropic's Claude 3.7 Sonnet is here and results are insane

Anthropic has started rolling out Claude 3.7 Sonnet, the company's most advanced model and the first hybrid reasoning model it has shipped.

Early tests show that Claude 3.7 Sonnet is outperforming rivals, including OpenAI's ChatGPT models and China's DeepSeek.

In a blog post, Anthropic noted that its newest model combines fast, straightforward answers with the ability to “think” step by step on complex tasks. This makes Claude 3.7 Sonnet the best model for programming, a claim backed by benchmarks.

SWE-bench Verified shows Claude 3.7 Sonnet is the best model for coding

According to a benchmark test called “Software engineering (SWE-bench verified),” Claude 3.7 Sonnet is at the top with roughly 62% accuracy, which goes up to 70% when using extra test-time “scaffolding.”

Competing models, including Claude 3.5 Sonnet and OpenAI’s variants, sit closer to the 50% range. 

“Software engineering (SWE-bench verified)” is a benchmark that measures how well an AI model resolves real-world software issues drawn from GitHub repositories.

These results show that Claude 3.7 Sonnet is significantly ahead of its competitors in terms of coding.

AGI moment for some users

Users are also claiming that the results are insane.

For example, in a thread, Reddit users noted that the model delivered outstanding results when they used it to create apps or even games.

“Claude Code was my ‘Feel the AGI moment.’ I’ve thrown bugs at this thing that no other models could fix, but Claude Code blasted through them,” one user wrote in a Reddit thread.

Another user added: “3.7 just slapped out an entire project I had been working on for months—5000 lines of code, front-end, debugging example, all from scratch. It didn’t stop until the job was done.”

Claude 3.7 Sonnet benchmarks

Additionally, Claude 3.7 Sonnet appears to excel in most categories, with its “extended thinking” mode boosting accuracy on tasks like math and science.

Other models, such as OpenAI’s o1 and DeepSeek R1, trail behind on many of these tests.

