Anthropic's Claude Opus 4.6, in collaboration with Mozilla, identified 22 significant vulnerabilities in Firefox within a two-week security audit. This highlights the potential of AI in enhancing software security, which is relevant for engineers focused on building robust systems.
AI found 22 serious vulnerabilities in Firefox in just two weeks. The story was pretty shocking, so I wanted to share it.
This comes from a security audit of Firefox that Anthropic's Claude Opus 4.6 carried out in collaboration with Mozilla.
So what did it find?
ใป22 vulnerabilities discovered in two weeks
Claude Sonnet 4.6 has achieved the highest score in the GDPval-AA Elo benchmark, surpassing competitors Opus 4.6 and Gemini 3.1 Pro. This indicates a significant shift in the competitive landscape of AI coding tools, which may influence future development choices.
Claude Sonnet 4.6 leads the GDPval-AA Elo benchmark with 1,633 points, ahead of Opus 4.6 and Gemini 3.1 Pro.
The coding wars have a new king.
This tweet compares the cost-effectiveness of AI coding models, pitting open-source options against proprietary ones. Senior engineers should care: these numbers capture both the competitive landscape and the per-task economics of AI coding tools.
This chart should scare every AI company charging premium prices for coding models.
SWE-rebench, resolved vs average cost per instance:
โ MiniMax M2.5 (open source): 75.8% resolved at ~$0.05 per task
โ Claude Opus 4.6: 75.6% at ~$0.35 per task
โ Claude 4.5 Opus: 76.8% at