Stanford research shows that leading AI models like GPT-5 and Google Gemini maintain high accuracy on vision benchmarks even when the images are removed, exposing a significant flaw in how AI vision is evaluated. The finding could prompt engineers to reassess model reliability in real-world applications.
Holy shit… Stanford University just exposed a massive flaw in AI vision.
GPT-5, Google Gemini, and Claude scored 70–80% accuracy… with no images at all.
They call it the "mirage effect":
→ Researchers removed images from 6 major benchmarks
→ Models kept answering like
932 views · 10 likes · 6 reposts · 3 replies · 2 bookmarks · 2.0% eng
AI research · vision systems · Stanford · GPT-5 · Google Gemini
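For anyone who wants to sanity-check this kind of claim on their own stack, the ablation itself is simple: score a multiple-choice vision benchmark once with the image attached and once with it stripped, then compare. The sketch below is a hypothetical harness under assumed data fields, not Stanford's methodology; `ask_model` is a stand-in for a real multimodal API call.

```python
# Minimal sketch of a "blind" benchmark ablation: run the same questions with
# and without their images and compare accuracy. `ask_model` and the item
# fields are hypothetical placeholders, not the actual study's code.

def ask_model(question: str, choices: list[str], image_bytes: bytes | None) -> str:
    """Hypothetical model call; returns the answer string the model picks."""
    raise NotImplementedError("wire up a real multimodal API client here")

def accuracy(items: list[dict], use_images: bool) -> float:
    """Fraction of items answered correctly, with or without the image."""
    correct = 0
    for item in items:  # each item assumed: {"question", "choices", "answer", "image"}
        image = item["image"] if use_images else None
        if ask_model(item["question"], item["choices"], image) == item["answer"]:
            correct += 1
    return correct / len(items)

# If accuracy(items, use_images=False) lands close to accuracy(items, use_images=True),
# the benchmark is answerable from text priors alone -- the "mirage" the post describes.
```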
Anthropic's change to Claude Code's cache TTL from 1 hour to 5 minutes has reportedly led to increased quota usage and costs. The adjustment could affect developers who rely on prompt caching for cost management and performance.
It looks like Anthropic changed Claude Code's cache TTL from 1h to 5m in March, causing significant quota and cost inflation.
8,766 views · 84 likes · 11 reposts · 8 replies · 39 bookmarks · 1.2% eng
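A back-of-the-envelope on why a shorter TTL bites: when calls arrive farther apart than the cache lifetime, every call pays cache-write prices on the full prefix instead of cheap cache-read prices. The multipliers below are illustrative assumptions (cached reads billed well below fresh input, cache writes slightly above it), not quoted Anthropic pricing.

```python
# Sketch: relative cost of reusing one large cached prefix across a session,
# under assumed price multipliers -- not real Anthropic rates.
BASE = 1.0           # relative cost of an uncached input token
WRITE = 1.25 * BASE  # assumed cache-write multiplier
READ = 0.10 * BASE   # assumed cache-read multiplier

def session_cost(prefix_tokens: int, calls: int, gap_min: float, ttl_min: float) -> float:
    """Cost of `calls` requests sharing one cached prefix, one call every `gap_min` minutes."""
    cost = prefix_tokens * WRITE            # first call always writes the cache
    hit = gap_min <= ttl_min                # does the cache survive until the next call?
    per_call = prefix_tokens * (READ if hit else WRITE)
    return cost + (calls - 1) * per_call

# Agent with a 50k-token prefix, 20 calls, roughly 10 minutes apart:
print(session_cost(50_000, 20, gap_min=10, ttl_min=60))  # long TTL: mostly cheap reads
print(session_cost(50_000, 20, gap_min=10, ttl_min=5))   # short TTL: every call re-writes
```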
Claude Opus 4.6 has dropped sharply on the BridgeBench Hallucination benchmark, falling from #2 to #10 with a 15-point decrease in accuracy. The decline raises questions about the model's reliability and consistency, which matters for engineers evaluating AI tools.
CLAUDE OPUS 4.6 IS NERFED.
BridgeBench just proved it.
Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%.
Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%.
A 98% increase in
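The post is cut off before it says what the 98% figure measures. Under the naive assumption that hallucination rate is simply 100 minus the reported accuracy, the relative jump implied by the two accuracy numbers quoted above works out to roughly 90%:

```python
# Assumes "hallucination rate" = 100 - reported accuracy; the post's own
# definition (and what its 98% refers to) isn't visible in the truncated text.
old_acc, new_acc = 83.3, 68.3
old_err, new_err = 100 - old_acc, 100 - new_acc         # 16.7 -> 31.7
print(round(new_acc - old_acc, 1))                      # -15.0 accuracy points
print(round((new_err - old_err) / old_err * 100, 1))    # ~89.8% relative rise in error
```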
A researcher has built a tool that detects Google's SynthID watermark in Gemini-generated images with 90% accuracy and removes it while preserving image quality. The finding could have implications for watermarking techniques in AI-generated content.
One researcher beat Google's watermark with a math trick.
So Google puts an invisible watermark in every image Gemini generates.
They call it SynthID.
And this researcher figured out exactly how it works and built a tool to remove it.
90% detection accuracy. 43+ dB image
MiniMax AI has open-sourced its foundation model MiniMax M2.7, releasing the weights and claiming state-of-the-art performance on autonomous coding tasks. Senior engineers evaluating new software-engineering tools may want to verify those claims.
MiniMax AI open-sourced its latest foundation model, MiniMax M2.7, making the weights immediately available to the global developer community via Hugging Face.
The release claims state-of-the-art (SOTA) performance in highly rigorous, autonomous coding and software engineering
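If you want to try the weights, open releases on Hugging Face are typically loaded with `transformers` along these lines; the repo id below is a placeholder guess, so check MiniMax AI's organization page for the real one before running it.

```python
# Sketch of pulling open weights from Hugging Face with transformers.
# The repo id is a placeholder, not confirmed from the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "MiniMaxAI/MiniMax-M2.7"  # placeholder -- verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",   # requires `accelerate`
    torch_dtype="auto",
)

prompt = "Write a function that reverses a singly linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```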
Benchmark results indicate that Claude Opus 4.5 now outperforms its successor, Opus 4.6, on BridgeBench's hallucination benchmark. This raises questions about the newer model's effectiveness and could influence future development decisions.
Claude Opus 4.5 is now OUTPERFORMING Claude Opus 4.6 on BridgeBench Hallucination.
Read that again.
The legacy model is beating the current flagship.
We benchmarked Opus 4.5 this morning to confirm what we saw yesterday.
Claude Opus 4.6 fell from #2 to #10 with a 98%
36,211 views · 599 likes · 69 reposts · 58 replies · 84 bookmarks · 2.0% eng
Grok 4.20 has achieved the highest score in the inference category of BridgeBench, outperforming GPT-5.4 and Claude Opus 4.6. This benchmark result may indicate a shift in competitive dynamics among leading AI models, which could be relevant for infrastructure decisions.
The Grok 4.20 inference model has taken 1st place in the inference category of BridgeBench.
With this result, Grok 4.20 has surpassed both GPT-5.4 and Claude Opus 4.6 to claim the top spot.
Following its already top-tier performance in hallucination rate and instruction-following
Grok 4.20 has achieved the top position on the BridgeBench Reasoning benchmark, outperforming GPT 5.4 and Claude Opus 4.6. This indicates a significant advancement in reasoning capabilities, which may influence future AI model development.
Grok 4.20 Reasoning just took #1 on the new BridgeBench Reasoning benchmark.
Beating GPT 5.4 and Claude Opus 4.6.
This model keeps climbing every single week.
Hallucination #1.
Now Reasoning #1.
While Anthropic is throwing 500 errors, xAI is quietly building the most
Grok 4.20 has achieved the top ranking on BridgeBench, surpassing other models like GPT-5.4 and Claude Opus 4.6. This benchmark may indicate a shift in competitive performance among AI models, which could influence future development decisions.
Grok 4.20 takes the #1 spot on BridgeBench
Outperforming GPT-5.4, Claude Opus 4.6, and Gemini.
It just keeps climbing
Announcement that a research presentation, "Can AI Do Novel Security Research? Meet the HTTP Terminator," will premiere at Black Hat USA. Senior engineers may find it relevant for understanding AI's application to security research.
I'm thrilled to announce "Can AI Do Novel Security Research? Meet the HTTP Terminator" will premiere at @BlackHatEvents #BHUSA! Check out the abstract:
8,260 views · 181 likes · 32 reposts · 8 replies · 55 bookmarks · 2.7% eng