AI Twitter Scanner

High-signal AI posts from X, classified and scored

Dates: 2026-04-09 · 2026-04-10 · 2026-04-11 | Total scanned: 16 · Above threshold: 16 · Showing: 16
market signal @tech__unicorn
7/10
Anthropic's Model Scores High on SWE-Bench
Anthropic's "Mythos" model reportedly scores 78% on SWE-Bench, well ahead of GPT-5 (57%) and Opus (53%). The author claims the model's cybersecurity capability emerged unplanned, frames it as a genuine threat, and says mitigations were quietly rolled out with AWS, Google, and Microsoft.
Mythos is fucking scary….Anthropic built a model scoring 78% on SWE-Bench. GPT-5 gets 57%. Opus gets 53%. The cybersecurity ability wasn’t planned. It just emerged…These types of models are legitimately a threat. So they quietly patched with AWS, Google, Microsoft, and
πŸ‘ 224 views ❀ 3 πŸ” 0 πŸ’¬ 0 πŸ”– 0 1.3% eng
AI · Anthropic · SWE-Bench · cybersecurity · benchmarking
infrastructure @0xCVYH
7/10
KV Cache Attention Rotation Enabled by Default
The latest llama.cpp release (b8699) enables KV cache attention rotation by default. According to the post, Q8_0 inference becomes practically lossless and Q4_0's quality impact on the KV cache is much smaller than before, which is relevant for engineers tuning quantized inference.
llama.cpp release b8699 brought KV cache attention rotation enabled by default. Practical result: Q8_0 becomes practically lossless (inference time without compromising quality) and the impact of Q4_0 on the KV cache became much smaller than it was before. Translation for those
πŸ‘ 47 views ❀ 2 πŸ” 0 πŸ’¬ 0 πŸ”– 0 4.3% eng Actionable
llama.cpp · KV cache · AI infrastructure · model optimization · performance
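The rotation change described in the tweet is a release default, not a flag you set; what engineers can control directly is the KV cache precision. A hedged sketch using llama.cpp's standard `--cache-type-k`/`--cache-type-v` options (the model path and prompt are placeholders; exact flag behavior can vary by build, and quantized V cache requires flash attention via `-fa`):

```shell
# Near-lossless setting per the tweet: Q8_0 for both K and V caches
./llama-cli -m ./models/model.gguf -p "Hello" -n 64 -fa \
  --cache-type-k q8_0 --cache-type-v q8_0

# Smaller cache footprint: Q4_0, whose quality impact the tweet says is now reduced
./llama-cli -m ./models/model.gguf -p "Hello" -n 64 -fa \
  --cache-type-k q4_0 --cache-type-v q4_0
```

Comparing perplexity or output quality between the two runs is the practical way to verify the "practically lossless" claim on your own workload.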
model release @ModelScope2022
7/10
MinerU2.5-Pro Model Launch
MinerU2.5-Pro is a new 1.2B-parameter model that sets state-of-the-art performance (95.69) on the OmniDocBench v1.6 benchmark for PDF-to-Markdown parsing, outperforming several much larger models. The jump from 92.98 to 95.69 is attributed almost entirely to scaling training data from under 10M to 65.5M pages.
MinerU2.5-Pro is here. SOTA on OmniDocBench v1.6 (95.69), PDF to Markdown parsing. A 1.2B model that outperforms Gemini 3 Pro, Qwen3-VL-235B, GLM-OCR, and PaddleOCR-VL-1.5. The entire leap from 92.98 to 95.69 came from data: 65.5M training pages (up from <10M),
πŸ‘ 2,534 views ❀ 49 πŸ” 5 πŸ’¬ 0 πŸ”– 27 2.1% eng Actionable
AI · model release · benchmark · PDF parsing · training data
research @itsjasonai
7/10
Google's ConvApparel Dataset for Human-AI Conversations
Google Research has released ConvApparel, a dataset aimed at evaluating the 'realism gap' in human-AI conversations. This could be useful for engineers focused on improving conversational AI systems and understanding their limitations.
Google Research introduced ConvApparel, a new human-AI conversation dataset for measuring the "realism gap"
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
AI · dataset · conversational AI · Google Research · realism gap
open source drop @haimengzhao
7/10
Open-sourcing Quantum AI Framework in JAX
Announces the open-sourcing of a core Quantum AI framework implemented in JAX with native GPU/TPU acceleration. The code and simulations are available for engineers to run and build on.
To bridge theory and practice, we are open-sourcing our core framework. Our numerical implementation is built in JAX (with native GPU/TPU acceleration). Check out the code, run the simulations, and help us shape the future of Quantum AI at
πŸ‘ 329 views ❀ 7 πŸ” 0 πŸ’¬ 0 πŸ”– 2 2.1% eng Actionable
open source · quantum AI · JAX · GPU · TPU
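The linked framework is built in JAX; as a dependency-free illustration of the statevector-simulation idea such frameworks rest on (not the project's actual API, which isn't shown in the tweet), here is a single-qubit gate application:

```python
import math

def apply_gate(gate, state):
    """Apply a 2x2 gate (list of rows) to a single-qubit statevector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

# Hadamard gate: maps |0> to an equal superposition of |0> and |1>
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

state = apply_gate(H, [1.0, 0.0])          # start in |0>
probs = [abs(a) ** 2 for a in state]       # measurement probabilities
print(probs)                               # each approximately 0.5
```

In a JAX version these lists become `jnp` arrays and the gate application a jit-compiled matrix multiply, which is what gives such frameworks native GPU/TPU acceleration.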
infrastructure @WESummit2026
7/10
Netflix's AI Workflow Orchestration
Pratyusha Singaraju discusses the complex orchestration of ML models and human review at Netflix, highlighting the infrastructure improvements that enable seamless integration of AI systems. Senior engineers may find insights into scalable workflow management relevant for their own projects.
Every title on @netflix passes through a complex pipeline of rules, ML models, and human review - at massive scale. Pratyusha Singaraju shares how they rebuilt workflow orchestration to make these systems work seamlessly together - & why it sets the stage for AI agents next.
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
Netflix · AI · infrastructure · workflow · machine learning
infrastructure @ClickHouseDB
7/10
Building an Effective AI SRE
This post discusses the importance of a solid data foundation for AI SREs, emphasizing the need for historical context and system topology in AI systems. Senior engineers may find the architectural insights valuable for improving their own AI infrastructure.
What does it actually take to build an AI SRE that works? Not a bigger model - a better data foundation. clickhou.se/4ca2N3M Human SREs reason from historical context and system topology. AI needs the same thing. This post breaks down the architecture.
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
AI · SRE · infrastructure · data foundation · architecture
infrastructure @elvissun
7/10
Optimizing Vercel Build Minutes
A practical fix for runaway Vercel build minutes: build locally before pushing so the turbo cache lets Vercel skip the build entirely, cutting build minutes by a reported 78%. Relevant for engineers optimizing CI/CD costs, especially with multiple agents opening PRs.
if you have multiple agents opening PRs, each one triggers a full build. that's why I've been paying @vercel $150/mo in build minutes the past 2 months lol. the fix: build locally before push β†’ turbo cache β†’ vercel skips the build entirely. 78% fewer build minutes. 5x
πŸ‘ 638 views ❀ 7 πŸ” 0 πŸ’¬ 3 πŸ”– 4 1.6% eng Actionable
Vercel · CI/CD · build optimization · turbo cache · infrastructure
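The workflow described (local build → shared turbo cache → Vercel cache hit) can be sketched with Turborepo's standard CLI, assuming a Turborepo project with remote caching linked to Vercel; the tweet doesn't show its exact setup, so this is a plausible reconstruction:

```shell
# One-time: authenticate and link the repo to the remote cache
npx turbo login
npx turbo link

# Before each push: build locally so artifacts land in the remote cache
npx turbo run build

# Vercel's build then runs the same `turbo run build`, hits the cache,
# and replays the cached outputs instead of rebuilding from scratch
git push
```

The savings come from the cache key covering source and config hashes: if nothing changed since the local build, Vercel's run is a pure cache replay.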
market signal @botnewsnetwork
7/10
Flowise Agent Framework Vulnerability Alert
Flowise is the fourth agent framework found shipping unsandboxed code execution into production. This instance is rated CVSS 10.0 (maximum severity), and VulnCheck confirms it is already being exploited in the wild, underscoring ongoing security problems in AI tooling.
Flowise just became the fourth agent framework caught shipping unsandboxed code execution into production. This time it's CVSS 10.0 β€” maximum severity β€” and VulnCheck confirms attackers are already exploiting it from the wild. The vulnerability is almost insultingly simple.
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
security · vulnerability · AI tools · Flowise · production
platform shift @Noobwork
7/10
Google Gemini API Pricing Changes
Google has introduced Flex and Priority tiers to the Gemini API, offering a 50% reduction in cost for latency-tolerant workloads and improved reliability. This reflects a maturation in AI infrastructure, which may impact how engineers approach API usage and cost management.
Are tokens the currency of the future? Google just added Flex and Priority tiers to the Gemini API. 50% cheaper for latency-tolerant workloads. Higher reliability with automatic downgrade instead of failure. The real story: AI infrastructure is maturing into explicit
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
Google · Gemini API · pricing · AI infrastructure · cloud services
model release @off_thetarget
7/10
Gemma 4 Stabilized on llama.cpp
Gemma 4 is now stable on llama.cpp after day-one support shipped with numerous bugs. The fixed configurations span E2B, E4B, 26B MoE, and 31B Dense, with the 31B model ranking #3 on Arena AI and the 26B ranking #6.
Gemma 4 is finally stable on llama.cpp On April 2nd, Google released Gemma 4, and it had llama.cpp support on day one but with lots of bugs. Now all issues have been fixed E2B, E4B, 26B MoE, 31B Dense 31B ranks #3 on Arena AI, 26B ranks #6 The strongest tier of open-source
πŸ‘ 824 views ❀ 5 πŸ” 0 πŸ’¬ 3 πŸ”– 4 1.0% eng Actionable
Gemma 4 · llama.cpp · AI models · open source · performance benchmarks
market signal @shawnchauhan1
7/10
Meta's Muse Spark Efficiency Benchmark
Meta claims Muse Spark achieves top-five global benchmarks using significantly less compute than Llama 4 Maverick, challenging the notion that advanced AI requires extensive infrastructure investment. This could indicate a shift in how AI systems are built and deployed.
Meta built Muse Spark using over 10x less compute than Llama 4 Maverick. Top-five globally on benchmarks. Fraction of the training cost. Efficiency curves compressing this fast changes the underlying assumption that frontier AI requires frontier infrastructure spend. The labs
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
Meta · AI · efficiency · benchmark · Muse Spark
market signal @vedangvatsa
7/10
Llama 3 and Phi-4 Benchmark Insights
Compares parameter efficiency across model generations: Llama 3 (8B) matches GPT-3.5 (175B), and Phi-4 (14B) beats GPT-4o on math and graduate-level science benchmarks. The benchmarks are relevant for evaluating model performance against infrastructure requirements.
GPT-3.5 had 175 billion parameters. Llama 3 matched it with 8 billion. That is 20x fewer. Phi-4 has 14 billion parameters. It outperforms GPT-4o on math and graduate-level science benchmarks. A model that runs on a laptop beating one that needs a datacenter. The pattern is
πŸ‘ 57 views ❀ 3 πŸ” 2 πŸ’¬ 0 πŸ”– 0 8.8% eng
AI · benchmarking · Llama 3 · Phi-4 · GPT-4o
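The compression claim can be sanity-checked with quick arithmetic on the parameter counts quoted in the tweet:

```python
gpt35_params = 175e9   # GPT-3.5 parameter count, as quoted in the tweet
llama3_params = 8e9    # Llama 3 8B

ratio = gpt35_params / llama3_params
print(ratio)           # 21.875 -- the tweet rounds this down to "20x fewer"
```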
infrastructure @OSSInsight
7/10
Rust's Role in AI Infrastructure Growth
Seven Rust-based agent-infrastructure repositories have appeared in 60 days, pointing to a split in the AI stack: Python for models, Rust for runtimes. The post likens the shift to how web infrastructure split a decade ago, a trend relevant for engineers focused on performance and efficiency.
The Rust Shift in AI 7 Rust agent infra repos in 60 days. zeroclaw 30K . agent-browser 28K . Python for models. Rust for runtimes. The AI stack is splitting β€” just like web infra did a decade ago. ossinsight.io/blog/rust-ai-a … #Rust #AI #GitHub #OpenSource @zeroclawlabs
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
Rust · AI · infrastructure · open source · GitHub
market signal @SortaKinda_Cool
7/10
Gemini 3.1 Pro Dominates Benchmarks
Gemini 3.1 Pro outperforms most competitors in benchmarks and ties with GPT-5.4 Pro on a key index, all at a significantly lower cost. This indicates a strong competitive position for Google in the AI landscape, which may influence future development strategies.
Gemini 3.1 Pro leads 13 of 16 major benchmarks right now. it ties GPT-5.4 Pro on the Artificial Analysis Intelligence Index. it costs roughly a third of the price. Google is winning the benchmark race and the cost race simultaneously. the discourse is still OpenAI vs Anthropic.
πŸ‘ 0 views ❀ 0 πŸ” 0 πŸ’¬ 0 πŸ”– 0 0.0% eng
AI benchmarks · Gemini 3.1 Pro · Google · GPT-5.4 Pro · market trends
market signal @ai_for_success
7/10
Benchmark for AI Agents in Tax Workflows
A new benchmark reveals that GPT-5.4 leads at 28% in testing AI agents on real tax workflows, highlighting the challenges all models face in high-stakes, multi-step tasks. This insight could inform future model development and evaluation criteria.
We finally have a benchmark that tests AI agents on real tax workflows. GPT-5.4 is leading at 28% but all models still su**xs on high-stakes, multi-step tasks. New model cards should have benchmarks like this in future.
πŸ‘ 1,513 views ❀ 12 πŸ” 0 πŸ’¬ 2 πŸ”– 2 0.9% eng
AI · benchmark · tax workflows · GPT-5.4 · model evaluation