AI Twitter Scanner

High-signal AI posts from X, classified and scored

Total scanned: 988 · Above threshold: 987 · Showing: 6
infrastructure @_philschmid
7/10
Gemini API Service Tiers Optimization
The Gemini API introduces Flex and Priority service tiers, letting production workloads trade cost against latency with minimal code changes. Relevant for engineers who want cheaper or more reliable inference without reworking their infrastructure.
Optimizing continues, today Flex and Priority `service_tiers` for the Gemini API. Optimize costs, reliability and latency for production workloads with a single line change. **Flex Inference:** Pay 50% less for latency-tolerant workloads (no batch file management)
👁 2,628 views ❤ 63 🔁 2 💬 7 🔖 16 2.7% eng Actionable
Gemini API · infrastructure · service tiers · cost optimization · latency
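The "single line change" the tweet describes can be sketched as picking a tier per request. This is an illustrative sketch only: the tier names come from the tweet, but the request shape and the `service_tier` field name are assumptions, not the official SDK.

```python
# Illustrative sketch: select a Gemini service tier per request.
# The "service_tier" payload field is a hypothetical stand-in for the
# real API's mechanism; only the flex/priority tier names are from the tweet.

def build_request(prompt: str, latency_tolerant: bool) -> dict:
    """Use the cheaper Flex tier for latency-tolerant workloads,
    Priority otherwise — the one-line switch the tweet points at."""
    tier = "flex" if latency_tolerant else "priority"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": tier,  # hypothetical field name
    }

req = build_request("Summarize this log file.", latency_tolerant=True)
print(req["service_tier"])  # flex
```

Batch-style jobs that can wait would take the Flex path; user-facing calls would stay on Priority.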
infrastructure @PawelHuryn
7/10
Gemma 4's KV Cache Architecture Explained
The tweet explains Gemma 4's shared KV cache layers, which let the model fit on a laptop but break cache reuse in llama.cpp. Useful for engineers designing efficient inference systems.
There is a catch nobody is talking about. Gemma 4 uses shared KV cache layers - the last layers reuse K/V tensors from earlier layers instead of computing their own. That is why it fits on a laptop. But that same architecture breaks cache reuse in llama.cpp. Every request
👁 5,927 views ❤ 33 🔁 9 💬 10 🔖 39 0.9% eng
AI · infrastructure · cache · Gemma 4 · llama.cpp
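The sharing scheme can be illustrated with a toy model. This is not Gemma's real layer mapping — the layer counts and which layers share are invented — but it shows why shared K/V shrinks the cache footprint: later layers point at tensors computed by earlier layers instead of owning their own.

```python
# Toy illustration of shared KV cache layers (numbers are invented,
# not Gemma's actual architecture): the second half of the stack
# reuses K/V tensors from the first half.

NUM_LAYERS = 8
SHARED_FROM = 4  # assumption: layers 4..7 reuse K/V from layers 0..3

def kv_owner(layer: int) -> int:
    """Return the layer whose K/V tensors this layer actually reads."""
    return layer if layer < SHARED_FROM else layer - SHARED_FROM

owners = [kv_owner(l) for l in range(NUM_LAYERS)]
print(owners)            # [0, 1, 2, 3, 0, 1, 2, 3]
print(len(set(owners)))  # 4 distinct caches instead of 8
```

Only four K/V caches exist for eight layers, which is what lets the model fit on a laptop — and also why a runtime that assumes one independent cache per layer can no longer reuse entries the way it expects.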
infrastructure @elvissun
7/10
Optimizing Vercel Build Minutes
The tweet describes cutting Vercel build minutes by building locally and letting the Turborepo cache skip the remote build, yielding significant cost savings. Relevant for engineers optimizing CI/CD workflows.
if you have multiple agents opening PRs, each one triggers a full build. that's why I've been paying @vercel $150/mo in build minutes the past 2 months lol. the fix: build locally before push → turbo cache → vercel skips the build entirely. 78% fewer build minutes. 5x
👁 638 views ❤ 7 🔁 0 💬 3 🔖 4 1.6% eng Actionable
Vercel · CI/CD · build optimization · turbo cache · infrastructure
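The fix leans on Turborepo's cache: if `build` is declared cacheable with its outputs listed, a build done locally can be replayed instead of re-run. A minimal `turbo.json` sketch under that assumption (the output globs are the common Next.js example, not taken from the tweet):

```json
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "cache": true,
      "outputs": [".next/**", "!.next/cache/**"]
    }
  }
}
```

With remote caching enabled, the local `turbo run build` populates the cache before the push, so the build triggered by each agent-opened PR hits the cache rather than burning build minutes.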
infrastructure @googledevs
7/10
Five Patterns for Building AI Agents
The tweet argues that production-grade AI agents depend more on architecture than on prompts, pointing to five patterns from the Google AI Bake-Off, including multi-agent systems and deterministic execution.
Building production-grade AI agents? It's not about better prompts, it's about better architecture. Learn five patterns from the Google AI Bake-Off, from multi-agent systems to deterministic execution. Read the blog:
👁 2,054 views ❤ 7 🔁 3 💬 0 🔖 5 0.5% eng
AI agents · architecture · Google AI Bake-Off · multi-agent systems · deterministic execution
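The blog's five patterns are not enumerated in the tweet. As one toy reading of "deterministic execution" (an illustration, not Google's definition): control flow is fixed code, and the model only fills in content inside each step, so every run has the same shape.

```python
# Toy sketch of deterministic execution for an agent: the step sequence
# is hard-coded; a (fake) model supplies only the text within each step.
# All names here are invented for illustration.

def fake_model(prompt: str) -> str:
    return f"<answer to: {prompt}>"

def run_pipeline(task: str) -> list[str]:
    """Fixed plan → act → verify sequence; the model never chooses
    which step runs next."""
    steps = ["plan", "act", "verify"]
    return [fake_model(f"{step}: {task}") for step in steps]

print(run_pipeline("migrate the database"))
```

Contrast this with a free-form loop where the model decides its own next action — the deterministic shape is what makes runs reproducible and debuggable.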
infrastructure @HuggingPapers
7/10
Tencent's DisCa for Video Diffusion Transformers
Tencent has introduced DisCa, a method that accelerates video diffusion transformers by 11.8× while preserving quality. Relevant for engineers optimizing AI video generation pipelines.
Tencent just released DisCa on Hugging Face. A distillation-compatible learnable feature caching method that accelerates video diffusion transformers by 11.8× while preserving generation quality.
👁 999 views ❤ 16 🔁 6 💬 0 🔖 6 2.2% eng
Tencent · video diffusion · AI infrastructure · performance optimization · Hugging Face
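The general idea behind feature caching in iterative samplers can be sketched with a toy loop. This is a generic fixed-interval cache, not DisCa's learnable policy: an expensive block's output is reused on some steps instead of being recomputed.

```python
# Toy sketch of feature caching in a diffusion-style loop (generic
# illustration; DisCa's actual learnable, distillation-compatible
# caching is not shown).

def expensive_block(x: float) -> float:
    return x * x  # stand-in for a heavy transformer block

def run(steps, reuse_every=2):
    """Recompute the block only every `reuse_every` steps; otherwise
    replay the cached feature."""
    cache, calls, outputs = None, 0, []
    for t, x in enumerate(steps):
        if cache is None or t % reuse_every == 0:
            cache = expensive_block(x)
            calls += 1
        outputs.append(cache)
    return outputs, calls

outs, calls = run([1.0, 1.1, 1.2, 1.3])
print(calls)  # 2 — only half the steps ran the block
```

Real methods make the reuse decision learned and per-feature rather than a fixed interval, which is how they keep quality while skipping most of the compute.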
infrastructure @rohanpaul_ai
7/10
OpenAI Launches Long-Running Agent Runtime
OpenAI's new Agents SDK allows developers to manage long-running agents with sandbox execution and direct control over memory and state, streamlining what previously required multiple components. This could simplify infrastructure for AI systems, making it relevant for engineers building complex applications.
OpenAI just turned the Agents SDK into a long-running agent runtime with sandbox execution and direct control over memory and state. Before this, developers often had to stitch together 3 separate pieces themselves: the model loop, the machine where code runs, and the memory or
👁 838 views ❤ 5 🔁 3 💬 3 🔖 7 1.3% eng Actionable
OpenAI · Agents SDK · infrastructure · AI development · runtime
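The three pieces the tweet says developers used to stitch together — a model loop, a machine where code runs, and a memory/state store — can be sketched as one object. This is an illustrative toy, not the real Agents SDK API; every class name here is invented.

```python
# Toy sketch of a "long-running agent runtime": sandboxed execution and
# persistent state behind a single interface. Invented names; not the
# OpenAI Agents SDK.

class Memory:
    def __init__(self):
        self.state = {}

class Sandbox:
    def run(self, code: str) -> dict:
        scope = {}
        exec(code, {"__builtins__": {}}, scope)  # crude isolation, demo only
        return scope

class AgentRuntime:
    """Owns the sandbox and the memory, so state survives across steps
    instead of living in three separately-wired components."""
    def __init__(self):
        self.memory = Memory()
        self.sandbox = Sandbox()

    def step(self, code: str) -> dict:
        result = self.sandbox.run(code)
        self.memory.state.update(result)
        return result

rt = AgentRuntime()
rt.step("x = 2 + 2")
rt.step("y = 10")
print(rt.memory.state)  # {'x': 4, 'y': 10}
```

The point of the sketch is the shape, not the isolation (a real runtime would use an actual sandbox): one object owns execution and state, which is what removes the stitching work the tweet describes.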