Stanford research reveals that leading AI models like GPT-5 and Google Gemini maintain high accuracy on vision benchmarks even with the images removed, highlighting a significant flaw in how AI vision is evaluated. The finding could prompt engineers to reassess model reliability in real-world applications.
Holy shit… Stanford University just exposed a massive flaw in AI vision.
GPT-5, Google Gemini, and Claude scored 70–80% accuracy… with no images at all.
They call it the "mirage effect" →
→ Researchers removed images from 6 major benchmarks
→ Models kept answering like…
👁 932 views · ❤ 10 · 🔁 6 · 💬 3 · 🔖 2 · 2.0% eng
AI research · vision systems · Stanford · GPT-5 · Google Gemini
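The test behind the headline is easy to sketch: score the same benchmark twice, with and without its images, and compare. A minimal Python outline, assuming a hypothetical `ask_model` stub in place of any specific vendor API:

```python
# "Blind" ablation sketch: run a visual QA benchmark with and without
# images. `ask_model` is a hypothetical stub for a chat-completion call;
# benchmark rows are (image, question, choices, answer) tuples.

def ask_model(question: str, choices: list[str], image=None) -> str:
    """Hypothetical model call; returns one of `choices`."""
    raise NotImplementedError("wire up a real model API here")

def accuracy(benchmark, use_images: bool) -> float:
    correct = 0
    for image, question, choices, answer in benchmark:
        pred = ask_model(question, choices, image=image if use_images else None)
        correct += (pred == answer)
    return correct / len(benchmark)

def mirage_gap(benchmark) -> float:
    """Difference between sighted and blind accuracy."""
    return accuracy(benchmark, use_images=True) - accuracy(benchmark, use_images=False)
```

If blind accuracy sits far above the benchmark's guess rate while the gap stays small, the score is measuring language priors rather than vision.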
This tweet discusses a novel soft robot design that utilizes heat-responsive materials and embedded electronics for movement without traditional mechanical systems. Senior engineers may find the innovative approach to robotics and materials science relevant for future applications in AI and automation.
A new origami-inspired soft robot uses heat-responsive materials and embedded electronics to move, fold, and reshape itself, without motors, pumps, or bulky mechanical systems.
@Princeton
ConvApparel is a new dataset aimed at improving LLM-based user simulators by quantifying the 'realism gap.' This could be relevant for engineers focused on enhancing conversational agent training methodologies.
Introducing ConvApparel, a new human-AI conversation dataset, as well as a comprehensive evaluation framework designed to quantify the "realism gap" in LLM-based user simulators and improve the training of robust conversational agents.
Read all about it →
goo.gle/41k5eff
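As a rough illustration of what quantifying a "realism gap" can look like (a stand-in, not the paper's actual framework), one can compare surface statistics of human versus simulated user turns; a faithful simulator should make the two distributions hard to tell apart:

```python
# Toy realism-gap metric: compare surface statistics of human user turns
# vs. LLM-simulated user turns. Illustrative only -- the ConvApparel
# evaluation framework is certainly more sophisticated than this.

from statistics import mean

def turn_stats(turns: list[str]) -> dict[str, float]:
    words = [t.split() for t in turns]
    return {
        "avg_turn_len": mean(len(w) for w in words),
        "type_token_ratio": len({w for ws in words for w in ws})
                            / max(1, sum(len(ws) for ws in words)),
    }

def realism_gap(human_turns: list[str], simulated_turns: list[str]) -> float:
    """Mean relative difference across tracked statistics; 0.0 = identical."""
    h, s = turn_stats(human_turns), turn_stats(simulated_turns)
    return mean(abs(h[k] - s[k]) / max(h[k], 1e-9) for k in h)

human = ["any rain jackets?", "under $100 please"]
simulated = ["I am seeking a waterproof outerwear garment.",
             "My budget constraint is one hundred dollars."]
print(realism_gap(human, simulated))  # larger gap = less realistic simulator
```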
This tweet discusses a research paper exploring how effectively AI agents can find and utilize their skills independently. Senior engineers may find the insights valuable for understanding agent behavior and improving AI system design.
How well do agent skills actually work when agents must find and use them on their own?
Check out the latest work from our lab!
arxiv.org/abs/2604.04323
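The question decomposes into two failure modes: can the agent retrieve the right skill, and can it then apply it correctly? A toy sketch of the retrieval half, with keyword overlap standing in for the embedding search real agent frameworks use:

```python
# Toy skill-retrieval sketch: given a task, pick a skill from a library
# by matching descriptions. Real agents typically embed descriptions and
# search by cosine similarity; keyword overlap stands in here.

SKILLS = {
    "csv_summary": "load a csv file and report column statistics",
    "web_fetch": "download a web page and extract its main text",
    "unit_convert": "convert quantities between measurement units",
}

def select_skill(task: str) -> str:
    task_words = set(task.lower().split())
    return max(SKILLS, key=lambda name: len(task_words & set(SKILLS[name].split())))

# Retrieval and use are separate failures: the evaluation must also check
# whether the selected skill, once invoked, actually completes the task.
print(select_skill("load the csv file and report statistics for sales.csv"))
```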
The Memory Intelligence Agent (MIA) proposes an architecture that lets 7B models outperform GPT-5.4 via a Manager-Planner-Executor framework with continual test-time learning. This could be of interest to engineers looking for novel strategies in AI agent design.
MIA: Memory Intelligence Agent
Evolves deep research agents from passive record-keepers into active strategists, enabling 7B models to outperform GPT-5.4 via a Manager-Planner-Executor architecture with continual test-time learning.
👁 1,897 views · ❤ 43 · 🔁 15 · 💬 2 · 🔖 19 · 3.2% eng
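Reading the architecture off the description (the real implementation is surely richer), a Manager-Planner-Executor loop with test-time memory might be shaped like the sketch below, where `llm` is a hypothetical stub for per-role model calls:

```python
# Hedged sketch of a Manager-Planner-Executor loop with continual
# test-time learning, inferred from the tweet -- not MIA's actual code.

def llm(role: str, prompt: str) -> str:
    """Hypothetical per-role model call."""
    raise NotImplementedError

def research(question: str, memory: list[str], max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        # Manager: pick a strategy using lessons accumulated so far.
        strategy = llm("manager", f"Question: {question}\nLessons: {memory}")
        # Planner: expand the strategy into concrete steps.
        plan = llm("planner", strategy)
        # Executor: carry out the plan (search, read, compute).
        result = llm("executor", plan)
        # Continual test-time learning: distill what worked into memory,
        # turning a passive record-keeper into an active strategist.
        memory.append(llm("manager", f"Distill a lesson from: {result}"))
        if result.startswith("FINAL:"):
            return result
    return "no answer within budget"
```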
A researcher has built a tool that identifies Google's SynthID watermark in Gemini-generated images with 90% accuracy and strips it while preserving image quality. This finding could have implications for the robustness of watermarking techniques for AI-generated content.
One researcher beat Google's watermark with a math trick.
So Google puts an invisible watermark in every image Gemini generates.
They call it SynthID.
And this researcher figured out exactly how it works and built a tool to remove it.
90% detection accuracy. 43+ dB image
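The "43+ dB" figure is almost certainly PSNR, the standard measure of how little an edit perturbs an image. The removal technique itself is not described in the tweet, so only the quality metric is worked out here:

```python
# PSNR in dB between the watermarked original and the cleaned output.
# Only the metric is sketched; the removal method is not reconstructed.

import numpy as np

def psnr(original: np.ndarray, cleaned: np.ndarray, peak: float = 255.0) -> float:
    """10 * log10(peak^2 / MSE); higher means a less visible change."""
    mse = np.mean((original.astype(np.float64) - cleaned.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

# At 43 dB on 8-bit pixels, MSE ~= 255**2 / 10**4.3 ~= 3.3 -- under two
# gray levels of RMS difference, i.e. visually indistinguishable.
```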
Announcement of a research presentation on AI's role in security, specifically focusing on a project called 'HTTP Terminator.' Senior engineers may find the insights relevant for understanding AI's application in security contexts.
I'm thrilled to announce "Can AI Do Novel Security Research? Meet the HTTP Terminator" will premiere at
@BlackHatEvents
#BHUSA! Check out the abstract:
👁 8,260 views · ❤ 181 · 🔁 32 · 💬 8 · 🔖 55 · 2.7% eng
This tweet discusses a new method presented at NLP2026 for resolving notation variations in medical department names using an LLM, achieving a high accuracy rate. Senior engineers may find the approach and results relevant for improving NLP applications in healthcare.
Published a new article on the KAKEHASHI Tech Blog.
We presented at NLP2026 a method that resolves "notation variations" in medical department names using an LLM, achieving a 97.5% accuracy rate with GPT-5. Please take a look.
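The task shape, as far as the tweet reveals it, is mapping free-form name variants onto a canonical list. A minimal sketch assuming a hypothetical `complete` LLM call and an invented English canonical list (the blog post has the actual prompt and pipeline):

```python
# Minimal LLM-based notation normalization sketch. `complete` and the
# canonical list are assumptions for illustration, not the paper's setup.

CANONICAL = ["Internal Medicine", "Pediatrics", "Orthopedic Surgery", "Dermatology"]

def complete(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError("call your LLM of choice here")

def normalize_department(raw_name: str) -> str:
    prompt = (
        "Map this department name to exactly one entry from the list and "
        f"output only that entry.\nList: {CANONICAL}\nName: {raw_name}"
    )
    answer = complete(prompt).strip()
    # Guard against off-list outputs so accuracy can be scored against a
    # labeled gold set (how a figure like 97.5% gets measured).
    return answer if answer in CANONICAL else raw_name

# e.g. "ortho", "orthopedics", and "整形外科" should all normalize to
# "Orthopedic Surgery".
```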
Anthropic's new research explores using a weak AI model to supervise the training of a stronger one, potentially accelerating alignment research. This could have implications for how AI systems are developed and aligned in the future.
New Anthropic Fellows research: developing an Automated Alignment Researcher.
We ran an experiment to learn whether Claude Opus 4.6 could accelerate research on a key alignment problem: using a weak AI model to supervise the training of a stronger one.
👁 11,980 views · ❤ 252 · 🔁 47 · 💬 21 · 🔖 88 · 2.7% eng
AI alignment · research · Anthropic · Claude Opus · machine learning
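The weak-to-strong setup itself (the problem Claude was asked to accelerate, not Anthropic's code) fits in a few lines: a weak model labels data, a stronger model trains on those imperfect labels, and both are scored against ground truth. The open question is when the student generalizes past its supervisor rather than imitating it:

```python
# Generic weak-to-strong supervision sketch: the "weak" supervisor sees
# only 2 of 20 features; the "strong" student sees all 20 but trains on
# the supervisor's imperfect labels. Both are scored against ground truth.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 20))
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # ground-truth concept
X_train, y_train, X_test, y_test = X[:2000], y[:2000], X[2000:], y[2000:]

weak = LogisticRegression().fit(X_train[:, :2], y_train)
strong = RandomForestClassifier(random_state=0).fit(
    X_train, weak.predict(X_train[:, :2]))  # trained on weak labels only

print("weak supervisor:", weak.score(X_test[:, :2], y_test))
print("strong student: ", strong.score(X_test, y_test))
```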