This tweet discusses a research paper exploring how effectively AI agents can find and utilize their skills independently. Senior engineers may find the insights valuable for understanding agent behavior and improving AI system design.
How well do agent skills actually work when agents must find and use them on their own?
Check out the latest work from our lab!
arxiv.org/abs/2604.04323
The tweet highlights the growth in downloads of six major AI agent frameworks, indicating a strong market trend towards AI agents. Senior engineers should note the increasing traction and potential for these frameworks in production systems.
developers already decided AI agents work. the download data is unanimous.
six major agent frameworks. all accelerating, zero declining.
- @LangChain at 8.2M weekly downloads, +3.5%.
- @OpenAI Agents at 965K, +11.8%.
the last time every framework in a category grew
Tags: AI agents, frameworks, downloads, market trends, infrastructure
Z.ai's GLM-5.1 is currently the top open-source model in Code Arena, outperforming several notable competitors. This ranking indicates the competitive landscape of AI models and may influence future development and adoption decisions.
With GLM-5.1, Z.ai maintains the top spot among open-source models in Code Arena, currently trailing the overall leader by just about 20 points while outperforming Claude Sonnet 4.6, Opus 4.5, GPT-5.4 High, and Gemini-3.1 Pro. Open-source models
Alibaba has released its Qwen 3.6+ model, achieving top scores on multiple benchmarks, including 61.6 on terminal-bench and 80.9 on multilingual agentic coding. This performance indicates a significant advancement in AI model capabilities that builders should monitor.
breaking.. alibaba mass dropped qwen 3.6-plus and it's embarrassing every frontier model right now
61.6 on terminal-bench (beats claude 4.5 opus)
56.6 on swe-bench pro (1st place)
80.9 on multilingual agentic coding (1st place)
58.7 on claw-eval real world agent (1st place)
A new version of the Huihui-gemma model shows improved perplexity metrics compared to its original, indicating potential quality enhancements. This release may interest engineers looking for better-performing models in their AI systems.
An absolutely unexpected result: tested with llama-perplexity, the ablated version actually has a lower PPL than the original model.
The smaller the PPL value, the higher the model quality.
We will upload the Huihui-gemma-4-31B-it-abliteratedv2 version, with fewer warnings and
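For context on the metric the tweet relies on, perplexity is just the exponential of the mean per-token negative log-likelihood, which is why lower values indicate a better model. A minimal sketch, assuming a hypothetical list of per-token NLLs in nats rather than the actual llama-perplexity output format:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean per-token negative log-likelihood).

    Lower PPL means the model assigns higher probability to the
    held-out text, i.e. higher quality. `nlls` here is an
    illustrative list of per-token NLLs in nats, not the raw
    output of the llama-perplexity tool.
    """
    return math.exp(sum(nlls) / len(nlls))

# A model that is more confident about the same text scores lower:
print(perplexity([2.0, 2.0, 2.0]))  # exp(2.0) ≈ 7.39
print(perplexity([1.5, 1.5, 1.5]))  # exp(1.5) ≈ 4.48
```

This is why an abliterated model posting a lower PPL than the original is surprising: ablation usually degrades likelihood rather than improving it.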
GPT-5.4 has set a new top-1 entry on PostTrainBench, improving performance from 20.2% to 28.2% using a simple reprompting technique. This indicates a significant advancement in model performance that could influence future AI development strategies.
New top-1 entry on PostTrainBench: GPT-5.4 with a simple reprompting loop ("You still have
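The quoted prompt is cut off in the tweet, but a reprompting loop of this kind is simple to sketch: feed the model its own answer back with a nudge to keep improving. A minimal sketch, assuming a hypothetical `query_model(prompt) -> str` API and an illustrative nudge string (not the actual PostTrainBench prompt or harness):

```python
def reprompt_loop(query_model, task_prompt, max_rounds=5):
    """Toy reprompting loop: repeatedly show the model its own
    previous attempt and ask it to improve. `query_model` is a
    hypothetical callable standing in for a real model API, and
    the nudge text below is illustrative only.
    """
    answer = query_model(task_prompt)
    for _ in range(max_rounds - 1):
        followup = (
            task_prompt
            + "\n\nYour previous attempt:\n" + answer
            + "\n\nPlease improve on it."
        )
        answer = query_model(followup)
    return answer
```

The appeal of the technique is that it needs no training: the 20.2% to 28.2% jump reported here comes purely from spending more inference rounds on each task.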
The Memory Intelligence Agent (MIA) proposes a new architecture that enhances 7B models to outperform GPT-5.4 through a Manager-Planner-Executor framework with continual learning. This could be of interest to engineers looking for novel strategies in AI model development.
MIA: Memory Intelligence Agent
Evolves deep research agents from passive record-keepers into active strategists, enabling 7B models to outperform GPT-5.4 via a Manager-Planner-Executor architecture with continual test-time learning.
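The Manager-Planner-Executor split described above can be illustrated with a toy loop; everything below is an illustrative stand-in for the roles named in the tweet, not the paper's implementation:

```python
class ManagerPlannerExecutor:
    """Toy sketch of a Manager-Planner-Executor agent loop with a
    simple memory store standing in for continual test-time
    learning. All method bodies are illustrative placeholders.
    """

    def __init__(self):
        self.memory = []  # outcomes kept across tasks

    def plan(self, goal):
        # Planner: decompose the goal into steps (trivial split here;
        # a real planner would call a model).
        return [s.strip() for s in goal.split(";") if s.strip()]

    def execute(self, step):
        # Executor: carry out one step and record the outcome so the
        # agent can act as a strategist rather than a record-keeper.
        result = f"done: {step}"
        self.memory.append(result)
        return result

    def manage(self, goal):
        # Manager: orchestrate planning and execution end to end.
        return [self.execute(step) for step in self.plan(goal)]

agent = ManagerPlannerExecutor()
print(agent.manage("search literature; summarize findings"))
# ['done: search literature', 'done: summarize findings']
```

The claim is that this division of labor, plus learning from memory at test time, is what lets a 7B model compete with a much larger one.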