Z.ai's GLM-5.1 is currently the top open-source model in Code Arena, outperforming several notable competitors. This ranking indicates the competitive landscape of AI models and may influence future development and adoption decisions.
With GLM-5.1,
Z.ai maintains the top spot in the rankings for open-source models in Code Arena, currently trailing the overall leader by just about 20 points, while outperforming Claude Sonnet 4.6, Opus 4.5, GPT-5.4 High, and Gemini-3.1 Pro. Open-source models
A new version of the Huihui-gemma model shows improved perplexity metrics compared to its original, indicating potential quality enhancements. This release may interest engineers looking for better-performing models in their AI systems.
An absolutely unexpected result: tested with llama-perplexity, the ablated version actually has a lower PPL than the original model.
The smaller the PPL value, the higher the model quality.
We will upload the Huihui-gemma-4-31B-it-abliteratedv2 version, with fewer warnings and
GPT-5.4 has set a new top-1 entry on PostTrainBench, improving performance from 20.2% to 28.2% using a simple reprompting technique. This indicates a significant advancement in model performance that could influence future AI development strategies.
New top-1 entry on PostTrainBench: GPT-5.4 with a simple reprompting loop ("You still have
The Memory Intelligence Agent (MIA) proposes a new architecture that enhances 7B models to outperform GPT-5.4 through a Manager-Planner-Executor framework with continual learning. This could be of interest to engineers looking for novel strategies in AI model development.
MIA: Memory Intelligence Agent
Evolves deep research agents from passive record-keepers into active strategists, enabling 7B models to outperform GPT-5.4 via a Manager-Planner-Executor architecture with continual test-time learning.