MinerU2.5-Pro is a new 1.2B-parameter model that achieves state-of-the-art performance on the OmniDocBench v1.6 benchmark for PDF-to-Markdown parsing, outperforming several much larger models. The gain is attributed almost entirely to a substantial increase in training data, which may interest engineers focused on data scaling and model training.
MinerU2.5-Pro is here: SOTA on OmniDocBench v1.6 (95.69) for PDF-to-Markdown parsing.
A 1.2B model that outperforms Gemini 3 Pro, Qwen3-VL-235B, GLM-OCR, and PaddleOCR-VL-1.5. The entire leap from 92.98 to 95.69 came from data: 65.5M training pages (up from under 10M).
The tweet describes a practical way to cut Vercel build minutes by building locally and relying on Turborepo's remote cache, yielding significant cost savings. Senior engineers may find this relevant for optimizing CI/CD workflows.
if you have multiple agents opening PRs, each one triggers a full build. that's why I've been paying @vercel $150/mo in build minutes the past 2 months lol.
the fix: build locally before push → turbo cache → Vercel skips the build entirely.
78% fewer build minutes. 5x
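The workflow above relies on Turborepo's remote caching: a local `turbo build` uploads the task's outputs to the remote cache, so Vercel's build step gets a cache hit and replays the artifacts instead of rebuilding. A minimal sketch of the relevant `turbo.json` — the `outputs` globs assume a Next.js app and are illustrative, and older Turborepo 1.x uses a `pipeline` key where 2.x uses `tasks`:

```json
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "outputs": [".next/**", "!.next/cache/**"]
    }
  }
}
```

With the project linked to the remote cache (`npx turbo login`, then `npx turbo link`), running `turbo build` locally before pushing populates the cache, and Vercel's subsequent `turbo build` resolves as a cache hit.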
Gemma 4 is now stable on llama.cpp after its buggy day-one support, and ships in several configurations. Senior engineers may find the benchmark rankings noteworthy, especially the 31B model's placement on Arena AI.
Gemma 4 is finally stable on llama.cpp
On April 2nd, Google released Gemma 4. It had llama.cpp support on day one, but with lots of bugs. Now all issues have been fixed.
E2B, E4B, 26B MoE, 31B Dense
31B ranks #3 on Arena AI, 26B ranks #6
The strongest tier of open-source models.
A new benchmark tests AI agents on real tax workflows; GPT-5.4 leads at just 28%, showing that all models still struggle with high-stakes, multi-step tasks. This insight could inform future model development and evaluation criteria.
We finally have a benchmark that tests AI agents on real tax workflows.
GPT-5.4 is leading at 28%, but all models still struggle with high-stakes, multi-step tasks.
Future model cards should include benchmarks like this.