Tencent has released the Hunyuan Embodied AI model on Hugging Face, a 2B-parameter vision-language model that reports state-of-the-art results on multiple benchmarks. While the model's performance is noteworthy, its practical application and integration into existing systems remain to be seen.
Tencent just released the Hunyuan Embodied AI model on Hugging Face.
It is a 2B-parameter vision-language model with a Mixture-of-Transformers architecture.
It achieves state-of-the-art (SOTA) results on CV-Bench, DA-2K, and more than 10 embodied understanding benchmarks.