Gaussian GRPO normalizes reward distributions across diverse visual tasks to improve training stability, enabling open-source multimodal models to match proprietary systems on reasoning and perception tasks.
OpenVLThinkerV2 is a multimodal AI model that combines vision and language understanding for complex visual reasoning tasks. The key innovation is Gaussian GRPO, a new training method that stabilizes learning across different types of visual tasks by normalizing reward signals, while task-specific techniques help the model balance detailed visual perception with multi-step reasoning.