Florence-2 can now be easily integrated into robot software stacks through a standardized ROS 2 wrapper, enabling local vision-language inference on consumer GPUs without cloud dependencies.
This paper presents a ROS 2 software wrapper that integrates Florence-2, a vision-language model, into robotic systems for local inference.