Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang et al. | April 9, 2026 | arXiv

Key Takeaway

Agents can learn to use tools more wisely by training them with separate optimization objectives for accuracy and efficiency, rather than combining both into a single reward signal that creates conflicting incentives.

Summary

This paper addresses a critical problem in AI agents: they overuse external tools even when they could solve the problem from their own knowledge. The authors propose HDPO, a training framework that teaches agents to judge when a tool call is actually needed by splitting the optimization into two independent channels, one for accuracy and one for efficiency.
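To make the two-channel idea concrete, here is a minimal sketch of how separate accuracy and efficiency signals might be computed for one group of sampled rollouts. This summary does not give the paper's actual formulation, so everything here is an assumption made for illustration: the function names `normalize` and `decoupled_advantages`, the use of group-wise mean and standard deviation normalization, the use of tool-call count as the efficiency measure, and the choice to compare efficiency only among correct rollouts (one plausible reading of the "conditional-advantage-estimation" key term).

```python
from statistics import mean, pstdev


def normalize(values):
    """Center and scale values within a group; all zeros if there is no spread."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decoupled_advantages(correct, tool_calls):
    """Return (accuracy_adv, efficiency_adv) for one group of rollouts.

    correct    -- list of bools, whether each rollout answered correctly
    tool_calls -- list of ints, number of tool calls in each rollout
    Hypothetical helper; not the paper's implementation.
    """
    # Accuracy channel: every rollout is compared on correctness alone.
    acc_adv = normalize([1.0 if c else 0.0 for c in correct])

    # Efficiency channel (assumption): only correct rollouts are compared,
    # so a wrong answer is never rewarded merely for skipping tools.
    idx = [i for i, c in enumerate(correct) if c]
    eff_norm = normalize([-float(tool_calls[i]) for i in idx])

    eff_adv = [0.0] * len(correct)
    for i, a in zip(idx, eff_norm):
        eff_adv[i] = a
    return acc_adv, eff_adv


acc_adv, eff_adv = decoupled_advantages(
    correct=[True, True, False, True],
    tool_calls=[0, 2, 5, 4],
)
```

In this example the incorrect rollout receives a negative accuracy advantage and a zero efficiency advantage, while among the three correct rollouts the one with no tool calls receives the highest efficiency advantage. A single combined reward such as accuracy minus a tool-use penalty would instead blend the two signals into one number, which is the conflicting-incentive setup the paper argues against.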

agents, reasoning, multimodal

Key Terms

agentic-ai, tool-use, meta-cognitive, conditional-advantage-estimation, curriculum-learning