Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Karthik Ragunath Ananda Kumar, Subrahmanyam Arunachalam | March 17, 2026 | arXiv

Key Takeaway

Smaller language models can be trained to perform complex agentic tasks such as presentation generation by combining task-specific reward signals (such as inverse task verification) with parameter-efficient fine-tuning, reaching 91% of large-model quality with only 7B parameters.

Summary

This paper presents a reinforcement learning system that trains AI agents to generate professional slide presentations automatically. The key innovation is an "inverse specification reward": an LLM tries to recover the original brief from the generated slides, and the reward reflects how well it succeeds. If the brief can be recovered, the slides conveyed their intended message.
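The inverse specification idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`inverse_spec_reward`, `toy_recover_brief`) are hypothetical, the LLM call is replaced by a toy stand-in, and token-overlap F1 is an assumed similarity measure chosen only to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, List


def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two strings, in [0, 1] (assumed metric)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def inverse_spec_reward(
    brief: str,
    slides: List[str],
    recover_brief: Callable[[List[str]], str],
) -> float:
    """Score slides by how well the original brief can be recovered from them.

    `recover_brief` stands in for the LLM that reads only the slides and
    guesses the brief; it never sees `brief` itself.
    """
    recovered = recover_brief(slides)
    return token_f1(brief, recovered)


def toy_recover_brief(slides: List[str]) -> str:
    """Hypothetical stand-in for the LLM: joins the text before each colon."""
    return " ".join(slide.split(":")[0] for slide in slides)


if __name__ == "__main__":
    brief = "quarterly revenue growth"
    on_topic = ["quarterly revenue: up this period", "growth: main drivers"]
    off_topic = ["team offsite: photos"]
    print(inverse_spec_reward(brief, on_topic, toy_recover_brief))
    print(inverse_spec_reward(brief, off_topic, toy_recover_brief))
```

In a real training loop the stand-in would be replaced by a model call, and the similarity measure would likely be something more robust than token overlap, such as an LLM judge or embedding similarity. Both choices are left open here because this summary does not specify them.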

agents training

Key Terms

inverse-specification-reward, agentic-tasks, grpo, parameter-efficient-fine-tuning, tool-use