You can train smaller language models to perform complex agentic tasks such as presentation generation by combining creative reward signals (like inverse specification verification) with parameter-efficient fine-tuning; the resulting model reaches 91% of a large model's quality with only 7B parameters.
This paper presents a reinforcement learning system that trains AI agents to automatically generate professional slide presentations. The key innovation is an "inverse specification reward" that checks if slides accurately convey their intended message by having an LLM try to recover the original brief from the generated slides.
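The reward loop can be sketched as follows. This is a minimal illustration with several assumptions: `recover_brief` is a hypothetical stand-in for the LLM call that reads the slides and reconstructs the brief, and the recovered brief is scored against the original with a simple token-overlap F1. The paper's actual comparison method (for example, an LLM judge or embedding similarity) may differ.

```python
from collections import Counter
from typing import Callable, List
import re


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two texts (assumed scoring choice)."""
    ref = Counter(_tokens(reference))
    cand = Counter(_tokens(candidate))
    overlap = sum((ref & cand).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def inverse_specification_reward(
    brief: str,
    slides: List[str],
    recover_brief: Callable[[List[str]], str],  # hypothetical LLM wrapper
) -> float:
    """Reward slides by how well the original brief can be recovered from them."""
    recovered = recover_brief(slides)
    return token_f1(brief, recovered)


if __name__ == "__main__":
    brief = "Quarterly sales grew 12 percent in Europe"
    slides = ["Quarterly sales grew 12 percent", "Growth driven by Europe"]
    # Stub "LLM" that simply concatenates slide text, for illustration only.
    stub = lambda s: " ".join(s)
    print(round(inverse_specification_reward(brief, slides, stub), 2))  # 0.75
```

Because the reward depends only on what a reader can recover from the slides, it penalizes decks that look polished but omit or distort the brief's key points. A real setup would replace the stub with an actual model call.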