Spoken dialogue models can now follow duration constraints (e.g., 'respond in 15 seconds') by inserting time markers during generation, making them more practical for real-world voice applications.
TiCo is a post-training method that teaches spoken dialogue models to generate responses of a specified duration. During generation, the model emits time markers that let it track elapsed speaking time and adjust its content to meet the target length. The method requires no new training data and is aimed at making voice assistant interactions more practical in real-world use.
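To make the time-marker idea concrete, here is a minimal sketch of the bookkeeping involved. It is not TiCo's implementation: the function name `generate_with_time_markers`, the marker format `<5s>`, the fixed per-token duration, and the 5-second marker interval are all assumptions chosen for illustration. In the actual method the model itself would learn to emit markers and adapt what it says, whereas this toy loop only tracks elapsed time and cuts off at the target.

```python
from typing import Iterable, List


def generate_with_time_markers(
    tokens: Iterable[str],
    target_seconds: float,
    seconds_per_token: float = 0.5,  # assumption: every token takes the same time to speak
    marker_interval: float = 5.0,    # assumption: one marker per 5 seconds of speech
) -> List[str]:
    """Interleave hypothetical time markers with tokens and stop at the target duration."""
    output: List[str] = []
    elapsed = 0.0
    next_marker = marker_interval
    for token in tokens:
        # Stop before emitting a token that would overshoot the target duration.
        if elapsed + seconds_per_token > target_seconds:
            break
        output.append(token)
        elapsed += seconds_per_token
        # Emit a marker each time elapsed speech reaches the next interval boundary.
        if elapsed >= next_marker:
            output.append(f"<{int(next_marker)}s>")
            next_marker += marker_interval
    return output


if __name__ == "__main__":
    # A stand-in token stream; a real system would draw tokens from the model.
    stream = ["tok"] * 100
    print(generate_with_time_markers(stream, target_seconds=15.0))
```

With the assumed defaults, a 15-second target yields 30 tokens with markers after 5, 10, and 15 seconds of speech. The markers appear in the output sequence itself, which is what would let a model condition on them as it generates.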