Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Zhuolin Yang, Zihan Liu, Yang Chen, Wenliang Dai, Boxin Wang et al. | March 19, 2026 | arXiv

Key Takeaway

Combining domain-specific reinforcement learning with multi-domain on-policy distillation can produce a highly capable reasoning model with far fewer active parameters. This model matches frontier performance with 20x fewer parameters.

Summary

Nemotron-Cascade 2 is a 30B-parameter mixture-of-experts model with only 3B active parameters. It achieves top-tier reasoning and coding performance comparable to that of much larger models.
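
To make the 30B total versus 3B active figures concrete, the following is a minimal sketch of the arithmetic only. The helper name active_fraction is hypothetical and does not come from the paper; the only inputs taken from the summary are the two parameter counts.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters used per token (hypothetical helper)."""
    if total_params <= 0 or not (0 < active_params <= total_params):
        raise ValueError("expected 0 < active_params <= total_params")
    return active_params / total_params


# Figures quoted in the summary: 30B total parameters, 3B active.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
reduction = TOTAL_PARAMS / ACTIVE_PARAMS  # total parameters per active parameter

print(f"active fraction: {fraction:.0%}, total/active ratio: {reduction:.0f}x")
```

Under these figures roughly one tenth of the weights participate in any single forward pass. That is the sense in which per-token compute tracks the active count, not the total count.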

training reasoning efficiency

Key Terms

mixture-of-experts reinforcement-learning knowledge-distillation on-policy-rl supervised-fine-tuning