A smaller, faster model used in speculative decoding to quickly propose token sequences before a larger model verifies them.
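The draft/verify interaction in this definition can be sketched in Python. This is a minimal sketch of the greedy variant only. It assumes both models are deterministic functions that map a token context to one next token. The names `target`, `draft`, `speculative_decode`, and `k`, and the toy arithmetic "models", are hypothetical. The published algorithm decides acceptance for stochastic sampling with a rejection-sampling rule, not the exact-match check used here.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: given a context, return its greedy next token.
Model = Callable[[List[Token]], Token]


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], max_new: int, k: int = 4
) -> List[Token]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real system does this in
        #    one batched forward pass over out + proposal; here we loop for clarity.
        accepted: List[Token] = []
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted: the same pass yields one bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    # The final round may overshoot max_new, so trim.
    return out[: len(prompt) + max_new]


def greedy_decode(model: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain one-token-at-a-time greedy decoding, for comparison."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


# Toy hypothetical models: the draft agrees with the target except after token 7.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 7 else (ctx[-1] * 3 + 1) % 10


print(speculative_decode(target, draft, [7], 5) == greedy_decode(target, [7], 5))  # → True
```

Every accepted token matches what the target would have picked, so the output equals plain greedy decoding with the larger model. The speedup comes from the target checking several drafted tokens per forward pass. How large it is therefore depends on how often the draft agrees with the target.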