Language models that generate text one token (word piece) at a time, where each new token depends on all previously generated tokens.