An older transformer-based language-model design that generates text autoregressively, predicting one token (roughly a word or word fragment) at a time; it is simpler and smaller than modern alternatives.
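The one-step-at-a-time generation loop can be sketched as follows. This is a minimal illustration, not the design's actual implementation: `TOY_NEXT_TOKEN_SCORES` and `generate` are hypothetical names, the scores are made up, and the lookup table looks only at the last token, whereas a real transformer conditions on the whole preceding context.

```python
# Toy stand-in for a trained model: maps the last token to next-token scores.
# All values here are invented for illustration only.
TOY_NEXT_TOKEN_SCORES = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<end>": 1.0},
}


def generate(prompt, max_new_tokens=10):
    """Greedy generation: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = TOY_NEXT_TOKEN_SCORES.get(tokens[-1])
        if scores is None:
            # The toy table has no continuation for this token.
            break
        next_token = max(scores, key=scores.get)
        if next_token == "<end>":
            break
        # The new token becomes part of the input for the next step.
        tokens.append(next_token)
    return tokens


print(generate(["<start>"]))  # ['<start>', 'the', 'cat', 'sat']
```

Each chosen token is fed back in as input for the next prediction, which is what makes the process sequential.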