The fundamental task where a language model learns to guess the most likely next word (or token) based on all the words that came before it.