Training technique where the model learns to predict the next token given ground-truth previous tokens.