The task of automatically detecting and categorizing text that violates policies or could cause harm, such as hate speech, violent content, or misinformation.
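
As a rough illustration of the two parts of the task (detecting, then categorizing), here is a minimal Python sketch built on a toy keyword matcher. The category names, the patterns, and the `moderate` function are hypothetical placeholders, not part of any real policy or library. Production systems typically rely on trained classifiers rather than keyword lists.

```python
import re
from dataclasses import dataclass

# Hypothetical categories and placeholder patterns, for illustration only.
# A real system would use a trained classifier, not a keyword list.
CATEGORY_PATTERNS = {
    "hate_speech": [r"\bplaceholder_slur\b"],
    "violence": [r"\bi will hurt you\b", r"\bkill you\b"],
    "misinformation": [r"\bmiracle cure\b"],
}


@dataclass
class ModerationResult:
    flagged: bool          # detection: did the text match any category?
    categories: list[str]  # categorization: which categories matched


def moderate(text: str) -> ModerationResult:
    """Flag text and list every category whose patterns it matches."""
    lowered = text.lower()
    matched = [
        category
        for category, patterns in CATEGORY_PATTERNS.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    ]
    return ModerationResult(flagged=bool(matched), categories=matched)


if __name__ == "__main__":
    print(moderate("Have a nice day"))
    print(moderate("This miracle cure works, or I will kill you"))
```

A single text can fall into several categories at once, which is why the sketch returns a list rather than one label.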