A benchmark dataset where models learn to determine whether a given sentence answers a given question, used to train models for question-answer relevance scoring.
Adhering to complex, structured, or constrained instructions