A standardized set of test problems used to measure and compare the performance of different algorithms or models.
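
As a minimal sketch of the idea, the following runs two candidate algorithms against the same fixed problem set and reports one comparable score for each. The problem set, the candidate functions (`bubble_sort`, `broken_sort`), and the scoring rule (fraction of problems solved) are all hypothetical choices made for illustration; real benchmarks define their own tasks and metrics.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fixed problem set: (input, expected output) pairs.
# Every candidate is evaluated on exactly these problems.
Problem = Tuple[List[int], List[int]]
PROBLEMS: List[Problem] = [
    ([3, 1, 2], [1, 2, 3]),
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
    ([2, 1], [1, 2]),
]


def bubble_sort(xs: List[int]) -> List[int]:
    """Illustrative candidate: a correct but slow sort."""
    out = list(xs)
    for i in range(len(out)):
        for j in range(len(out) - 1 - i):
            if out[j] > out[j + 1]:
                out[j], out[j + 1] = out[j + 1], out[j]
    return out


def broken_sort(xs: List[int]) -> List[int]:
    """Illustrative candidate: deliberately wrong, returns the input unchanged."""
    return list(xs)


def run_benchmark(
    candidates: Dict[str, Callable[[List[int]], List[int]]],
    problems: List[Problem],
) -> Dict[str, float]:
    """Score each candidate as the fraction of problems it solves (assumed metric)."""
    scores: Dict[str, float] = {}
    for name, solve in candidates.items():
        solved = sum(1 for given, expected in problems if solve(given) == expected)
        scores[name] = solved / len(problems)
    return scores


scores = run_benchmark(
    {"bubble_sort": bubble_sort, "broken_sort": broken_sort},
    PROBLEMS,
)
print(scores)
```

Because both candidates see identical problems and are scored by the same rule, their results can be compared directly, which is the property the definition describes.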