A continuously updated evaluation system that scores models on new data as it arrives, rather than on a fixed test set.
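
As a rough illustration of the idea, the sketch below scores a model on each batch as it arrives and reports accuracy over only the most recent batches, so older data ages out of the score. The class name `LiveEvaluator`, the `window` parameter, the `(input, expected)` example format, and the use of plain accuracy as the metric are all assumptions made for this example, not part of any particular system.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Tuple

# Hypothetical example format: (model input, expected output).
Example = Tuple[str, str]


class LiveEvaluator:
    """Scores a model on batches as they arrive, keeping a recent window."""

    def __init__(self, window: int = 3) -> None:
        # (correct, total) counts for the most recent `window` batches;
        # deque(maxlen=...) silently drops the oldest batch when full.
        self.batches: Deque[Tuple[int, int]] = deque(maxlen=window)

    def ingest(self, model: Callable[[str], str], batch: Iterable[Example]) -> float:
        """Score `model` on a newly arrived batch; return that batch's accuracy."""
        correct = total = 0
        for x, expected in batch:
            correct += int(model(x) == expected)
            total += 1
        self.batches.append((correct, total))
        return correct / total if total else 0.0

    def rolling_accuracy(self) -> float:
        """Accuracy pooled over the batches still inside the window."""
        correct = sum(c for c, _ in self.batches)
        total = sum(t for _, t in self.batches)
        return correct / total if total else 0.0


if __name__ == "__main__":
    evaluator = LiveEvaluator(window=2)
    model = str.upper  # stand-in for a real model
    evaluator.ingest(model, [("a", "A"), ("b", "X")])
    evaluator.ingest(model, [("c", "C"), ("d", "D")])
    print(evaluator.rolling_accuracy())
```

A fixed test set would correspond to calling `ingest` once and never again; here the reported score keeps moving as new batches replace old ones.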