LLM evaluation can be made more rigorous by borrowing established methods from psychology and cognitive science; this platform shows how to apply those methods systematically and at scale.
Researchers built PsyCogMetrics AI Lab, a cloud platform that applies psychology and cognitive science methods to evaluate large language models. The study follows a three-phase design process: identifying evaluation gaps, developing theory-based assessment methods, and testing them in practice.
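To make the idea of psychometric-style LLM evaluation concrete, here is a minimal sketch of administering Likert-scale items to a model and aggregating the responses into a scale score. Everything in it is an assumption for illustration: the item texts, the reverse-keying scheme, and the query_model() stub are hypothetical and are not taken from the PsyCogMetrics AI Lab platform.

```python
# Illustrative sketch only: items, scoring scheme, and query_model()
# are hypothetical, not the PsyCogMetrics AI Lab implementation.
import re
import statistics

# Hypothetical items for a single construct (e.g., need for cognition).
ITEMS = [
    "I enjoy working through problems that require a lot of thinking.",
    "I prefer tasks with simple answers over complex reasoning.",  # reverse-keyed
]
REVERSE_KEYED = {1}          # indices of reverse-scored items
SCALE_MIN, SCALE_MAX = 1, 5  # Likert response range

PROMPT = (
    "Rate your agreement with the statement on a 1-5 scale "
    "(1 = strongly disagree, 5 = strongly agree). "
    "Reply with the number only.\nStatement: {item}"
)

def query_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call; replace with your client."""
    return "4"  # canned reply so the sketch runs end to end

def score_item(raw: str, reverse: bool) -> int | None:
    """Parse the model's reply and apply reverse keying if needed."""
    match = re.search(r"[1-5]", raw)
    if match is None:
        return None  # unparseable reply; a real pipeline would log or retry
    value = int(match.group())
    return SCALE_MIN + SCALE_MAX - value if reverse else value

def administer(items: list[str]) -> float:
    """Run every item past the model and return the mean scale score."""
    scores = []
    for i, item in enumerate(items):
        reply = query_model(PROMPT.format(item=item))
        score = score_item(reply, reverse=i in REVERSE_KEYED)
        if score is not None:
            scores.append(score)
    return statistics.mean(scores)

if __name__ == "__main__":
    print(f"Mean scale score: {administer(ITEMS):.2f}")
```

Running the sketch as-is prints a mean score from the canned replies; swapping query_model() for a real API client would administer the instrument to a live model, and repeating the run across prompts or models is what "at scale" would look like in practice.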