LLM evaluation can be made more rigorous by borrowing established methods from psychology and cognitive science; this platform shows how to apply those methods systematically and at scale.
Researchers built PsyCogMetrics AI Lab, a cloud platform that applies psychology and cognitive science methods to evaluate large language models. The study follows a three-phase design process: identifying evaluation gaps, developing theory-based assessment methods, and testing them in practice.
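To make the idea of psychometric-style LLM evaluation concrete, here is a minimal sketch of administering Likert-scale items to a model and aggregating the responses into a scale score. Everything in it is an assumption for illustration: the item texts, the reverse-keying scheme, and the query_model() stub are hypothetical and are not taken from the PsyCogMetrics AI Lab platform.

```python
# Illustrative sketch only: items, scoring scheme, and query_model()
# are hypothetical, not the PsyCogMetrics AI Lab implementation.
import re
import statistics

# Hypothetical items for a single construct (e.g., need for cognition).
ITEMS = [
    "I enjoy working through problems that require a lot of thinking.",
    "I prefer tasks with simple answers over complex reasoning.",  # reverse-keyed
]
REVERSE_KEYED = {1}          # indices of reverse-scored items
SCALE_MIN, SCALE_MAX = 1, 5  # Likert response range

PROMPT = (
    "Rate your agreement with the statement on a 1-5 scale "
    "(1 = strongly disagree, 5 = strongly agree). "
    "Reply with the number only.\nStatement: {item}"
)

def query_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call; replace with your client."""
    return "4"  # canned reply so the sketch runs end to end

def score_item(raw: str, reverse: bool) -> int | None:
    """Parse the model's reply and apply reverse keying if needed."""
    match = re.search(r"[1-5]", raw)
    if match is None:
        return None  # unparseable reply; a real pipeline would log or retry
    value = int(match.group())
    return SCALE_MIN + SCALE_MAX - value if reverse else value

def administer(items: list[str]) -> float:
    """Run every item past the model and return the mean scale score."""
    scores = []
    for i, item in enumerate(items):
        reply = query_model(PROMPT.format(item=item))
        score = score_item(reply, reverse=i in REVERSE_KEYED)
        if score is not None:
            scores.append(score)
    return statistics.mean(scores)

if __name__ == "__main__":
    print(f"Mean scale score: {administer(ITEMS):.2f}")
```

Running the sketch as-is prints a mean score from the canned replies; swapping query_model() for a real API client would administer the instrument to a live model, and repeating the run across prompts or models is what "at scale" would look like in practice.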