Automated evaluation of RAG systems for news credibility assessment can reliably match human judgment, enabling faster iteration on trustworthiness...
This paper describes evaluation tools for AI systems that help readers assess whether news articles are trustworthy. The researchers created benchmarks of human-judged questions and reports about real news, then built an automated system that scores new submissions without requiring human reviewers each time.