SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables

Sungho Park, Jueun Kim, Wook-Shin Han | February 26, 2026 | arXiv

Key Takeaway

Current AI models struggle with real-world table-text reasoning; SPARTA exposes this gap with automatically generated, complex multi-hop questions ...

Summary

SPARTA is a benchmark for testing AI models on complex multi-hop questions that require reasoning jointly over text and tables.

evaluation, reasoning, multimodal

Key Terms

multi-hop reasoning, table-text QA, semantic parsing, aggregation