SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables — ThinkLLM