Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models — ThinkLLM