DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science