Current LLMs struggle with implicit user intentions and long-term preference modeling: they can handle immediate requests but fail to infer users' unstated needs or to retain their preferences across extended interactions.
LifeSim creates realistic simulated users with beliefs, desires, and intentions to test how well AI assistants handle long-term, multi-scenario interactions. The benchmark evaluates whether an assistant can understand both explicit requests and implicit user needs, maintain an accurate user profile over time, and give contextually appropriate responses across 1,200 diverse life scenarios.
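
To make the belief/desire/intention framing concrete, here is a minimal sketch of how a simulated user and a single interaction turn might be represented. Everything in it is an assumption made for illustration: the names `SimulatedUser`, `Turn`, and `covers_need`, the field layout, and the toy substring check are hypothetical and are not LifeSim's actual schema or scoring method.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimulatedUser:
    """Hypothetical BDI-style user state; field names are illustrative only."""
    beliefs: Dict[str, str] = field(default_factory=dict)   # what the user holds true
    desires: List[str] = field(default_factory=list)        # longer-term goals
    intentions: List[str] = field(default_factory=list)     # plans currently pursued


@dataclass
class Turn:
    explicit_request: str  # what the user actually says
    implicit_need: str     # unstated need the assistant is expected to infer


def covers_need(response: str, turn: Turn) -> bool:
    """Toy check (an assumption, not a real metric): the response
    mentions the implicit need as a case-insensitive substring."""
    return turn.implicit_need.lower() in response.lower()


user = SimulatedUser(
    beliefs={"diet": "vegetarian"},
    desires=["eat healthier"],
    intentions=["plan dinners for the week"],
)
turn = Turn(
    explicit_request="Suggest a dinner recipe.",
    implicit_need=user.beliefs["diet"],
)
```

In this sketch the explicit request never mentions diet, so a response only passes the check if the assistant has carried the preference forward from the user's profile. A real evaluation would need a far richer judgment than substring matching.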