You can often remove an LLM's previous responses from conversation history without losing quality, saving memory while sometimes improving accuracy.
This paper tests whether LLMs actually need to see their own previous responses in multi-turn conversations. Surprisingly, removing past assistant responses often doesn't hurt quality and can shrink the context by 10x. The researchers also found that models sometimes get worse when they over-rely on their own prior outputs, introducing errors that compound across turns.
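The core idea can be sketched in a few lines: before each new turn, filter prior assistant messages out of the history while keeping the system prompt and all user turns. The message format (OpenAI-style `role`/`content` dicts) and the `keep_last` knob are illustrative assumptions, not the paper's implementation.

```python
def prune_assistant_turns(messages, keep_last=0):
    """Drop past assistant responses from a chat history.

    Keeps every system/user message; optionally retains the most
    recent `keep_last` assistant turns for local continuity.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    assistant_idxs = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    kept = set(assistant_idxs[len(assistant_idxs) - keep_last:]) if keep_last else set()
    return [m for i, m in enumerate(messages)
            if m["role"] != "assistant" or i in kept]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize chapter 1."},
    {"role": "assistant", "content": "Chapter 1 introduces the setting..."},
    {"role": "user", "content": "Now chapter 2."},
    {"role": "assistant", "content": "Chapter 2 covers the conflict..."},
    {"role": "user", "content": "Compare the two chapters."},
]

# Send only system + user turns to the model on the next call.
pruned = prune_assistant_turns(history)
```

On long conversations where assistant replies dominate the token count, this kind of filter is where the large context savings come from; `keep_last=1` is a middle ground if the very latest response is still needed.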