Diffusion language models can be trained more effectively by embedding a simulated denoising trajectory into training via a memory mechanism, and that same mechanism can be reused at inference time to improve long-context retrieval.
This paper addresses a key train/inference mismatch in diffusion language models: they're trained one way (predicting masked tokens at a single corruption level) but used another (multi-step iterative denoising). MemDLM closes this gap by simulating the denoising process during training with a memory mechanism that learns from each sample's trajectory, leading to faster training and better long-context performance.
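To make the mismatch concrete, here is a toy sketch (not MemDLM's actual code; `predict_tokens` is a hypothetical stand-in for the model): a training step sees one randomly masked version of a sequence and predicts once, while inference starts fully masked and commits predictions over several denoising steps.

```python
import random

MASK = -1  # sentinel for a masked token position

def predict_tokens(seq):
    # Stand-in for the model: proposes a token for every masked position.
    return [random.randrange(10) if t == MASK else t for t in seq]

def training_step(seq, mask_rate=0.5):
    # Training view: one corruption level, one masked-prediction pass.
    corrupted = [MASK if random.random() < mask_rate else t for t in seq]
    return predict_tokens(corrupted)

def iterative_denoise(length, steps=4):
    # Inference view: start fully masked, unmask a fraction each step.
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        proposal = predict_tokens(seq)
        # Commit enough positions so everything is filled by the last step.
        k = max(1, len(masked) // (steps - step))
        for i in random.sample(masked, k):
            seq[i] = proposal[i]
    return seq

out = iterative_denoise(8)
assert MASK not in out  # all positions filled after the denoising loop
```

The point of the sketch is that `training_step` never experiences the partially denoised intermediate states that `iterative_denoise` produces; simulating that trajectory during training is the gap the paper's memory mechanism targets.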