You can improve RAG systems by preprocessing your corpus once to add distilled, compact versions of relevant documents. This works with any retrieval method and yields consistent gains without changing the rest of your pipeline.
This paper proposes WriteBack-RAG, a method that improves retrieval-augmented generation (RAG) systems by treating the knowledge base as trainable. Using labeled examples, the system identifies the relevant documents, distills them into compact knowledge units, and writes these back into the corpus.
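A minimal sketch of the write-back preprocessing loop might look like the following. Everything here is an assumption for illustration: the `distill` stand-in just keeps sentences that share a term with the query (the paper would presumably use an LLM for distillation), and the `write_back` helper and `::distilled` ID convention are invented names, not the paper's API.

```python
def distill(query: str, document: str) -> str:
    """Toy distillation stand-in: keep sentences sharing a term with the query."""
    q_terms = set(query.lower().split())
    kept = [s.strip() for s in document.split(".")
            if q_terms & set(s.lower().split())]
    return ". ".join(kept)

def write_back(corpus: dict[str, str],
               labeled: list[tuple[str, str]]) -> dict[str, str]:
    """For each (query, relevant_doc_id) labeled example, distill the relevant
    document into a compact knowledge unit and add it to a copy of the corpus."""
    augmented = dict(corpus)
    for query, doc_id in labeled:
        unit = distill(query, corpus[doc_id])
        if unit:  # only write back non-empty distillations
            augmented[f"{doc_id}::distilled"] = unit
    return augmented

corpus = {
    "d1": ("Paris is the capital of France. France is in Europe. "
           "The Seine flows through Paris."),
}
augmented = write_back(corpus, [("capital of France", "d1")])
print(augmented["d1::distilled"])
```

Because the distilled units live in the same corpus as the originals, any off-the-shelf retriever (BM25, dense, hybrid) can surface them at query time with no pipeline changes.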