Separating code and test generation into competing models with opposing rewards prevents self-collusion and produces higher-quality code and tests than single-model self-play approaches.
Code-A1 uses two competing AI models to improve code generation: one model writes code, and the other writes tests designed to expose bugs in that code. Making the two models adversaries with opposing objectives avoids the failure mode where a single self-play model colludes with itself by writing trivially passable tests. This adversarial setup produces better code and tests than training on human-written test suites alone.
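The opposing-reward scheme can be sketched in a few lines. This is an illustrative toy, not the Code-A1 implementation: the model calls are stubbed with plain functions, and the names (`run_tests`, `coder_output`) are hypothetical. One key assumption shown here is that the tester's tests are validated against a trusted reference (Python's built-in `abs`), so the tester cannot score points with incorrect tests; the coder's reward is the fraction of tests passed, and the tester's reward is the zero-sum complement.

```python
from typing import Callable, List, Tuple

def run_tests(code_fn: Callable[[int], int], tests: List[Tuple[int, int]]) -> float:
    """Return the fraction of (input, expected) tests the code passes."""
    passed = sum(1 for x, want in tests if code_fn(x) == want)
    return passed / len(tests)

# Stand-in for the coder model's output: a buggy absolute-value function.
def coder_output(x: int) -> int:
    return x  # bug: negative inputs are not handled

# Stand-in for the tester model's output: candidate inputs, with expected
# values taken from a trusted reference so incorrect tests score nothing.
candidate_inputs = [3, 0, -5]
tests = [(x, abs(x)) for x in candidate_inputs]

coder_reward = run_tests(coder_output, tests)  # fraction of tests passed
tester_reward = 1.0 - coder_reward             # zero-sum: fraction of bugs found

print(f"coder reward:  {coder_reward:.2f}")   # the -5 case exposes the bug
print(f"tester reward: {tester_reward:.2f}")
```

Because the rewards sum to a constant, the tester only gains by finding inputs the coder handles incorrectly, which is what removes the incentive to write easy tests.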