Separating code and test generation into competing models with opposing rewards prevents self-collusion and produces higher-quality code and tests than single-model self-play approaches.
Code-A1 uses two competing AI models to improve code generation: one model writes code, and the other writes tests designed to expose bugs in that code. Making the two models adversaries with opposing objectives avoids the failure mode where a single self-play model colludes with itself by writing trivially passable tests. This adversarial setup produces better code and tests than training on human-written test suites alone.
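The opposing-reward scheme can be sketched in a few lines. This is an illustrative toy, not the Code-A1 implementation: the model calls are stubbed with plain functions, and the names (`run_tests`, `coder_output`) are hypothetical. One key assumption shown here is that the tester's tests are validated against a trusted reference (Python's built-in `abs`), so the tester cannot score points with incorrect tests; the coder's reward is the fraction of tests passed, and the tester's reward is the zero-sum complement.

```python
from typing import Callable, List, Tuple

def run_tests(code_fn: Callable[[int], int], tests: List[Tuple[int, int]]) -> float:
    """Return the fraction of (input, expected) tests the code passes."""
    passed = sum(1 for x, want in tests if code_fn(x) == want)
    return passed / len(tests)

# Stand-in for the coder model's output: a buggy absolute-value function.
def coder_output(x: int) -> int:
    return x  # bug: negative inputs are not handled

# Stand-in for the tester model's output: candidate inputs, with expected
# values taken from a trusted reference so incorrect tests score nothing.
candidate_inputs = [3, 0, -5]
tests = [(x, abs(x)) for x in candidate_inputs]

coder_reward = run_tests(coder_output, tests)  # fraction of tests passed
tester_reward = 1.0 - coder_reward             # zero-sum: fraction of bugs found

print(f"coder reward:  {coder_reward:.2f}")   # the -5 case exposes the bug
print(f"tester reward: {tester_reward:.2f}")
```

Because the rewards sum to a constant, the tester only gains by finding inputs the coder handles incorrectly, which is what removes the incentive to write easy tests.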