This research builds an automated system that injects vulnerabilities, each paired with a proof-of-concept exploit, into real code repositories, generating large-scale, realistic datasets for training AI models to detect software vulnerabilities. By operating at the repository level rather than on isolated functions, it moves beyond function-level benchmarks to test how AI handles real-world code complexity.