Evaluating image-tampering detection with coarse object masks is fundamentally flawed; pixel-level metrics paired with a semantic understanding of edit types give a far more accurate picture of whether AI systems can detect real image manipulations.
This paper fixes how we evaluate image-tampering detection by moving from coarse object masks to pixel-level precision. It introduces a taxonomy of edit types (replace, remove, splice, etc.), a new benchmark with precise tamper maps, and metrics that measure both where edits occur and what they mean semantically, revealing that existing detectors often miss subtle edits or flag untouched pixels.
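The pixel-level side of such an evaluation can be illustrated with a minimal sketch. The function below is a generic per-pixel precision/recall/F1 over binary tamper maps, assumed here for illustration; it is not necessarily the paper's exact metric, and the map representation (flat 0/1 lists) is a simplification.

```python
# Hedged sketch: per-pixel precision, recall, and F1 between a predicted
# tamper map and a ground-truth tamper map. Maps are flat lists of 0/1;
# the name pixel_f1 and this representation are illustrative assumptions.

def pixel_f1(pred, truth):
    """Compute per-pixel precision, recall, and F1 for binary tamper maps."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)        # correctly flagged pixels
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)    # untouched pixels flagged
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)    # tampered pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A detector that flags an untouched pixel loses precision even when it
# fully covers the true edit region.
pred  = [1, 1, 1, 0, 1, 0]   # flags one clean pixel
truth = [1, 1, 1, 0, 0, 0]   # actual tampered region
precision, recall, f1 = pixel_f1(pred, truth)
```

A metric like this penalizes both failure modes the paper highlights: missed subtle edits lower recall, while flagged untouched pixels lower precision.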