ECCV 2026: A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

Summary

A general adversarial attack detector that is independent of architecture, classifier, dataset and attack method. It uses CLIP text prompts to reveal very small, imperceptible image perturbations.

Hodaya Krakover, Meir Yossef Levi, Eyal Gofer and Guy Gilboa

ECCV 2026.

Abstract

Adversarial attacks pose a challenge to the reliability of deep learning models, motivating effective detection methods. Existing techniques often rely on attack-specific assumptions, access to adversarial samples, or knowledge of the underlying classifier (white-box). We propose $A^4D$ (Attack- and Architecture-Agnostic Adversarial Detector), a completely black-box, zero-shot adversarial attack detection framework that uses prompt-based similarity scores derived from CLIP. To the best of our knowledge, this is the first attempt to use CLIP for such a task.
The method is based on two key observations: (i) CLIP is sensitive even to small, imperceptible, non-semantic perturbations; (ii) the shift in CLIP embedding space is not arbitrary and can be used as a robust attack indicator.
Experiments across multiple attacks, datasets and classifiers validate that $A^4D$ achieves state-of-the-art detection results in the attack-agnostic and classifier-agnostic setting.
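
As a rough illustration of what a prompt-based similarity score could look like, here is a minimal Python sketch. It is not the $A^4D$ scoring rule, which the abstract does not specify. The function names, the split into "attack" and "clean" prompts, the difference-of-means score and the threshold are all assumptions made for this example, and small synthetic vectors stand in for embeddings that would in practice come from CLIP's image and text encoders.

```python
import numpy as np


def cosine_similarities(image_emb, prompt_embs):
    """Cosine similarity between one image embedding and each prompt embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    prompt_embs = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    return prompt_embs @ image_emb


def detection_score(image_emb, attack_prompt_embs, clean_prompt_embs):
    """Hypothetical score: mean similarity to 'attack' prompts minus
    mean similarity to 'clean' prompts (an assumption, not the paper's rule)."""
    attack_sim = cosine_similarities(image_emb, attack_prompt_embs).mean()
    clean_sim = cosine_similarities(image_emb, clean_prompt_embs).mean()
    return float(attack_sim - clean_sim)


def is_adversarial(score, threshold=0.0):
    """Flag the image when the score exceeds an assumed threshold."""
    return bool(score > threshold)


# Synthetic stand-ins for CLIP embeddings (real ones are much higher-dimensional).
attack_prompts = np.array([[1.0, 0.0, 0.0, 0.0]])
clean_prompts = np.array([[0.0, 1.0, 0.0, 0.0]])
clean_image = np.array([0.1, 1.0, 0.0, 0.0])      # lies close to the clean prompt
perturbed_image = np.array([1.0, 0.1, 0.0, 0.0])  # shifted toward the attack prompt

clean_score = detection_score(clean_image, attack_prompts, clean_prompts)
perturbed_score = detection_score(perturbed_image, attack_prompts, clean_prompts)
print(is_adversarial(clean_score), is_adversarial(perturbed_score))
```

The sketch never queries the classifier under attack, which mirrors the black-box setting described above. Only the image embedding and a fixed set of prompt embeddings are needed.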
