ICLR 2025: Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images

Jonathan Brokman · Amit Giloni · Omer Hofman · Roman Vainshtein · Hisashi Kojima · Guy Gilboa, ICLR 2025

Abstract:

Distinguishing between real and AI-generated images, commonly referred to as ‘image detection’, presents a timely and significant challenge. Despite extensive research in the (semi-)supervised regime, zero-shot and few-shot solutions have only recently emerged as promising alternatives. Their main advantage is that they alleviate the need for ongoing maintenance of training data, which quickly becomes outdated due to advances in generative technologies. We identify two main gaps: (1) a lack of theoretical grounding for the methods, and (2) significant room for performance improvements in zero-shot and few-shot regimes. Our approach is founded on understanding and quantifying the biases inherent in generated content, where we use these quantities as criteria for characterizing generated images. Specifically, we explore the biases of the implicit probability manifold, captured by a pre-trained diffusion model. Through score-function analysis, we approximate the curvature, gradient, and bias towards points on the probability manifold, establishing criteria for detection in the zero-shot regime. We further extend our contribution to the few-shot setting by employing a mixture-of-experts methodology. Empirical results across 20 generative models demonstrate that our method outperforms current approaches in both zero-shot and few-shot settings. This work advances the theoretical understanding and practical usage of generated content biases through the lens of manifold analysis.
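
The abstract does not say how the gradient and curvature quantities are computed, so the following is only a minimal sketch of one standard way to obtain such statistics from any score function: the score norm as a gradient statistic, and a Hutchinson-style finite-difference estimate of the score divergence as a curvature statistic. The toy Gaussian score and the names `gaussian_score`, `score_norm` and `score_divergence` are assumptions for illustration; in practice the score would come from a pre-trained diffusion model, and the paper's actual criteria may differ.

```python
import math
import random

def gaussian_score(x, sigma=1.0):
    """Toy stand-in for a learned score: grad log N(0, sigma^2 I) = -x / sigma^2."""
    return [-xi / sigma ** 2 for xi in x]

def score_norm(score, x):
    """Euclidean norm of the score at x (a simple 'gradient' statistic)."""
    return math.sqrt(sum(si * si for si in score(x)))

def score_divergence(score, x, n_probes=8, h=1e-3, seed=0):
    """Hutchinson estimate of div s(x), the trace of the score Jacobian.

    Each probe v has random +/-1 entries; v^T J v is approximated with a
    central difference of the score along v (assumed technique, not the paper's).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + h * vi for xi, vi in zip(x, v)]
        x_minus = [xi - h * vi for xi, vi in zip(x, v)]
        # Directional derivative J v by central differences.
        jv = [(a - b) / (2 * h) for a, b in zip(score(x_plus), score(x_minus))]
        total += sum(vi * ji for vi, ji in zip(v, jv))
    return total / n_probes

# Usage on the toy score with sigma = 2: the divergence should be close to -d / sigma^2.
point = [0.5, -1.0, 2.0]
toy_score = lambda y: gaussian_score(y, sigma=2.0)
grad_stat = score_norm(toy_score, point)
curv_stat = score_divergence(toy_score, point)
```

For a real image the probes would be image-shaped and each score evaluation a network forward pass, so the number of probes trades accuracy against cost.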

camera ready version

Code

SSVM 2025: Identifying Memorization of Diffusion Models through p-Laplace Analysis

Jonathan Brokman, Amit Giloni, Omer Hofman, Roman Vainshtein, Hisashi Kojima, and Guy Gilboa, Int. Conf. on Scale Space and Variational Methods, 2025

Abstract:

Diffusion models, today’s leading image generative models, estimate the score function, i.e. the gradient of the log probability of (perturbed) data samples, without direct access to the underlying probability distribution. This work investigates whether the estimated score function can be leveraged to compute higher-order differentials, namely p-Laplace operators. We show that these operators can be employed to identify memorized training data. We propose a numerical p-Laplace approximation based on the learned score functions, showing its effectiveness in identifying key features of the probability landscape. We analyze the structured case of Gaussian mixture models, and demonstrate that the results carry over to image generative models, where memorization identification based on the p-Laplace operator is performed for the first time.
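
To make the Gaussian-mixture case concrete, here is a minimal one-dimensional sketch, assuming the p-Laplace of the log-density is taken as d/dx(|s|^(p-2) s) with s the score, and approximated by central differences. The function names, the step size `h` and the mixture parameters are illustrative assumptions; the paper's numerical scheme is not described in the abstract and may differ.

```python
import math

def gmm_score(x, weights, means, sigmas):
    """Analytic score d/dx log p(x) of a 1D Gaussian mixture."""
    # Weighted component densities w_k * N(x; mu_k, sigma_k^2).
    dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sigmas)]
    total = sum(dens)  # assumed nonzero: evaluate only where p(x) does not underflow
    # p'(x) = sum_k dens_k * (-(x - mu_k) / sigma_k^2)
    dp = sum(d * (-(x - m) / s ** 2) for d, m, s in zip(dens, means, sigmas))
    return dp / total

def p_laplace_1d(score, x, p=2.0, h=1e-3):
    """Central-difference estimate of d/dx (|s|^(p-2) s), assuming p > 1."""
    def flux(y):
        s = score(y)
        return abs(s) ** (p - 2) * s if s != 0.0 else 0.0
    return (flux(x + h) - flux(x - h)) / (2 * h)

# Usage: a wide mode at -2 and a very narrow mode at +2.
# The narrow mode is a crude stand-in for a memorized sample.
mix = lambda x: gmm_score(x, [0.5, 0.5], [-2.0, 2.0], [1.0, 0.05])
wide_value = p_laplace_1d(mix, -2.0)    # close to -1 / 1.0^2
narrow_value = p_laplace_1d(mix, 2.0)   # close to -1 / 0.05^2, far more negative
```

With p = 2 the operator reduces to the Laplacian of the log-density, which at an isolated Gaussian mode equals -1/sigma^2, so sharply peaked modes stand out by their large negative values.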

Adaptive Anisotropic Total Variation – Analysis and Experimental Findings of Nonlinear Spectral Properties

J. of Mathematical Imaging and Vision (JMIV), Vol. 64, pp. 916–938, 2022.

Shai Biton and Guy Gilboa

pdf

Springer link

Abstract

Our aim is to explain and characterize the behavior of adaptive total-variation (TV) regularization. TV has been widely used as an edge-preserving regularizer. However, objects are often over-regularized by TV, becoming blob-like convex structures of low curvature. This phenomenon was explained mathematically in the analysis of Andreu et al. They showed that a TV regularizer can perfectly preserve the spatial shape of sets which are nonlinear eigenfunctions of the form $\lambda u \in \partial J_{TV}(u)$, where $\partial J_{TV}(u)$ is the TV subdifferential. For TV, these shapes are indeed convex sets of low curvature.
A compelling approach to better preserve structures is to use adaptive anisotropic functionals, which adapt the regularization in an image-driven manner, with strong regularization along edges and weak regularization across them.
This follows the seminal work of Weickert on anisotropic diffusion. Adaptive anisotropic TV (A$^2$TV) was successfully used in several studies in the past decade. However, there is little analysis of the type of structures which can be well preserved. In this study we address this question by a joint methodology of mathematical derivations and experiments.
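
As a concrete instance of the TV eigenfunction relation mentioned above, the standard example is the indicator function of a disk in the plane. This is background from the TV literature rather than a result of this paper; the symbols $B_r$, $\chi$ and $\mathrm{Per}$ are introduced here for illustration only.

```latex
% Indicator of a disk B_r \subset \mathbb{R}^2 of radius r (isotropic TV on the whole plane):
u = \chi_{B_r}, \qquad
\lambda u \in \partial J_{TV}(u)
\quad \text{with} \quad
\lambda = \frac{\mathrm{Per}(B_r)}{|B_r|} = \frac{2\pi r}{\pi r^{2}} = \frac{2}{r}.
```

Smaller disks therefore have larger eigenvalues, which matches the intuition that TV regularization removes small structures first.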

We rely on a recently developed theory of Burger et al. on nonlinear spectral analysis of one-homogeneous functionals. We have that eigenfunction sets, admitting $\lambda u \in \partial J_{A^2TV}(u)$, are perfectly preserved under A$^2$TV-flow or minimization with squared $L^2$ fidelity. We thus investigate these eigenfunctions theoretically and numerically. We prove that non-convex sets can be eigenfunctions under certain conditions and provide numerical results which characterize the relation between the degree of local anisotropy of the functional and the admitted maximal curvature. A nonlinear spectral representation is formulated, where shapes are well preserved and can be manipulated effectively. Finally, examples of possible applications related to shape manipulation and guided regularization of medical and depth data are shown.
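
The sense in which eigenfunctions are "perfectly preserved" can be sketched with a short derivation for a generic convex, absolutely one-homogeneous functional $J$. The initial condition $f$, the time variable $t$ and the fidelity weight $\alpha$ (including the $1/(2\alpha)$ scaling) are notational assumptions made here, not taken from the abstract.

```latex
% Assumption: \lambda f \in \partial J(f) with \lambda > 0.
% Gradient flow: u_t = -p(t), \quad p(t) \in \partial J(u(t)), \quad u(0) = f.
% Ansatz u(t) = c(t) f with c(t) > 0. Since \partial J(c f) = \partial J(f) for c > 0,
% the choice p(t) = \lambda f is admissible, hence
c'(t)\, f = -\lambda f
\;\Longrightarrow\;
u(t) = (1 - \lambda t)^{+} f .

% Minimization with squared L^2 fidelity:
u_\alpha = \arg\min_u \; J(u) + \tfrac{1}{2\alpha}\, \| f - u \|_{L^2}^{2}
\;\Longrightarrow\;
u_\alpha = (1 - \alpha \lambda)^{+} f .
```

In both cases the spatial shape of $f$ is unchanged and only its contrast decays linearly, vanishing at $t = 1/\lambda$ (respectively $\alpha = 1/\lambda$).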

NeurIPS 2020: Deeply Learned Spectral Total Variation Decomposition

Tamara G. Grossmann, Yury Korolev, Guy Gilboa, Carola-Bibiane Schönlieb, arXiv 2020

Accepted for NeurIPS 2020.

Non-linear spectral decompositions of images based on one-homogeneous functionals such as total variation have gained considerable attention in the last few years. Due to their ability to extract spectral components corresponding to objects of different size and contrast, such decompositions enable filtering, feature transfer, image fusion and other applications. However, obtaining this decomposition involves solving multiple non-smooth optimisation problems and is therefore computationally highly intensive. In this paper, we present a neural network approximation of a non-linear spectral decomposition. We report up to four orders of magnitude (×10,000) speedup in processing of mega-pixel size images, compared to classical GPU implementations. Our proposed network, TVSpecNET, is able to implicitly learn the underlying PDE and, despite being entirely data driven, inherits invariances of the model based transform. To the best of our knowledge, this is the first approach towards learning a non-linear spectral decomposition of images. Not only do we gain a staggering computational advantage, but this approach can also be seen as a step towards studying neural networks that can decompose an image into spectral components defined by a user rather than a handcrafted functional.
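
For readers unfamiliar with the transform being approximated, the sketch below illustrates the classical spectral TV definition from the earlier literature, phi(t) = t * u_tt(t), where u(t) is the TV-flow solution. It is not the network from the paper. To stay self-contained it assumes the input is an eigenfunction with eigenvalue `lam`, for which the flow reduces to the scalar profile c(t) = max(1 - lam * t, 0); all function names are hypothetical.

```python
def eigen_flow_profile(lam, t):
    """Contrast of an eigenfunction under the flow: c(t) = max(1 - lam * t, 0)."""
    return max(1.0 - lam * t, 0.0)

def spectral_profile(lam, dt, n_steps):
    """Sample t_k * c''(t_k) on a uniform grid using second differences."""
    c = [eigen_flow_profile(lam, k * dt) for k in range(n_steps + 1)]
    times, phi = [], []
    for k in range(1, n_steps):
        second_diff = (c[k + 1] - 2.0 * c[k] + c[k - 1]) / dt ** 2
        times.append(k * dt)
        phi.append(k * dt * second_diff)
    return times, phi

def integrate(phi, dt):
    """Riemann sum of the spectral profile over the sampled scales."""
    return sum(phi) * dt

# Usage: eigenvalue 4 gives an extinction time of 1 / 4.
dt = 0.01
times, phi = spectral_profile(lam=4.0, dt=dt, n_steps=100)
peak_time = times[max(range(len(phi)), key=phi.__getitem__)]
mass = integrate(phi, dt)
```

The profile is a single spike at the scale t = 1/lam, and integrating it over all scales recovers the original contrast. For general images, each sample of u(t) requires solving a non-smooth optimisation problem, which is the cost the abstract refers to.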