publications | Xiong (Alan) Peng

2025

NeurIPS

Generative Model Inversion Through the Lens of the Manifold Hypothesis

Xiong Peng, Bo Han^†, Fengfei Yu, Tongliang Liu, Feng Liu, and Mingyuan Zhou

In the 39th Conference on Neural Information Processing Systems, 2025

Abs PDF Code

Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process, yielding reconstructions with high visual quality and strong fidelity to the private training data. To explore the reason behind their effectiveness, we begin by examining the gradients of inversion loss w.r.t. synthetic inputs, and find that these gradients are surprisingly noisy. Further analysis reveals that generative inversion implicitly denoises these gradients by projecting them onto the tangent space of the generator manifold—filtering out off-manifold components while preserving informative directions aligned with the manifold. Our empirical measurements show that, in models trained with standard supervision, loss gradients often exhibit large angular deviations from the data manifold, indicating poor alignment with class-relevant directions. This observation motivates our central hypothesis: models become more vulnerable to MIAs when their loss gradients align more closely with the generator manifold. We validate this hypothesis by designing a novel training objective that explicitly promotes such alignment. Building on this insight, we further introduce a training-free approach to enhance gradient–manifold alignment during inversion, leading to consistent improvements over state-of-the-art generative MIAs.
TPAMI

Unknown-Aware Bilateral Dependency Optimization for Defending Against Model Inversion Attacks

Xiong Peng, Feng Liu, Nannan Wang, Long Lan, Tongliang Liu, Yiu-ming Cheung, and Bo Han^†

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Abs PDF Code

By abusing access to a well-trained classifier, model inversion (MI) attacks pose a significant threat as they can recover the original training data, leading to privacy leakage. Previous studies mitigated MI attacks by imposing regularization to reduce the dependency between input features and outputs during classifier training, a strategy known as unilateral dependency optimization. However, this strategy contradicts the objective of minimizing the supervised classification loss, which inherently seeks to maximize the dependency between input features and outputs. Consequently, there is a trade-off between improving the model’s robustness against MI attacks and maintaining its classification performance. To address this issue, we propose the bilateral dependency optimization strategy (BiDO), a dual-objective approach that minimizes the dependency between input features and latent representations, while simultaneously maximizing the dependency between latent representations and labels. BiDO is remarkable for its privacy-preserving capabilities. However, models trained with BiDO exhibit diminished capabilities in out-of-distribution (OOD) detection compared to models trained with standard classification supervision. Given the open-world nature of deep learning systems, this limitation could lead to significant security risks, as encountering OOD inputs—whose label spaces do not overlap with the in-distribution (ID) data used during training-is inevitable. To address this, we leverage readily available auxiliary OOD data to enhance the OOD detection performance of models trained with BiDO. This leads to the introduction of an upgraded framework, unknown-aware BiDO (BiDO+), which mitigates both privacy and security concerns.

2024

arXiv

Model Inversion Attacks: A Survey of Approaches and Countermeasures

Zhanke Zhou^*, Jianing Zhu^*, Fengfei Yu^*, Xuan Li, Xiong Peng, Tongliang Liu, and Bo Han^†

In arXiv, 2024

Abs PDF

The success of deep neural networks has driven numerous research studies and applications from Euclidean to non-Euclidean data. However, there are increasing concerns about privacy leakage, as these networks rely on processing private data. Recently, a new type of privacy attack, the model inversion attacks (MIAs), aims to extract sensitive features of private data for training by abusing access to a well-trained model. The effectiveness of MIAs has been demonstrated in various domains, including images, texts, and graphs. These attacks highlight the vulnerability of neural networks and raise awareness about the risk of privacy leakage within the research community. Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs across different domains. This survey aims to summarize up-to-date MIA methods in both attacks and defenses, highlighting their contributions and limitations, underlying modeling principles, optimization challenges, and future directions. We hope this survey bridges the gap in the literature and facilitates future research in this critical area. Besides, we are maintaining a repository to keep track of relevant research at https://github.com/AndrewZhou924/Awesome-model-inversion-attack.
NeurIPS

Pseudo-Private Data Guided Model Inversion Attacks

Xiong Peng, Bo Han^†, Feng Liu, Tongliang Liu, and Mingyuan Zhou

In the 38th Conference on Neural Information Processing Systems, 2024

Abs PDF Code

In model inversion attacks (MIAs), adversaries attempt to recover private training data by exploiting access to a well-trained target model. Recent advancements have improved MIA performance using a two-stage generative framework. This approach first employs a generative adversarial network to learn a fixed distributional prior, which is then used to guide the inversion process during the attack. However, in this paper, we observed a phenomenon that such a fixed prior would lead to a low probability of sampling actual private data during the inversion process due to the inherent distribution gap between the prior distribution and the private data distribution, thereby constraining attack performance. To address this limitation, we propose increasing the density around high-quality pseudo-private data—recovered samples through model inversion that exhibit characteristics of the private training data—by slightly tuning the generator. This strategy effectively increases the probability of sampling actual private data that is close to these pseudo-private data during the inversion process. After integrating our method, the generative model inversion pipeline is strengthened, leading to improvements over state-of-the-art MIAs. This paves the way for new research directions in generative MIAs.

2022

KDD

Bilateral dependency optimization: Defending against model-inversion attacks

Xiong Peng^*, Feng Liu^*, Jingfeng Zhang, Long Lan, Junjie Ye, Tongliang Liu, and Bo Han^†

In the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022

Abs PDF Code

Through using only a well-trained classifier, model-inversion (MI) attacks can recover the data used for training the classifier, leading to the privacy leakage of the training data. To defend against MI attacks, previous work utilizes a unilateral dependency optimization strategy, i.e., minimizing the dependency between inputs (i.e., features) and outputs (i.e., labels) during training the classifier. However, such a minimization process conflicts with minimizing the supervised loss that aims to maximize the dependency between inputs and outputs, causing an explicit trade-off between model robustness against MI attacks and model utility on classification tasks. In this paper, we aim to minimize the dependency between the latent representations and the inputs while maximizing the dependency between latent representations and the outputs, named a bilateral dependency optimization (BiDO) strategy. In particular, we use the dependency constraints as a universally applicable regularizer in addition to commonly used losses for deep neural networks (e.g., cross-entropy), which can be instantiated with appropriate dependency criteria according to different tasks. To verify the efficacy of our strategy, we propose two implementations of BiDO, by using two different dependency measures: BiDO with constrained covariance (BiDO-COCO) and BiDO with Hilbert-Schmidt Independence Criterion (BiDO-HSIC). Experiments show that BiDO achieves the state-of-the-art defense performance for a variety of datasets, classifiers, and MI attacks while suffering a minor classification-accuracy drop compared to the well-trained classifier with no defense, which lights up a novel road to defend against MI attacks.