Survey Scope
NIPS 2018, NIPS 2019, ECCV 2018, ICCV 2019, CVPR 2019, CVPR 2020, ICML 2019, ICLR 2019, ICLR 2020
NIPS 2018
Contamination Attacks and Mitigation in Multi-Party Machine Learning (Defense)
Authors: Jamie Hayes (University College London), Olga Ohrimenko (Microsoft Research)
Abstract: Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time. Distinct parties may want to combine their local data to gain the benefits of a model trained on a large corpus of data. We consider such a case: parties get access to the model trained on their joint data but do not see each other's individual datasets. We show that one needs to be careful when using this multi-party model since a potentially malicious party can taint the model by providing contaminated data. We then show how adversarial training can defend against such attacks by preventing the model from learning trends specific to individual parties' data, thereby also guaranteeing party-level membership privacy.
Multiple parties contribute their local data to train a joint model without seeing each other's datasets; adversarial training is used to defend against contamination attacks in which a malicious party poisons the training data.
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks (Defense)
Authors: Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin (Korea Advanced Institute of Science and Technology (KAIST), University of Michigan, Google Brain)
Abstract: Detecting test samples drawn sufficiently far away from the training distribution statistically or adversarially is a fundamental requirement for deploying a good classifier in many real-world machine learning applications. However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples. In this paper, we propose a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier. We obtain the class conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis, which result in a confidence score based on the Mahalanobis distance. While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples, but not both, the proposed method achieves the state-of-the-art performances for both cases in our experiments. Moreover, we found that our proposed method is more robust in harsh cases, e.g., when the training dataset has noisy labels or small number of samples. Finally, we show that the proposed method enjoys broader usage by applying it to class-incremental learning: whenever out-of-distribution samples are detected, our classification rule can incorporate new classes well without further training deep models.
Detects abnormal samples at test time, covering both out-of-distribution samples and adversarial examples.
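A minimal sketch of the scoring step, assuming class labels and features from one layer of a pre-trained classifier are already available (numpy only; fit_class_gaussians and mahalanobis_confidence are illustrative names, not the authors' released code):

    import numpy as np

    def fit_class_gaussians(features, labels, num_classes):
        # Class-conditional Gaussians with a shared (tied) covariance,
        # as in Gaussian discriminant analysis.
        means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
        centered = features - means[labels]
        precision = np.linalg.pinv(centered.T @ centered / len(features))
        return means, precision

    def mahalanobis_confidence(feat, means, precision):
        # Confidence score = negative minimum Mahalanobis distance to any
        # class mean; low scores flag out-of-distribution or adversarial inputs.
        diffs = means - feat                                 # (C, D)
        d2 = np.einsum('cd,de,ce->c', diffs, precision, diffs)
        return -d2.min()

The paper computes this score at several feature levels and combines them; a single layer is shown here to keep the idea visible.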
Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples (Face recognition defense)
Authors: Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang (Department of Computer Science, Purdue University)
Abstract: Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior knowledge of specific attacks or may not work well on complex models due to their underlying assumptions. We argue that adversarial sample attacks are deeply entangled with interpretability of DNN models: while classification results on benign inputs can be reasoned based on the human perceptible features/attributes, results on adversarial samples can hardly be explained. Therefore, we propose a novel adversarial sample detection technique for face recognition models, based on interpretability. It features a novel bi-directional correspondence inference between attributes and internal neurons to identify neurons critical for individual attributes. The activation values of critical neurons are enhanced to amplify the reasoning part of the computation and the values of other neurons are weakened to suppress the uninterpretable part. The classification results after such transformation are compared with those of the original model to detect adversaries. Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs. In contrast, a state-of-the-art feature squeezing technique can only achieve 55% accuracy with 23.3% false positives.
Detects adversarial examples against face recognition models based on model interpretability.
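A rough sketch of the detection step only, assuming the set of attribute-critical neurons has already been identified (the bi-directional correspondence inference between attributes and neurons is omitted; model_head and critical_mask are hypothetical names):

    import torch

    def attribute_steered_logits(model_head, feats, critical_mask, amplify=2.0, weaken=0.5):
        # Strengthen neurons critical for interpretable attributes and weaken
        # the rest, then re-run the classifier head on the transformed features.
        scale = critical_mask.float() * amplify + (~critical_mask).float() * weaken
        return model_head(feats * scale)

    def is_adversarial(model_head, feats, critical_mask):
        # An input is flagged when its prediction changes after the
        # attribute-steered transformation.
        original = model_head(feats).argmax(dim=1)
        steered = attribute_steered_logits(model_head, feats, critical_mask).argmax(dim=1)
        return original != steered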
Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks (Defense)
Authors: Zhihao Zheng, Pengyu Hong (Department of Computer Science, Brandeis University)
Abstract: It has been shown that deep neural network (DNN) based classifiers are vulnerable to human-imperceptible adversarial perturbations which can cause DNN classifiers to output wrong predictions with high confidence. We propose an unsupervised learning approach to detect adversarial inputs without any knowledge of attackers. Our approach tries to capture the intrinsic properties of a DNN classifier and uses them to detect adversarial inputs. The intrinsic properties used in this study are the output distributions of the hidden neurons in a DNN classifier presented with natural images. Our approach can be easily applied to any DNN classifiers or combined with other defense strategies to improve robustness. Experimental results show that our approach demonstrates state-of-the-art robustness in defending black-box and gray-box attacks.
Models the output distributions of a DNN's hidden neurons to detect adversarial inputs; achieves state-of-the-art robustness against black-box and gray-box attacks.
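One simple way to realize this idea, assuming hidden-layer activations on natural images have been collected: fit per-neuron statistics and threshold a z-score (the paper's actual density model for the activations may differ):

    import numpy as np

    def fit_neuron_stats(hidden_acts):
        # hidden_acts: (N, D) activations of one hidden layer on natural images.
        mu = hidden_acts.mean(axis=0)
        sigma = hidden_acts.std(axis=0) + 1e-8
        return mu, sigma

    def anomaly_score(acts, mu, sigma):
        # Mean squared z-score; inputs whose activations deviate strongly
        # from the benign statistics receive high scores.
        z = (acts - mu) / sigma
        return (z ** 2).mean(axis=-1)

    def is_adversarial(acts, mu, sigma, threshold):
        return anomaly_score(acts, mu, sigma) > threshold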
NIPS 2019
Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training (Defense)
Authors: Haichao Zhang (Horizon Robotics), Jianyu Wang (Baidu Research)
Abstract: We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffer from issues such as label leaking as noted in recent works. Differently, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We conduct analysis on model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets compared with state-of-the-art approaches.
A feature scattering-based adversarial training method for improving model robustness against adversarial attacks.
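A heavily simplified sketch of the perturbation-generation step, assuming model.features exposes the latent representation; the paper measures the discrepancy between the clean and perturbed feature sets with an optimal-transport distance, for which a plain L2 distance is substituted here purely for illustration:

    import torch
    import torch.nn.functional as F

    def feature_scattering_perturb(model, x, eps=8/255, step=2/255, iters=10):
        # Generate perturbations without using labels by pushing the
        # perturbed batch's features away from the clean batch's features.
        with torch.no_grad():
            clean_feat = model.features(x)
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.mse_loss(model.features(x_adv), clean_feat)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + step * grad.sign()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
        return x_adv

The resulting x_adv then replace the clean inputs in an otherwise standard adversarial-training loop.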
Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks (Query-based black-box attack)
Authors: Ziang Yan, Yiwen Guo, Changshui Zhang (Institute for Artificial Intelligence, Tsinghua University (THUAI))
Abstract: Unlike the white-box counterparts that are widely studied and readily accessible, adversarial examples in black-box settings are generally more Herculean on account of the difficulty of estimating gradients. Many methods achieve the task by issuing numerous queries to target classification systems, which makes the whole procedure costly and suspicious to the systems. In this paper, we aim at reducing the query complexity of black-box attacks in this category. We propose to exploit gradients of a few reference models which arguably span some promising search subspaces. Experimental results show that, in comparison with the state-of-the-arts, our method can gain up to 2x and 4x reductions in the requisite mean and median numbers of queries with much lower failure rates even if the reference models are trained on a small and inadequate dataset disjoint to the one for training the victim model. Code and models for reproducing our results will be made publicly available.
Exploits the gradients of a few reference models, which span promising search subspaces, to reduce the number of queries needed for black-box attacks.
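A condensed sketch of one attack iteration under stated assumptions: victim_loss(x, y) is a hypothetical query returning the victim's scalar loss, and the search direction is the input gradient of a randomly chosen reference model (the paper samples from the subspace spanned by several reference-model gradients and also adapts the reference models during the attack):

    import torch

    def subspace_direction(ref_models, x, y, loss_fn):
        # Input gradient of a randomly picked reference model, used as a
        # promising search direction.
        model = ref_models[torch.randint(len(ref_models), (1,)).item()]
        x = x.clone().requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(model(x), y), x)[0]
        return grad / (grad.norm() + 1e-12)

    def attack_step(victim_loss, x_adv, y, ref_models, loss_fn, lr=2/255, delta=0.1):
        # Two queries estimate the directional derivative of the victim's
        # loss along the proposed direction, then step along it.
        u = subspace_direction(ref_models, x_adv, y, loss_fn)
        g = (victim_loss(x_adv + delta * u, y) - victim_loss(x_adv - delta * u, y)) / (2 * delta)
        return (x_adv + lr * torch.sign(g * u)).clamp(0, 1)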
Functional Adversarial Attacks (Attack)
Authors: Cassidy Laidlaw, Soheil Feizi (University of Maryland)
Abstract: We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models. Unlike a standard lp-ball threat model, a functional adversarial threat model allows only a single function to be used to perturb input features to produce an adversarial example. For example, a functional adversarial attack applied on colors of an image can change all red pixels simultaneously to light red. Such global uniform changes in images can be less perceptible than perturbing pixels of the image individually. For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments. We show that functional threat models can be combined with existing additive (lp) threat models to generate stronger threat models that allow both small, individual perturbations and large, uniform changes to an input. Moreover, we prove that such combinations encompass perturbations that would not be allowed in either constituent threat model. In practice, ReColorAdv can significantly reduce the accuracy of a ResNet-32 trained on CIFAR-10. Furthermore, to the best of our knowledge, combining ReColorAdv with other attacks leads to the strongest existing attack even after adversarial training.
Instead of perturbing each pixel of an image independently, the attack perturbs input features through a single function (e.g., shifting all red pixels uniformly); such functional attacks can also be combined with standard additive (lp) attacks.
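A toy instance of a functional threat model, assuming white-box gradient access: a single per-channel affine color map f(c) = a*c + b is optimized and applied uniformly to every pixel (ReColorAdv itself uses a more flexible, regularized color function and can be combined with additive lp attacks):

    import torch

    def functional_color_attack(model, x, y, loss_fn, iters=50, lr=0.01, bound=0.1):
        # One global color transform shared by all pixels of the batch.
        a = torch.ones(1, 3, 1, 1, requires_grad=True)
        b = torch.zeros(1, 3, 1, 1, requires_grad=True)
        opt = torch.optim.Adam([a, b], lr=lr)
        for _ in range(iters):
            x_adv = (a * x + b).clamp(0, 1)
            loss = -loss_fn(model(x_adv), y)      # maximize classification loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():                 # keep the map close to identity
                a.clamp_(1 - bound, 1 + bound)
                b.clamp_(-bound, bound)
        return (a * x + b).clamp(0, 1).detach()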
Improving Black-box Adversarial Attacks with a Transfer-based Prior (Black-box attack)
Authors: Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu (Dept. of Comp. Sci. and Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., Institute for AI, THBI Lab, Tsinghua University)
Abstract: We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.
Uses the gradient of a surrogate model as a transfer-based prior and combines it with query feedback; the method needs far fewer queries to attack black-box models with higher success rates.
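A minimal sketch of the estimator, assuming victim_loss(x, y) is a scalar-loss query and surrogate_grad is the transfer gradient from a local surrogate model; the mixing weight lam is fixed here, whereas the paper derives its optimal value from how useful the prior is estimated to be:

    import torch

    def prgf_gradient_estimate(victim_loss, x, y, surrogate_grad,
                               num_queries=20, sigma=1e-3, lam=0.5):
        # Random gradient-free estimation whose sampling directions are
        # biased toward the normalized surrogate gradient (transfer prior).
        v = surrogate_grad / (surrogate_grad.norm() + 1e-12)
        base = victim_loss(x, y)                       # one baseline query
        g_hat = torch.zeros_like(x)
        for _ in range(num_queries):
            r = torch.randn_like(x)
            r = r / (r.norm() + 1e-12)
            u = lam ** 0.5 * v + (1 - lam) ** 0.5 * r  # prior-biased direction
            u = u / (u.norm() + 1e-12)
            g_hat = g_hat + (victim_loss(x + sigma * u, y) - base) / sigma * u
        return g_hat / num_queries

The estimate then drives a standard projected-gradient update on the input.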
ECCV 2018
Ask, Acquire, and Attack: Data-free UAP Generation using Class Impressions (Universal attack)
Authors: Konda Reddy Mopuri, Phani Krishna Uppala, and R. Venkatesh Babu (Video Analytics Lab, Indian Institute of Science, Bangalore, India)
Abstract: Deep learning models are susceptible to input specific noise, called adversarial perturbations. Moreover, there exist input-agnostic noise, called Universal Adversarial Perturbations (UAP) that can affect inference of the models over most input samples. Given a model, there exist broadly two approaches to craft UAPs: (i) data-driven: that require data, and (ii) data-free: that do not require data samples. Data-driven approaches require actual samples from the underlying data distribution and craft UAPs with high success (fooling) rate. However, data-free approaches craft UAPs without utilizing any data samples and therefore result in lesser success rates. In this paper, for data-free scenarios, we propose a novel approach that emulates the effect of data samples with class impressions in order to craft UAPs using data-driven objectives. Class impression for a given pair of category and model is a generic representation (in the input space) of the samples belonging to that category. Further, we present a neural network based generative model that utilizes the acquired class impressions to learn crafting UAPs. Experimental evaluation demonstrates that the learned generative model, (i) readily crafts UAPs via simple feed-forwarding through neural network layers, and (ii) achieves state-of-the-art success rates for data-free scenario and closer to that for data-driven setting without actually utilizing any data samples.
Tackles data-free universal adversarial perturbations (UAPs): "class impressions" (for a given class and model, a generic input-space representation of that class's samples) are used to emulate the effect of real data samples.
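A sketch of how a class impression could be acquired, assuming white-box access to the target model: start from noise and run gradient ascent on the pre-softmax score of the chosen class (the generative network that consumes these impressions to craft UAPs is omitted):

    import torch

    def class_impression(model, class_idx, shape=(1, 3, 224, 224), iters=200, lr=0.05):
        # A data-free, generic input-space representation of one class.
        x = torch.rand(shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(iters):
            logits = model(x.clamp(0, 1))
            loss = -logits[0, class_idx]       # maximize the target-class logit
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x.clamp(0, 1).detach()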
Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms (Black-box attack)
Authors: Arjun Nitin Bhagoji (Princeton University), Warren He (University of California, Berkeley), Bo Li (University of Illinois at Urbana–Champaign), and Dawn Song
Abstract: Existing black-box attacks on deep neural networks (DNNs) have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model’s class probabilities, which do not rely on transferability. We also propose strategies to decouple the number of queries required to generate each adversarial sample from the dimensionality of the input. An iterative variant of our attack achieves close to 100% attack success rates for both targeted and untargeted attacks on DNNs. We carry out a thorough comparative evaluation of black-box attacks and show that Gradient Estimation attacks achieve attack success rates similar to state-of-the-art white-box attacks on the MNIST and CIFAR-10 datasets. We also apply the Gradient Estimation attacks successfully against real-world classifiers hosted by Clarifai. Further, we evaluate black-box attacks against state-of-the-art defenses based on adversarial training and show that the Gradient Estimation attacks are very effective even against these defenses.
Proposes Gradient Estimation black-box attacks that query the target model's class probabilities; both targeted and untargeted attacks reach success rates close to white-box attacks.
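The core finite-difference estimate, shown in its naive coordinate-wise form (one pair of queries per input dimension); the paper's query-reduction strategies, such as grouping coordinates, are what make this practical. prob_query is a hypothetical function returning the target model's class-probability vector:

    import torch

    def finite_difference_grad(prob_query, x, y, delta=1e-3):
        # Estimate d log p_y / d x from probability queries only.
        grad = torch.zeros_like(x)
        grad_flat, x_flat = grad.view(-1), x.view(-1)
        for i in range(x_flat.numel()):
            e = torch.zeros_like(x_flat)
            e[i] = delta
            p_plus = prob_query((x_flat + e).view_as(x))[y]
            p_minus = prob_query((x_flat - e).view_as(x))[y]
            grad_flat[i] = (torch.log(p_plus) - torch.log(p_minus)) / (2 * delta)
        return grad

    def untargeted_step(prob_query, x, y, eps=8/255):
        # FGSM-style step that lowers the true-class probability.
        g = finite_difference_grad(prob_query, x, y)
        return (x - eps * g.sign()).clamp(0, 1)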
Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization (Defense)
Authors: Daniel Jakubovitz and Raja Giryes (School of Electrical Engineering, Tel Aviv University, Israel)
Abstract: Deep neural networks have lately shown tremendous performance in various applications including vision and speech processing tasks. However, alongside their ability to perform these tasks with such high accuracy, it has been shown that they are highly susceptible to adversarial attacks: a small change in the input would cause the network to err with high confidence. This phenomenon exposes an inherent fault in these networks and their ability to generalize well. For this reason, providing robustness to adversarial attacks is an important challenge in networks training, which has led to extensive research. In this work, we suggest a theoretically inspired novel approach to improve the networks’ robustness. Our method applies regularization using the Frobenius norm of the Jacobian of the network, which is applied as post-processing, after regular training has finished. We demonstrate empirically that it leads to enhanced robustness results with a minimal change in the original network’s accuracy.
Uses the Frobenius norm of the network's Jacobian as a regularizer to improve robustness; the regularization is applied as a post-processing (fine-tuning) step after regular training has finished.
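A sketch of the post-processing objective, assuming a small number of output classes so the Jacobian's Frobenius norm can be computed exactly by looping over the logits (the paper also discusses cheaper approximations):

    import torch

    def jacobian_frobenius_sq(model, x):
        # Squared Frobenius norm of d logits / d x, averaged over the batch.
        x = x.clone().requires_grad_(True)
        logits = model(x)
        total = 0.0
        for k in range(logits.shape[1]):
            g = torch.autograd.grad(logits[:, k].sum(), x, create_graph=True)[0]
            total = total + (g ** 2).sum()
        return total / x.shape[0]

    def post_training_step(model, optimizer, loss_fn, x, y, lam=0.01):
        # Fine-tuning loss = task loss + lam * ||J||_F^2, applied only after
        # regular training has finished.
        loss = loss_fn(model(x), y) + lam * jacobian_frobenius_sq(model, x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return float(loss)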
ICCV 2019
Adversarial Defense via Learning to Generate Diverse Attacks (Defense)
Authors: Yunseok Jang, Tianchen Zhao, Seunghoon Hong, Honglak Lee (University of Michigan)
Abstract: With the remarkable success of deep learning, Deep Neural Networks (DNNs) have been applied as dominant tools to various machine learning domains. Despite this success, however, it has been found that DNNs are surprisingly vulnerable to malicious attacks; adding small, perceptually indistinguishable perturbations to the data can easily degrade classification performance. Adversarial training is an effective defense strategy to train a robust classifier. In this work, we propose to utilize the generator to learn how to create adversarial examples. Unlike the existing approaches that create a one-shot perturbation by a deterministic generator, we propose a recursive and stochastic generator that produces much stronger and diverse perturbations that comprehensively reveal the vulnerability of the target classifier. Our experiment results on MNIST and CIFAR-10 datasets show that the classifier adversarially trained with our method yields more robust performance over various white-box and black-box attacks.
Uses a recursive, stochastic generator to produce diverse adversarial examples for adversarial training; the resulting classifier is more robust against both white-box and black-box attacks.
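A sketch of the recursive, stochastic generation step used inside adversarial training; generator(x, z) is an assumed interface that takes the current image plus a fresh noise code and returns an unbounded perturbation:

    import torch

    def generate_perturbation(generator, x, eps=8/255, steps=3, z_dim=128):
        # Each step feeds the current adversarial image and fresh noise to
        # the generator, so repeated calls yield diverse perturbations
        # rather than a single deterministic one.
        x_adv = x
        for _ in range(steps):
            z = torch.randn(x.size(0), z_dim, device=x.device)
            delta = eps * torch.tanh(generator(x_adv, z))    # bounded update
            x_adv = (x_adv + delta).clamp(0, 1)
        return x_adv

The classifier is then trained on x_adv while the generator is trained in the opposite direction, to maximize the classifier's loss.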
Sparse and Imperceivable Adversarial Attacks (Attack)
Authors: Francesco Croce, Matthias Hein (University of Tübingen)
Abstract: Neural networks have been proven to be vulnerable to a variety of adversarial attacks. From a safety perspective, highly sparse adversarial attacks are particularly dangerous. On the other hand the pixelwise perturbations of sparse attacks are typically large and thus can be potentially detected. We propose a new black-box technique to craft adversarial examples aiming at minimizing l0-distance to the original image. Extensive experiments show that our attack is better or competitive to the state of the art.