[Seminar arXiv] Fake Face Detection via Adaptive Residuals Extraction Network

The document presents a study on a novel Adaptive Residuals Extraction Network (AREN) for detecting fake faces, which enhances image forensics through efficient preprocessing. It highlights the effectiveness of using DenseNet architecture to achieve better prediction residuals and improved accuracy in classifying different types of image manipulations. The study emphasizes the transferability of the AREN model to other CNN-based systems for detecting various image forgeries under complex scenarios.

Data-driven AI
Security HCI (DASH) Lab
Fake Face Detection via Adaptive Residuals Extraction Network
Minha Kim
Sungkyunkwan University
July 23, 2020
Zhiqing Guo, Hunan University
Gaobo Yang, Nanjing University of Information Science and Technology

Introduction
This paper proposes a simple but effective AREN module as a pre-processing step.

Background
When images from another face image manipulation (FIM) technique are given as input,
transfer learning is not efficient because the weight values are fixed.
The paper proposes an adaptive residuals extraction network (AREN).

AREN NET Structure

DenseNet
The key is dense connectivity.
It mitigates vanishing gradients and enables efficient back-propagation.

Getting prediction residuals
[Figure: input image and its prediction residuals]

Concatenation process using DenseNet's pattern
Composite function Hc2,c3(·)
Finally, Freu is obtained.

Prediction residuals obtained by different filters
Most image content is suppressed, while manipulation traces are well preserved.
The residuals learned through the back-propagation pass are more suitable for image forensics.
AREN suppresses image content and obtains stable prediction residuals.

Hybrid fake face datasets

Confusion matrices of Model-base and ARENnet
With ARENnet, each data type was more often classified as its own type.

Image Operations for Generalization
Each single operation is tested with different kernel sizes, as shown in the following table.

Identification Rate for Different Forensics Models
ARENnet achieves much better accuracies under all three scenarios,
and also on HQ and LQ.

Generalization
Though JP and ME are two distinct image operations,
it can still be observed that image operations with mixed parameters enable the detector
to learn more discriminative features and thus improve its generalization ability.

Generalization
Image operation + JP or ME
Can the detector trained by the above method
detect face images with other unknown operations?

The main works and contributions
• AREN can be transferred to CNN-based models to detect other image forgeries.
• ARENnet is the first attempt at detecting multiple FIM techniques under complex scenarios.
• ARENnet achieves higher detection accuracy than existing works.

Thank you!

Editor's Notes

#2: Hello, I'm glad to be here with you today. (Let me start by briefly introducing myself. My name is Minha Kim. I am going to transfer from Hanyang University to the Department of Software at Sungkyunkwan University.) What I'm going to talk about is Fake Face Detection via Adaptive Residuals Extraction Network. This paper is posted on arXiv.
#3: Let me start with the introduction. The latest AI-enhanced fake face images can achieve realistic visual quality and are quite challenging to detect. This paper addresses fake face image detection under complex scenarios. Because of their relatively fixed structure, existing CNN-based works have some limitations. The paper proposes a simple yet effective AREN module as pre-processing.
#4: Many methods for classifying and detecting deepfake faces have been introduced. However, when images from another FIM technique are given as input, transfer learning is not efficient because the weight values are fixed. Also, because of its relatively fixed structure, a CNN tends to learn image content representations, whereas for image forensics tasks it should learn subtle tampering artifacts. This paper proposes an adaptive residuals extraction network (AREN). AREN exploits an adaptive convolution layer to predict image residuals, which are reused in subsequent layers to maximize manipulation artifacts by updating weights during the back-propagation pass. --- Other FIM techniques? Face image manipulation (FIM) techniques such as Face2Face and Deepfake. AREN? An adaptive convolution layer predicts image residuals, which are reused in subsequent layers; its weights are updated during the back-propagation pass.
#5: This is the structure of AREN. AREN uses a convolution layer as a predictor to obtain image residuals. The weights are updated adaptively during the back-propagation pass. In subsequent layers, the prediction residuals are reused to maximize manipulation traces. The authors also designed a fake face detector, ARENnet, by integrating AREN with a CNN. --- How is it used during back-propagation?
#6: Before I explain AREN, I will briefly explain DenseNet. The key to DenseNet is dense connectivity: the input of each layer is combined with its output along the channel dimension. In other words, DenseNet performs concatenation. As a result, the input channels of each layer are rich, which mitigates vanishing gradients and enables efficient back-propagation. To obtain stable prediction residuals, AREN borrows the idea of feature reuse from DenseNet. The residuals Fres extracted by the Conv 1 layer are fragile; if they are used directly, training might still be unstable. So on the next slides I will explain how the authors apply the ideas of DenseNet to Fres and finally extract Freu. --- DenseNet's weakness: the deeper the network, the greater the computation.
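To make the concatenation idea in note #6 concrete, here is a minimal sketch of dense connectivity in plain Python. It is not the AREN or DenseNet implementation: `toy_layer`, `dense_block`, and `growth_rate` are hypothetical names, and each channel is reduced to a single number so that only the channel bookkeeping is visible.

```python
# Toy dense block: a feature map is a list of channels, and each channel is
# represented by one number so the concatenation pattern stays visible.

def toy_layer(channels, growth_rate):
    """Hypothetical stand-in for a conv layer: emits `growth_rate` new
    channels, each computed from ALL input channels (here, their mean)."""
    mean = sum(channels) / len(channels)
    return [mean + k for k in range(growth_rate)]


def dense_block(input_channels, num_layers, growth_rate):
    """Each layer sees the concatenation of the block input and the outputs
    of every earlier layer."""
    features = list(input_channels)
    for _ in range(num_layers):
        new_channels = toy_layer(features, growth_rate)
        features = features + new_channels  # channel-wise concatenation
    return features


out = dense_block([1.0, 2.0, 3.0], num_layers=4, growth_rate=2)
print(len(out))  # 3 input channels + 4 layers * 2 new channels each
```

Because earlier channels are carried forward unchanged, the original input is still directly visible at the end of the block, which is the feature reuse that the note refers to.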
#7: This is the expression for the prediction residuals, Fres. The prediction residuals are used as low-level features to construct high-level features for image forensics. AREN is specifically designed to learn prediction residuals automatically. --- What are the weights (coefficients) of Conv 1? The initial coefficients of the Conv 1 layer are set randomly. What exactly is F? Fj is the jth feature map output by the Conv 1 layer.
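The expression for Fres is on the slide and is not reproduced in these notes, so the sketch below only illustrates the general idea of a prediction residual. It assumes, as is common in forensic residual filters, that the residual is the value predicted from a pixel's neighbours minus the actual pixel. The fixed `neighbour_mean` kernel is a hypothetical stand-in for the Conv 1 weights, which in AREN are learned.

```python
def conv3x3_valid(image, kernel):
    """Plain 3x3 correlation over the valid region (no padding)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            row.append(acc)
        out.append(row)
    return out


def prediction_residual(image, kernel):
    """ASSUMPTION: residual = predicted pixel - actual pixel."""
    predicted = conv3x3_valid(image, kernel)
    return [[predicted[r][c] - image[r + 1][c + 1]
             for c in range(len(predicted[0]))]
            for r in range(len(predicted))]


# Hypothetical predictor: the average of the 8 neighbours.
neighbour_mean = [[0.125, 0.125, 0.125],
                  [0.125, 0.0, 0.125],
                  [0.125, 0.125, 0.125]]

flat = [[10.0] * 5 for _ in range(5)]   # smooth image content
spiked = [row[:] for row in flat]
spiked[2][2] = 18.0                     # a local, artifact-like deviation

flat_res = prediction_residual(flat, neighbour_mean)
spiked_res = prediction_residual(spiked, neighbour_mean)
print(flat_res[1][1], spiked_res[1][1])
```

Smooth content is predicted well and leaves a zero residual, while the local deviation is not, which is the sense in which residuals suppress image content and keep traces.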
#8: This is the concatenation process using DenseNet's pattern. Let c2 and c3 be two convolution layers, and let Hc2,c3(·) denote the composite function of c2 and c3. The residuals Fres are passed into Conv 2 and Conv 3 to obtain the intermediate feature map. Concatenation then proceeds in the same way for IF2, and finally Freu is obtained.
#9: Then the weights are updated by an iterative algorithm during the back-propagation pass. In this paper, stochastic gradient descent (SGD) is used to train the model. The rules for the iterative updates are defined as follows. Note that, unlike the fixed predictors in existing works, AREN can adaptively learn prediction residuals, which are more suitable for image forensics, via the back-propagation pass. L_i^(k) is the true label of the ith image in the kth class, y_i^(k) is the network output, x is the number of training samples, and n is the number of neurons in the output layer. --- Initial values? ∇ω denotes the gradient, and ω_ij^(n) is the weight of the ith channel of the jth convolutional kernel in the nth layer. Momentum θ1 = 0.95 and decay θ2 = 0.005; εb = 0.001, γ = 0.5, N = 1000. The batch size is set to 64. Each training epoch requires 1,817 iterations. SGD? SGD is used for iterative optimization. Momentum?
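The update rules themselves are on the slide and are not reproduced in note #9. As a hedged illustration, the sketch below plugs the listed hyperparameters (θ1 = 0.95, θ2 = 0.005, εb = 0.001, γ = 0.5, N = 1000) into a common form of SGD with momentum and weight decay. Both the update form and the step-decay reading of εb, γ, and N are assumptions, and the one-dimensional loss is a toy problem.

```python
def learning_rate(iteration, base_lr=0.001, gamma=0.5, step=1000):
    """ASSUMPTION: step decay, lr = base_lr * gamma ** (iteration // step)."""
    return base_lr * gamma ** (iteration // step)


def sgd_momentum_step(w, v, grad, lr, momentum=0.95, decay=0.005):
    """One update in an assumed, commonly used form:
    v <- momentum * v - decay * lr * w - lr * grad, then w <- w + v."""
    v = momentum * v - decay * lr * w - lr * grad
    return w + v, v


# Toy problem (hypothetical): minimise E(w) = 0.5 * w**2, whose gradient is w.
w, v = 1.0, 0.0
for it in range(2000):
    w, v = sgd_momentum_step(w, v, grad=w, lr=learning_rate(it))

print(abs(w) < 1e-6)
```

With momentum 0.95, the effective step size is roughly 20 times the nominal learning rate, which is why a base rate as small as 0.001 still converges quickly on this toy loss.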
#10: AREN suppresses image content and obtains stable prediction residuals. The figure compares the prediction residuals obtained by different filters after 100 and 10,000 iterations. After 100 iterations, most image content is not yet suppressed. After 10,000 iterations, however, most image content is suppressed while the manipulation traces are well preserved. Unlike the fixed predictors in existing works, AREN can adaptively learn prediction residuals, which are more suitable for image forensics, from the back-propagation pass.
#11: Finally, the learned deep features are passed into the classification module, which is made up of three fully connected layers. The first two fully connected layers, which learn the associations among deep features, have 300 neurons each. The outputs of the neurons in the last fully connected layer correspond to the real face image and the possible face image manipulations. --- Why is the stride size 2 (small)? For convolutional kernels, a small stride can extract more abundant features than a large stride. Why apply a 1×1 kernel in the Conv 9 layer? It learns a linear combination of the features at the same location but in different channels. What do the output values look like?
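The remark that a 1×1 convolution learns a linear combination of features at the same location across channels can be sketched as follows. The function name, data layout (`feature_map[c][y][x]`), and weights are illustrative assumptions, not values from the paper.

```python
def conv1x1(feature_map, weights):
    """feature_map[c][y][x] has C input channels; weights[o][c] defines O
    output channels. Each output pixel is a weighted sum of the C values
    found at the same (y, x) position."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(weights[o][c] * feature_map[c][y][x]
                  for c in range(channels))
              for x in range(width)]
             for y in range(height)]
            for o in range(len(weights))]


fmap = [[[1.0, 2.0], [3.0, 4.0]],        # channel 0
        [[10.0, 20.0], [30.0, 40.0]]]    # channel 1

# One output channel = 1.0 * channel 0 + 0.5 * channel 1 (hypothetical weights).
mixed = conv1x1(fmap, weights=[[1.0, 0.5]])
print(mixed)
```

No neighbouring pixels are involved, so the layer mixes information across channels only, which matches the cross-channel interaction mentioned for Conv 9.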
#12: To conduct the above experiments, the authors first build a hybrid fake face (HFF) dataset, which contains eight types of face images. For real face images, three types are randomly selected from three open datasets: low-resolution face images from CelebA [79], high-resolution face images from CelebA-HQ [10], and face video frames from FaceForensics [52]. Thus, real face images in internet scenarios are simulated as realistically as possible.
#13: The confusion matrices of Model-base and ARENnet are reported in two tables. When ARENnet is used, the predicted values show a high detection rate after training on the various classes. In other words, ARENnet, having learned the eight data types, detects each of them with high accuracy. With ARENnet, each data type was more often classified as its own type.
#14: AREN is quite flexible, and the following design choices were investigated by experiments. First, the residual features Fres are better for forensics than the image data itself. Second, reusing Fres improves ARENnet. Third, a kernel size of 6 is appropriate for Conv 4 and Conv 5. Fourth, for the convolutional kernels in the first layer, 3×3 is better than 5×5. Fifth, it is a good choice to use two convolution layers between two concatenation operations; the experimental results show that two convolution layers are more stable than one. Sixth, max pooling is preferable to average pooling for ARENnet. Finally, the 1×1 convolutional kernel in the Conv 9 layer achieves higher detection accuracy than 3×3 convolutional kernels, which benefits from cross-channel interaction and information integration. --- RER? Relative error reduction.
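Since RER (relative error reduction) is only named in note #14, here is a small sketch of how it is commonly computed from two accuracies. The definition below is the usual one and is assumed rather than quoted from the paper; the 90% and 95% figures are hypothetical, not results from the talk.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """RER in percent, assuming RER = (e_base - e_new) / e_base, where each
    error e is 100 minus the accuracy (also in percent)."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Hypothetical numbers: going from 90% to 95% accuracy halves the error.
rer = relative_error_reduction(90.0, 95.0)
print(rer)
```

RER is useful when accuracies are already high, because a gain of a few points can correspond to a large fraction of the remaining errors being removed.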
#15: In the paper, the following operations are applied to test the generalization ability of the model. Each single operation is tested with different kernel sizes, as shown in the following table. --- Abbreviations: Mean Filtering is ME, Gaussian Filtering is GB, Median Filtering is MED, Gamma Correction is GC, JPEG Compression is JP, JPEG 2000 Compression is JP2, and Scaling is SC.
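As an illustration of one of these operations, the sketch below applies a k×k mean filter (ME) to a small grayscale image stored as a nested list; with k = 5 it corresponds to what the notes call ME5. Border handling by clamping indices and the requirement that k is odd are assumptions, since the notes do not say how the paper treats borders.

```python
def mean_filter(image, k):
    """k x k mean filter on a 2-D list of floats (k is assumed to be odd).
    Borders are handled by clamping indices, i.e. edge pixels are repeated."""
    h, w = len(image), len(image[0])
    half = k // 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            total = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += image[rr][cc]
            row.append(total / (k * k))
        out.append(row)
    return out


# A single bright pixel on a dark 7x7 background.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 25.0

me5 = mean_filter(image, 5)
print(me5[3][3], me5[0][0])
```

The bright pixel is spread evenly over a 5×5 neighbourhood, which shows why such filtering blurs away fine traces and makes detection harder.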
#16: This table reports the detection accuracies under three scenarios. For most detectors, both JP60 and ME5 greatly degrade the detection accuracy. However, ARENnet achieves much better accuracies under all three scenarios. In addition, ARENnet also shows higher accuracy than the other methods on High Quality and Low Quality. --- HFF? Hybrid fake face. Raw? Original face images.
#17: The table reports the confusion matrices when ARENnet is tested on five types of face images. Furthermore, though JP and ME are two distinct image operations, it can still be observed that image operations with mixed parameters enable the detector to learn more discriminative features and thus improve its generalization ability. --- What is ME5? It means mean filtering with a 5×5 kernel.
#18: For the generalization capability, one question remains: can the detector trained by the above method detect face images with other unknown operations? To verify this, the trained ARENnet is tested on some other types of face images, such as GB-mix, MED-mix, GC-mix, JP2-mix, and SC-mix. The experimental results are reported in Table XII. The average accuracy is 95.17%. That is, ARENnet achieves desirable generalization capability, especially when it is trained on the large dataset. This shows that training the detector with face images processed by image operations with mixed parameters is an effective strategy to enhance detection robustness, since the detector can learn more discriminative features from them. --- What is "mix"? The paper selects two representative image operations, JP and ME, for experiments; an image operation plus JP or ME is denoted by the suffix -mix. Small, middle, large? 124k, 165k, and 372k.
#19: The main works and contributions are summarized as follows. • Unlike the fixed predictors in existing works, AREN predicts residuals adaptively during back-propagation, so it can provide more discriminative residuals for image forensics tasks. AREN might serve as a basic residual predictor, which means that it can be transferred to CNN-based models to detect other image forgeries. • ARENnet is the first attempt at detecting multiple FIM techniques under complex scenarios. • The authors simulate the complex scenarios of practical face image forensics as realistically as possible. A series of experiments is conducted to prove the effectiveness of the proposed ARENnet, which achieves higher detection accuracy than existing works. In addition, the paper explores ways to improve the generalization ability of the detector. --- What does "complex scenario" mean? It means a situation in which the input has been processed by an FIM technique or degraded by operations such as mean filtering or JPEG compression. Why is transfer possible? Because the network learns the residuals of the image rather than the image itself, it is not strongly affected by the data type.
#20: Strengths and weaknesses of this paper? 1. Why was SGD used instead of a better, state-of-the-art optimizer? 2. Would the generalization ability really have increased when the mixed parameters were tested? What if global average pooling had been used instead of the FC layers? --- ReLU6: ReLU6 sets an upper limit of 6 on the standard ReLU. When optimizing deep learning model performance, you may need to convert the model to fixed point. With an upper limit of 6, the integer part needs at most 3 bits, which helps from an optimization point of view.
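The ReLU6 remark in note #20 can be checked with a few lines of Python. This is a minimal sketch of the activation and of the 3-bit observation for the integer part; it is not code from the paper.

```python
def relu6(x):
    """ReLU with an upper limit of 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


values = [-3.0, 0.5, 4.2, 6.0, 9.7]
clipped = [relu6(v) for v in values]
print(clipped)

# The largest integer part after clipping is 6, i.e. binary 110,
# so 3 bits are enough to store it.
bits_needed = (6).bit_length()
print(bits_needed)
```

Bounding the activation range in this way is what makes a fixed-point representation straightforward, since the number of integer bits is known in advance.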
#21: According to the paper Convolutional Deep Belief Networks on CIFAR-10, an upper limit was placed on ReLU because it lets a deep learning model learn sparse features earlier during training; after various tests, an upper limit of 6 gave good performance, which is why ReLU6 was used.
