2020/09/08
Ho Seong Lee (hoya012)
Cognex Deep Learning Lab
Research Engineer
SNUAI 8th | The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization 1
Contents
• Introduction
• Related Work
• New Benchmarks
• DeepAugment
• Experiments
• Conclusion
Introduction
The human visual system is robust, but existing vision models are not.
• Humans can deal with many forms of corruption, such as blur, pixel noise, and abstract changes in
structure and style.
• Achieving robustness is essential in safety-critical and accuracy-critical applications.
Figure labels (human): Dog! Dog! Dog! Dog!
Figure labels (model): Dog! starfish! baseball! drumstick!
Introduction
Most work on robustness methods for vision has focused on adversarial examples.
• Several studies have begun to standardize and broaden robustness research.
• The first step is to establish benchmark datasets and evaluation metrics.
Related Works
Pioneers of robustness in ML - Dan Hendrycks (My Favorite Researcher..)
Reference: https://people.eecs.berkeley.edu/~hendrycks/
Related Works
Pioneers of robustness in ML – Madry Lab
Reference: http://madry-lab.ml/
Related Works
Pioneers of robustness in ML – Bethge Lab
Reference: http://bethgelab.org/
Related Works
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
• Create the ImageNet-C and ImageNet-P test sets and benchmarks.
ImageNet-C ImageNet-P
Related Works
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations
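• Robustness Metrics
ImageNet-C: mean Corruption Error, $\mathrm{mCE}^{f} = \operatorname{average}_{c}\left(\mathrm{CE}_{c}^{f}\right)$, where $\mathrm{CE}_{c}^{f} = \sum_{s=1}^{5} E_{s,c}^{f} \big/ \sum_{s=1}^{5} E_{s,c}^{\mathrm{AlexNet}}$
* $E_{s,c}^{f}$ is the top-1 error of classifier $f$ on corruption type $c$ at each level of severity $s$ ($1 \le s \le 5$).
ImageNet-P: mean Flip Rate (mFR), the AlexNet-normalized probability that the prediction flips between consecutive frames of a perturbation sequence, averaged over perturbation types.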
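The ImageNet-C metric (mean Corruption Error, as defined in the benchmark paper) can be sketched in a few lines. This is a minimal illustration, not the official evaluation code: the function names and the error values are assumptions made up for the example, and the baseline stands in for AlexNet.

```python
def corruption_error(model_err, baseline_err):
    """CE for one corruption type: errors summed over severities 1..5,
    normalized by the baseline classifier's summed errors."""
    return sum(model_err) / sum(baseline_err)


def mean_corruption_error(model_errs, baseline_errs):
    """mCE: average CE over corruption types.
    Both arguments map corruption name -> list of errors at severities 1..5."""
    ces = [corruption_error(model_errs[c], baseline_errs[c]) for c in model_errs]
    return sum(ces) / len(ces)


# Hypothetical error rates for illustration only (not results from the paper).
baseline = {
    "gaussian_noise": [0.6, 0.7, 0.8, 0.9, 1.0],
    "defocus_blur": [0.5, 0.6, 0.7, 0.8, 0.9],
}
model = {
    "gaussian_noise": [0.3, 0.35, 0.4, 0.45, 0.5],  # half the baseline error
    "defocus_blur": [0.5, 0.6, 0.7, 0.8, 0.9],      # same as the baseline
}
mce = mean_corruption_error(model, baseline)
```

Normalizing by a fixed baseline keeps hard corruptions from dominating the average; a value below 1 means the model is more robust than the baseline on average.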
Related Works
Natural Adversarial Examples
• Introduce natural adversarial examples and create ImageNet-A, a test set of 7,500 images (200 classes).
• Download numerous images related to an ImageNet class from the websites iNaturalist and Flickr.
• Delete the images that ResNet-50 classifies correctly. Finally, select a subset of high-quality images.
ImageNet-A
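The filtering step for ImageNet-A (drop whatever the reference classifier already gets right) can be sketched as follows. This is only an illustration: `adversarially_filter` is a hypothetical helper, and a lookup table stands in for ResNet-50 predictions.

```python
from typing import Callable, Iterable, List, Tuple


def adversarially_filter(
    samples: Iterable[Tuple[str, int]],
    predict: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Keep only the (image_id, label) pairs that the reference classifier misclassifies."""
    return [(img, label) for img, label in samples if predict(img) != label]


# Toy stand-in for a ResNet-50: a table of predicted class indices (hypothetical).
fake_predictions = {"a.jpg": 0, "b.jpg": 3, "c.jpg": 1}
candidates = [("a.jpg", 0), ("b.jpg", 2), ("c.jpg", 1)]

kept = adversarially_filter(candidates, fake_predictions.__getitem__)
```

Only `b.jpg` survives here, because it is the only candidate the stand-in classifier labels incorrectly. The manual selection of high-quality images that follows is not modeled.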
Related Works
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
• Propose a technique to improve the robustness and uncertainty estimates of image classifiers.
• Use a Jensen-Shannon (JS) divergence consistency loss.
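A rough sketch of a Jensen-Shannon consistency term over three predicted distributions (one clean image and two augmented views), in plain Python. The function names are illustrative assumptions, and a real training loop would compute this on batched softmax outputs inside a deep learning framework.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_consistency(p_clean, p_aug1, p_aug2):
    """Jensen-Shannon divergence among three distributions:
    the mean KL divergence from each distribution to their mixture."""
    dists = [p_clean, p_aug1, p_aug2]
    mixture = [sum(col) / 3 for col in zip(*dists)]
    return sum(kl(p, mixture) for p in dists) / 3


# Identical predictions give zero loss; disagreeing predictions are penalized.
same = js_consistency([0.7, 0.2, 0.1], [0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
diff = js_consistency([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

The term is bounded (at most log 3 for three distributions), which makes it a gentler consistency penalty than an unbounded KL divergence between pairs.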
New Benchmarks
Seven robustness hypotheses
• Larger Models: increasing model size improves robustness.
• Self-Attention: adding self-attention layers to models improves robustness.
• Diverse Data Augmentation: robustness can increase through data augmentation.
• Pretraining: pretraining on larger and more diverse datasets improves robustness.
• Texture Bias: convolutional networks are biased towards texture, which harms robustness.
• Only IID Accuracy Matters: accuracy on independent and identically distributed test data entirely
determines natural robustness.
• Synthetic ≠ Natural: synthetic robustness interventions including diverse data augmentations do not help
with robustness on naturally occurring distribution shifts.
New Benchmarks
Introduce three new robustness benchmarks
• It has been difficult to arbitrate these hypotheses because existing robustness datasets preclude the
possibility of controlled experiments by varying multiple aspects simultaneously.
• To address these issues and test the seven hypotheses outlined above, create new test sets.
New Benchmarks
ImageNet-Renditions (ImageNet-R)
• A test set of 30,000 images containing various renditions (e.g., paintings, embroidery) of 200 ImageNet
object classes. The rendition styles (“Painting”, “Toy”) are not ImageNet-R’s classes.
• The original ImageNet dataset discouraged such images, since annotators were instructed to collect “photos
only, no painting, no drawings, etc.” (Deng, 2012). The authors do the opposite.
New Benchmarks
StreetView StoreFronts (SVSF)
• Contains business storefront images taken from Google Streetview (20 classes).
• Investigate natural shifts in the image capture process using metadata (e.g., location, year, camera type).
• Create one training set (200K) and five test sets (10K each). The training set and the in-distribution
test set come from images taken in US/Mexico/Canada during 2019 using the “new” camera. Unfortunately, the dataset is unreleased.
• Make four out-of-distribution test sets (10K each): “2017”, “2018”, “France”, “old camera”.
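The split design (each out-of-distribution test set changes one metadata attribute while the others stay fixed) can be sketched with plain dictionaries. The field names, the single-string country value, and the toy records are assumptions for illustration; the actual SVSF metadata schema is not public.

```python
# In-distribution specification, following the slide (field names are hypothetical).
BASE = {"year": 2019, "country": "US/Mexico/Canada", "camera": "new"}

# Each OOD split differs from BASE in exactly one attribute.
OOD_SPLITS = {
    "2017": {**BASE, "year": 2017},
    "2018": {**BASE, "year": 2018},
    "France": {**BASE, "country": "France"},
    "old camera": {**BASE, "camera": "old"},
}


def select(images, spec):
    """Return the image records whose metadata matches every field in spec."""
    return [img for img in images if all(img[k] == v for k, v in spec.items())]


# Toy metadata records (illustrative only).
images = [
    {"id": 1, "year": 2019, "country": "US/Mexico/Canada", "camera": "new"},
    {"id": 2, "year": 2017, "country": "US/Mexico/Canada", "camera": "new"},
    {"id": 3, "year": 2019, "country": "France", "camera": "new"},
    {"id": 4, "year": 2017, "country": "France", "camera": "old"},
]
in_dist = select(images, BASE)
france = select(images, OOD_SPLITS["France"])
```

Record 4 differs from the base in three attributes at once, so it lands in no split. Excluding such records is what makes each comparison a controlled experiment on a single shift.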
New Benchmarks
DeepFashion Remixed
• Changes in camera operation can cause shifts in attributes such as object size, object occlusion, camera
viewpoint, and camera zoom.
• To measure this, create a multi-label training set (48K) and eight out-of-distribution test sets (121K in total).
Training / in-distribution: medium scale, medium occlusion, side/back viewpoint, no zoom-in
Out-of-distribution: small and large scale, minimal and heavy occlusion, frontal and not-worn viewpoints,
medium and large zoom-in
DeepAugment
New data augmentation technique: DeepAugment
• To explore the Diverse Data Augmentation hypothesis, introduce a new data augmentation technique.
• Pass an image through an image-to-image network (such as an autoencoder or a super-resolution network).
• Rather than processing the image normally, distort the internal weights and activations by applying
randomly sampled operations (zeroing, negating, convolving, transposing, applying activation functions, and so on).
• This creates diverse but semantically consistent images.
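The idea of distorting a network's weights during the forward pass can be illustrated with a toy example. This is a loose sketch under heavy assumptions, not the authors' implementation: a three-layer NumPy MLP with near-identity weights stands in for a pretrained image-to-image network, only three of the listed operations are shown, and the 10% perturbation rate is arbitrary.

```python
import numpy as np


def perturb_weights(w, rng):
    """Apply one randomly chosen distortion to a copy of a square weight matrix."""
    op = rng.choice(["zero", "negate", "transpose"])
    w = w.copy()
    if op == "zero":
        mask = rng.random(w.shape) < 0.1  # zero out roughly 10% of the entries
        w[mask] = 0.0
    elif op == "negate":
        mask = rng.random(w.shape) < 0.1  # flip the sign of roughly 10% of the entries
        w[mask] *= -1.0
    else:
        w = w.T  # shape-preserving only because the matrix is square
    return w


def deepaugment_forward(x, weights, rng):
    """Forward pass of a toy image-to-image MLP using freshly distorted weights."""
    h = x
    for w in weights[:-1]:
        h = np.tanh(h @ perturb_weights(w, rng))
    return h @ perturb_weights(weights[-1], rng)


rng = np.random.default_rng(0)
d = 16  # flattened toy "image" size
weights = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(3)]
x = rng.random((4, d))  # batch of 4 toy images
x_aug = deepaugment_forward(x, weights, rng)
```

Because the distortions are resampled on every call, the same input yields a different output each time, which is where the diversity comes from. In the real method the network is a trained image model, so the outputs remain recognizable images.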
Experiments
Experimental Setup
Experiments
Experimental Results: ImageNet-R
• ImageNet-200: The original ImageNet test set restricted to ImageNet-R’s 200 classes.
• Pretraining → narrows the IID/OOD gap only marginally.
• Self-Attention → widens the IID/OOD gap.
• Diverse Data Augmentation, Larger Models → narrow the IID/OOD gap significantly!
Table columns: IID (ImageNet-200), OOD (ImageNet-R), Diff (IID/OOD gap); Error Rate (↓)
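One way to read "narrows the IID/OOD gap" is as a change in the difference between OOD and IID error relative to a baseline. The sketch below assumes that reading; the function names and the numbers are hypothetical and are not taken from the paper's table.

```python
def iid_ood_gap(iid_error, ood_error):
    """Gap between out-of-distribution and in-distribution error (percentage points)."""
    return ood_error - iid_error


def gap_change(baseline, method):
    """Change in the gap relative to the baseline; negative means the method narrows it.
    Each argument is an (iid_error, ood_error) pair."""
    return iid_ood_gap(*method) - iid_ood_gap(*baseline)


# Hypothetical (IID error %, OOD error %) pairs for illustration only.
baseline = (8.0, 64.0)   # gap of 56 points
augmented = (7.0, 55.0)  # gap of 48 points
change = gap_change(baseline, augmented)
```

Comparing gaps rather than raw OOD error separates genuine robustness gains from improvements that merely track better IID accuracy.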
Experiments
Experimental Results: StreetView StoreFronts
• No method helps much on country shift, where error rates roughly double across the board.
• Images captured in France contain noticeably different architecture styles and storefront designs.
• The authors could not find conspicuous, consistent visual indicators of camera or year, which may explain
why the models are insensitive to these shifts.
• Data augmentation primarily helps combat texture bias as with ImageNet-R.
• But existing augmentations are not diverse enough to capture high-level semantic shifts such as building
architecture.
Error Rate (↓)
Experiments
Experimental Results: DeepFashion Remixed
• All evaluated methods have an average OOD mAP that is close to the baseline.
• DFR’s size and occlusion shifts hurt performance the most.
• Nothing substantially improved OOD performance beyond what is explained by IID performance, so here
it would appear that Only IID Accuracy Matters.
mAP scores (↑)
Experiments
Experimental Results: ImageNet-C
• DeepAugment + AugMix → attains the state-of-the-art result.
• Evidence for Larger Models, Self-Attention, Diverse Data Augmentation, Pretraining, and Texture Bias.
• Evidence against Only IID Accuracy Matters.
Error Rate, mCE (↓)
Experiments
Experimental Results: Real Blurry Images
• ImageNet-C uses various synthetic corruptions, which differ from real-world corruptions.
• Collect a small dataset of 1,000 real-world blurry images and evaluate various models.
• Everything that helped on ImageNet-C also helped on Real Blurry Images.
• Evidence against Synthetic ≠ Natural.
Error Rate (↓)
Conclusion
• Introduce three new benchmarks, ImageNet-R, SVSF, and DFR. (+ Real Blurry Images)
• Introduce new data augmentation technique DeepAugment.
• With these benchmarks, evaluate seven robustness hypotheses.
• It seems that robustness has many faces (it is multivariate).
• If so, the research community should prioritize creating new robustness methods.