Designing Network Design Spaces
Ilija Radosavovic, et al., “Designing Network Design Spaces”
3rd May, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
Designing Network Design Spaces
Introduction
• Over the past several years better architectures have resulted in
considerable progress in a wide range of visual recognition tasks.
▸ Ex) VGG, ResNet, MobileNet, EfficientNet, etc.
• While manual network design has led to large advances, finding well-
optimized networks manually can be challenging, especially as the
number of design choices increases.
• A popular approach to address this limitation is neural architecture
search (NAS).
• However, it does not enable discovery of network design principles
that deepen our understanding and allow us to generalize to new
settings.
Introduction
• In this work, the authors present a new network design paradigm
that combines the advantages of manual design and NAS.
• Instead of focusing on designing individual network instances, they
design design spaces that parametrize populations of networks.
Exploring Randomly Wired Neural Networks for Image Recognition (PR-155)
• Design a Network Generator, not an Individual Network!
Introduction
• The authors start with a relatively unconstrained design space they call
AnyNet and apply a human-in-the-loop methodology to arrive at a
low-dimensional design space consisting of simple “regular”
networks, RegNet.
• The RegNet design space generalizes to various compute regimes,
schedule lengths, and network block types.
• They analyze the RegNet design space and arrive at interesting
findings that do not match the current practice of network design.
Tools for Design Space Design
• Rather than designing or searching for a single best model under
specific settings, the authors study the behavior of populations of
models.
• They rely on the concept of network design spaces introduced by
Radosavovic et al., “On Network Design Spaces for Visual
Recognition,” ICCV 2019.
• The core idea of the paper is that we can quantify the quality of a design
space by sampling a set of models from that design space and
characterizing the resulting model error distribution.
Tools for Design Space Design
• To obtain a distribution of models, sample and train n models from a
design space.
• A primary tool for analyzing design space quality is the error
empirical distribution function (EDF). The error EDF of n models with
errors $e_i$ is given by:

$$F(e) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[e_i < e]$$

• $F(e)$ gives the fraction of models with error less than $e$.
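
As a quick illustration (not from the slides), a minimal NumPy/Matplotlib sketch of computing and plotting error EDFs; the model errors below are synthetic placeholders, not results from the paper:

```python
import numpy as np
import matplotlib.pyplot as plt

def error_edf(errors, e_grid):
    """F(e) = (1/n) * sum_i 1[e_i < e]: fraction of models with error below e."""
    errors = np.asarray(errors)
    return np.array([(errors < e).mean() for e in e_grid])

# Placeholder errors for two hypothetical design spaces (n models each).
rng = np.random.default_rng(0)
errors_a = rng.normal(45.0, 3.0, size=500)   # e.g., top-1 error (%)
errors_b = rng.normal(43.0, 2.5, size=500)

e_grid = np.linspace(35, 55, 200)
plt.plot(e_grid, error_edf(errors_a, e_grid), label="design space A")
plt.plot(e_grid, error_edf(errors_b, e_grid), label="design space B")
plt.xlabel("error")
plt.ylabel("fraction of models with error < e")
plt.legend()
plt.show()
```

A curve that rises earlier and more steeply indicates a design space with a larger fraction of good models.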
Tools for Design Space Design
• Given a population of trained models, we can plot and analyze
various network properties versus network error.
• For these plots, an empirical bootstrap is applied to estimate the
likely range in which the best models fall.
The blue shaded regions are ranges containing the best models with 95% confidence, and the black vertical line
indicates the most likely best value.
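
A generic sketch of the empirical bootstrap behind these plots; the resample fraction (25%) and round count are assumptions here, hedged rather than taken verbatim from the slides:

```python
import numpy as np

def bootstrap_best_range(x, err, n_boot=10_000, frac=0.25, alpha=0.05, seed=0):
    """Empirical bootstrap for the parameter value x of the best (min-error) model.

    Repeatedly resample a fraction of the (x, error) pairs with replacement,
    record the x of the lowest-error model in each resample, and return the
    (1 - alpha) range plus the most likely best value (the median).
    """
    rng = np.random.default_rng(seed)
    x, err = np.asarray(x), np.asarray(err)
    k = max(1, int(frac * len(x)))
    best_x = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=k)     # sample with replacement
        best_x[b] = x[idx[np.argmin(err[idx])]]   # x of the best model in this resample
    lo, hi = np.quantile(best_x, [alpha / 2, 1 - alpha / 2])
    return lo, hi, np.median(best_x)
```

The returned (lo, hi) interval corresponds to the shaded region in the plots, and the median to the vertical line.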
Tools for Design Space Design
• To summarize:
1. generate distributions of models obtained by sampling and
training n models from a design space.
2. compute and plot error EDFs to summarize design space quality.
3. visualize various properties of a design space and use an
empirical bootstrap to gain insight.
4. use these insights to refine the design space.
The AnyNet Design Space
• Given an input image, a network consists of a simple stem, followed by the
network body that performs the bulk of the computation, and a final network
head that predicts the output classes.
• Keep the stem and head fixed and as simple as possible, and instead focus on
the structure of the network body.
• The network body consists of 4 stages operating at progressively reduced
resolution; each stage consists of a sequence of identical blocks.
AnyNetX
• Most of the experiments use the standard residual bottleneck block
with group convolution. They refer to this as the X block, and the
AnyNet design space built on it as AnyNetX.
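
A PyTorch sketch of an X block as described above: a residual bottleneck whose 3×3 conv is grouped, with bottleneck ratio b meaning the 3×3 conv runs at width w_out/b. The layer ordering and the 1×1 projection shortcut follow common ResNet practice and are assumptions, not details from the slides:

```python
import torch
import torch.nn as nn

class XBlock(nn.Module):
    """Residual bottleneck with group conv: 1x1 -> 3x3 (grouped) -> 1x1 + skip."""

    def __init__(self, w_in, w_out, stride=1, b=1, g=16):
        super().__init__()
        w_b = w_out // b                 # bottleneck width (b = bottleneck ratio)
        assert w_b % g == 0, "group width g must divide the bottleneck width"
        self.f = nn.Sequential(
            nn.Conv2d(w_in, w_b, 1, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_b, 3, stride=stride, padding=1,
                      groups=w_b // g, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True),
            nn.Conv2d(w_b, w_out, 1, bias=False),
            nn.BatchNorm2d(w_out),
        )
        # 1x1 projection shortcut when the shape changes (assumed, as in ResNet).
        self.proj = nn.Sequential(
            nn.Conv2d(w_in, w_out, 1, stride=stride, bias=False),
            nn.BatchNorm2d(w_out),
        ) if (w_in != w_out or stride != 1) else None
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x if self.proj is None else self.proj(x)
        return self.relu(self.f(x) + shortcut)

# e.g., a stride-2 block entering a stage of width 128, with b=1 and group width 16:
# y = XBlock(64, 128, stride=2, b=1, g=16)(torch.randn(1, 64, 56, 56))
```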
AnyNetX
• The AnyNetX design space has 16 degrees of freedom as each
network consists of 4 stages and each stage 𝑖 has 4 parameters: the
number of blocks 𝑑𝑖, block width 𝑤𝑖, bottleneck ratio 𝑏𝑖, and group
width 𝑔𝑖.
• Resolution 𝑟 = 224 (fixed)
• To obtain valid models, we perform log-uniform sampling of 𝑑𝑖 ≤ 16,
𝑤𝑖 ≤ 1024 and divisible by 8, 𝑏𝑖 ∈ {1, 2, 4}, and 𝑔𝑖 ∈ {1, 2, … , 32}.
• There are $(16 \cdot 128 \cdot 3 \cdot 6)^4 \approx 10^{18}$ possible model configurations in
the AnyNetX design space.
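
A rough Python sketch of sampling one AnyNetX configuration under the stated ranges. The discrete log-uniform sampler and the omission of validity checks (e.g., that the group width divides the bottleneck width, or that the model lands in a target flop regime) are simplifications:

```python
import math
import random

def log_uniform_choice(values, rng):
    """Pick from a discrete set, approximately uniform in log-space."""
    logs = [math.log(v) for v in values]
    u = rng.uniform(min(logs), max(logs))
    return min(values, key=lambda v: abs(math.log(v) - u))

def sample_anynetx(rng=random.Random(0)):
    """One AnyNetX configuration: 4 stages, each with (d_i, w_i, b_i, g_i)."""
    stages = []
    for _ in range(4):
        stages.append(dict(
            d=log_uniform_choice(range(1, 17), rng),       # d_i <= 16
            w=log_uniform_choice(range(8, 1025, 8), rng),  # w_i <= 1024, multiple of 8
            b=log_uniform_choice([1, 2, 4], rng),          # bottleneck ratio b_i
            g=log_uniform_choice(range(1, 33), rng),       # group width g_i
        ))
    return stages
```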
Design Space Design Aims
1. To simplify the structure of the design.
2. To improve the interpretability of the design space.
3. To improve or maintain the design space quality.
4. To maintain model diversity in the design space.
AnyNetX(A, B, C)
• Refer to the unconstrained AnyNet design space as AnyNetXA.
▸ Shared bottleneck ratio $b_i = b$ for all stages i: AnyNetXA → AnyNetXB.
▸ Shared group width $g_i = g$ for all stages i: AnyNetXB → AnyNetXC.
AnyNetX(D, E)
• AnyNetXD is from examining typical network structures of both good
and bad networks from AnyNetXC.
▸ A pattern emerges: good networks have increasing widths.
• AnyNetXD constraint: AnyNetXC & 𝑤𝑖+1 ≥ 𝑤𝑖.
• In addition to the stage widths $w_i$ increasing with i, the stage depths $d_i$
likewise tend to increase for the best models.
• AnyNetXE constraint: AnyNetXD & 𝑑𝑖+1 ≥ 𝑑𝑖.
• Finally, the constraints on $w_i$ and $d_i$ each reduce the design space by $4!$,
with a cumulative reduction of $O(10^7)$ from AnyNetXA.
AnyNetX(D, E)
Linear Fits
• To gain further insight into the model structure, the best 20 models
from AnyNetXE are shown in a single plot.
• While there is significant variance in the individual models (gray
curves), in the aggregate a pattern emerges.
• In particular, in the same plot they show the line $w_j = 48 \cdot (j + 1)$ for
$0 \le j \le 20$.
Linear Fits
• Inspired by AnyNetXD and AnyNetXE, a linear parameterization of
block widths is introduced as follows:

$$u_j = w_0 + w_a \cdot j \quad \text{for } 0 \le j < d, \quad w_0 > 0, \; w_a > 0$$
• To quantize $u_j$, an additional parameter $w_m$ is introduced, and $s_j$ is defined so that:

$$u_j = w_0 \cdot w_m^{s_j}$$

• Then, to quantize $u_j$, simply round $s_j$ (denoted $\lfloor s_j \rceil$) and compute the quantized per-block widths $w_j$ via:

$$w_j = w_0 \cdot w_m^{\lfloor s_j \rceil}$$

• Converting the per-block widths $w_j$ to the per-stage format gives stage widths $w_i$ and stage depths $d_i$:

$$w_i = w_0 \cdot w_m^{i}, \qquad d_i = \sum_j \mathbf{1}[\lfloor s_j \rceil = i]$$
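
Putting the parameterization together, a minimal NumPy sketch that generates per-stage widths and depths from (d, w_0, w_a, w_m); snapping widths to multiples of 8 mirrors the AnyNetX width constraint and is an assumption about the exact rounding used:

```python
import numpy as np

def regnet_widths(d, w_0, w_a, w_m, q=8):
    """Generate per-stage (widths, depths) from the linear parameterization."""
    j = np.arange(d)
    u = w_0 + w_a * j                       # u_j = w_0 + w_a * j
    s = np.log(u / w_0) / np.log(w_m)       # solve u_j = w_0 * w_m**s_j for s_j
    w = w_0 * np.power(w_m, np.round(s))    # w_j = w_0 * w_m**round(s_j)
    w = (np.round(w / q) * q).astype(int)   # snap widths to multiples of q (assumed q=8)
    # Blocks sharing a quantized width form a stage; the counts are the depths d_i.
    widths, depths = np.unique(w, return_counts=True)
    return widths.tolist(), depths.tolist()

# Hypothetical parameters, just to show the shape of the output:
print(regnet_widths(d=16, w_0=48, w_a=36, w_m=2.5))
```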
Linear Fits
• The fitting error $e_{\text{fit}}$ is a mean log-ratio, measuring how well this linear parameterization fits the block widths of a given model.
The RegNet Design Space
• The design space of RegNet contains only simple, regular models.
▸ $d < 64$
▸ $w_0, w_a < 256$
▸ $1.5 \le w_m \le 3$
▸ $b$ and $g$ are the same as in AnyNet
• $w_m = 2$ and $w_0 = w_a$ give good performance, but to maintain
the diversity of models these constraints are not applied to the RegNet design space.
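
For illustration, a small self-contained sampler drawing one configuration from these RegNet ranges; the uniform sampling distributions are assumptions (the paper samples log-uniformly):

```python
import random

def sample_regnet(rng=random.Random(0)):
    """Draw one configuration from the RegNet parameter ranges above."""
    return dict(
        d=rng.randint(1, 63),           # depth: d < 64
        w_0=8 * rng.randint(1, 31),     # initial width: w_0 < 256, multiple of 8
        w_a=rng.uniform(8.0, 256.0),    # slope: w_a < 256
        w_m=rng.uniform(1.5, 3.0),      # width multiplier: 1.5 <= w_m <= 3
        b=rng.choice([1, 2, 4]),        # bottleneck ratio, as in AnyNet
        g=rng.choice(range(1, 33)),     # group width, as in AnyNet
    )

print(sample_regnet())
```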
Design Space Summary
Design Space Generalization
Common Design Patterns
• The deeper the model, the better the performance.
• Double the number of channels whenever the spatial activation size
is reduced.
• Skip connection is good.
• Bottleneck is good.
• Depthwise separable convolution is popular in the low-compute regime.
• Inverted bottleneck is also good.
RegNet Trends
• The depth of the best models is stable across regimes, with an optimal
depth of ~20 blocks (60 layers).
• This is in contrast to the common practice of using deeper models for
higher flop regimes.
RegNet Trends
• The best models use a bottleneck ratio 𝑏 of 1.0, which effectively
removes the bottleneck.
• The width multiplier $w_m$ of good models is ~2.5, similar but not
identical to the popular recipe of doubling widths across stages.
RegNet Trends
• The remaining parameters ($g$, $w_a$, $w_0$) increase with complexity.
Complexity Analysis
• While not a common measure of network complexity, activations can
heavily affect runtime on memory-bound hardware accelerators.
• Activations increase with the square root of flops, while parameters
increase linearly with flops.
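
A quick numeric check of this claim for a single stride-1 convolution (hypothetical layer sizes): at fixed resolution, flops and parameters grow quadratically with width while activations grow only linearly, so activations scale roughly as the square root of flops:

```python
def conv_complexity(w_in, w_out, k, r):
    """flops (multiply-adds), params, and activations for one k x k conv
    at output resolution r x r, stride 1, no bias."""
    flops = w_in * w_out * k * k * r * r
    params = w_in * w_out * k * k
    acts = w_out * r * r
    return flops, params, acts

# Doubling the width gives ~4x the flops and params but only ~2x the activations,
# i.e. activations grow like sqrt(flops) when scaling width at fixed resolution.
for w in (64, 128, 256):
    f, p, a = conv_complexity(w, w, k=3, r=14)
    print(f"w={w:3d}  flops={f:.2e}  params={p:.2e}  acts={a:.2e}")
```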
RegNetX Constrained
• Using these findings, the RegNetX design space is refined to RegNetX C:
▸ $b = 1$, $d \le 40$, and $w_m \ge 2$
▸ Parameters and activations limited following the complexity analysis
▸ Further depth limit: $12 \le d \le 28$
Alternate Design Choices
• Inverted bottleneck ($b < 1$) degrades the EDF slightly, and depthwise
conv performs even worse relative to $b = 1$ and $g \ge 1$.
• For RegNetX, a fixed resolution of 224x224 is best, even at higher flops.
• The Squeeze-and-Excitation (SE) op yields good gains: RegNetY.
Comparison to Existing Networks
• The higher flop models have a large number of blocks in the third
stage and a small number of blocks in the last stage.
• The group width 𝑔 increases with complexity, but depth 𝑑 saturates
for large models.
State of the Art Comparison: Mobile Regime
ResNeXt Comparison
EfficientNet Comparison
At low flops, EfficientNet outperforms RegNetY. At intermediate flops, RegNetY outperforms EfficientNet, and at higher flops both RegNetX and RegNetY perform better than EfficientNet.
Test Set Evaluation
Additional Ablations
• Fixed Depth
▸ Surprisingly, fixed-depth networks can match the performance of variable-depth networks for all flop regimes.
• Fewer Stages
▸ Top RegNet models at high flops have few blocks in the fourth stage, but 3-stage networks perform considerably worse.
• Inverted Bottleneck
▸ In the high-compute regime, $b < 1$ degrades results further.
Additional Ablations
• Swish vs ReLU
▸ Swish outperforms ReLU at low flops, but ReLU is better at high flops.
▸ Interestingly, if g is restricted to 1 (depthwise conv), Swish performs much better than ReLU.
Optimization Settings
• Initial learning rate and weight decay are stable across complexity regimes.
[Plots: initial learning rate and weight decay versus complexity, shown for RegNet and for EfficientNet]