Visualizing and Understanding Convolutional Networks (2014)

Visualizing and Understanding
Convolutional Networks
신우철

Introduction
• Renewed interest in CNNs is driven by: 1) the availability of much larger training sets, 2)
powerful GPU implementations, 3) better model regularization strategies.
• Visualization technique that reveals the input stimuli that excite individual
feature maps, and evolution of features during training
1) Multi-layered Deconvnet: project feature activations back to the input
pixel space.
2) Sensitivity analysis of the classifier output: occluding portions of the input
image, revealing which parts of the scene are important.

Approach
1) Visualization with Deconvnet
• A deconvnet uses the same components as a convnet model, such as filtering
and pooling, but in reverse.
• This way the feature activities in intermediate layers can be mapped back to
the input pixel space, showing what input pattern caused a given activation
in the feature maps.

Approach
(1) Present an input image to the convnet and compute features throughout the
layers.
(2) To examine a given activation, set all other activations in the layer to zero
and pass the feature maps as input to the attached deconvnet layer.
(3) Successively (i) unpool, (ii) rectify, and (iii) filter to reconstruct the activity
in the layer beneath. Repeat until the input pixel space is reached.

Approach
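(i) Unpooling
• Max pooling is non-invertible, so an approximate inverse is obtained by
recording the locations of the maxima within each pooling region in a set of
switch variables. The deconvnet places each reconstructed value at its recorded
location, preserving the structure of the stimulus.
(ii) Rectification
• Pass the reconstructed signal through a ReLU non-linearity.
(iii) Filtering
• Use transposed versions of the same filters learned by the convnet.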
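
The unpooling and rectification steps above can be sketched in a few lines of plain Python. This is a minimal illustration on a single 2-D feature map, assuming non-overlapping pooling regions and map dimensions divisible by the pool size; the function names (`max_pool_with_switches`, `unpool`, `rectify`) are made up for this example and are not taken from the paper. The filtering step (iii) is not shown.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling on a 2-D list; also returns the switches."""
    pooled, switches = [], []
    for r in range(0, len(fmap), size):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), size):
            # Every (row, col) position covered by this pooling region.
            region = [(rr, cc) for rr in range(r, r + size)
                      for cc in range(c, c + size)]
            best = max(region, key=lambda rc: fmap[rc[0]][rc[1]])
            pooled_row.append(fmap[best[0]][best[1]])
            switch_row.append(best)  # switch = location of the maximum
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, height, width):
    """Put each pooled value back where its switch points; zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


def rectify(fmap):
    """ReLU applied elementwise, as in step (ii)."""
    return [[max(v, 0.0) for v in row] for row in fmap]


fmap = [[1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 5.0],
        [0.0, 0.0, 7.0, 0.0],
        [6.0, 0.0, 0.0, 0.0]]
pooled, switches = max_pool_with_switches(fmap)
restored = rectify(unpool(pooled, switches, 4, 4))
```

Only the maxima survive the round trip: `restored` keeps 4, 5, 6 and 7 at their original positions and is zero everywhere else, which is why the reconstruction is approximate rather than exact.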

Training Details
• Used the AlexNet architecture, but replaced the sparse connections (a result of
splitting the model across 2 GPUs) with dense connections.
• A few architectural details were modified based on visualization results.

Convnet Visualization
(1) Feature Visualization
• Projecting each feature map down to pixel space reveals the different
structures that excite a given feature map. The visualizations vary less than
the corresponding image patches, because they focus solely on the
discriminant structure within each patch.
(i) The strong grouping within each feature map
(ii) Greater invariance at higher layers
(iii) Exaggeration of discriminative parts of the image

(2) Feature Evolution during Training
• Lower layers of the model can be seen to converge within a few epochs,
while upper layers only develop after a considerable number of epochs.
• Sudden jumps in appearance result from a change in the image from which
the strongest activation originates.
* Visualization confirms the intuition that convergence time depends on layer
depth. It can also be used for tuning hyperparameters.

(3) Feature Invariance
• Small transformations have a dramatic effect in the first layer of the model,
but a lesser impact at the top feature layer for translation and scaling.
• The output is not invariant to rotation, except for objects with rotational
symmetry.
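
One way to quantify this kind of invariance is to compare the feature vectors of an original and a transformed input by Euclidean distance. The toy sketch below works on a 1-D signal and uses two hypothetical feature extractors (`low_level_features`, which returns raw samples, and `pooled_features`, which takes a max over windows). They stand in for a low layer and a pooled higher layer purely for illustration and are not the model from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shift_right(signal, k):
    """Translate a 1-D signal by k samples, padding with zeros on the left."""
    return [0.0] * k + signal[:len(signal) - k]


def low_level_features(signal):
    """Hypothetical low-layer features: the raw samples themselves."""
    return list(signal)


def pooled_features(signal, size=4):
    """Hypothetical higher-layer features: max over non-overlapping windows."""
    return [max(signal[i:i + size]) for i in range(0, len(signal), size)]


signal = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0]
shifted = shift_right(signal, 1)

low_dist = euclidean(low_level_features(signal), low_level_features(shifted))
high_dist = euclidean(pooled_features(signal), pooled_features(shifted))
```

In this toy case a one-sample shift changes the raw-sample features substantially, while the pooled features are unchanged because each peak stays inside its pooling window.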

1. Architecture Selection
• Visualization helped in selecting a better architecture:
(i) Reduced the 1st layer filter size from 11 x 11 to 7 x 7
(ii) Reduced the stride of the 1st layer convolution from 4 to 2
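
The effect of these two changes on the first layer's spatial resolution can be checked with the standard convolution output-size formula. The input width of 224 and the absence of padding are assumptions made for this example, so the exact numbers may differ from the models in the paper.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Output width of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Assumed input width of 224 pixels, no padding.
before = conv_output_size(224, kernel=11, stride=4)  # 11 x 11 filters, stride 4
after = conv_output_size(224, kernel=7, stride=2)    # 7 x 7 filters, stride 2
```

Under these assumptions the first-layer feature map grows from 54 to 109 positions per side, so the smaller filter and stride sample the input about twice as densely.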

2. Occlusion Sensitivity
• The probability of the correct class drops significantly when the object is
occluded.
• This indicates that the model is localizing the object in the image rather
than relying on the surrounding context.
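
The occlusion experiment can be sketched as a sliding-window loop that masks one region at a time and records the classifier's probability for the correct class. In the sketch below, `occlusion_map` and `toy_class_prob` are hypothetical names. The toy scoring function (mean brightness of the centre 2 x 2 block) stands in for a real classifier, and the patch size, stride, and grey fill value are assumptions.

```python
def occlusion_map(image, class_prob, patch=2, stride=1, fill=0.5):
    """Slide a square occluder over a 2-D image and record class_prob each time."""
    height, width = len(image), len(image[0])
    heat = []
    for top in range(0, height - patch + 1, stride):
        row = []
        for left in range(0, width - patch + 1, stride):
            occluded = [list(r) for r in image]  # copy, keep the original intact
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill  # grey square hides this region
            row.append(class_prob(occluded))
        heat.append(row)
    return heat


def toy_class_prob(img):
    """Toy stand-in for a classifier: mean brightness of the centre 2 x 2 block."""
    return (img[1][1] + img[1][2] + img[2][1] + img[2][2]) / 4.0


image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
heat = occlusion_map(image, toy_class_prob)
```

The lowest value in `heat` sits at the occluder position that covers the "object" in the centre, which is the pattern the occlusion analysis looks for in a real model.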

3. Correspondence Analysis
• Deep models implicitly compute correspondence between specific object
parts in images.

Experiments
• The overall depth of the model (number of convolutional layers) is important
for performance, while increasing the size of both the convolutional layers and
the fully-connected layers can result in overfitting.
