Convolution as matrix multiplication
• Edwin Efraín Jiménez Lepe
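FeedForward: applying kernel rotation

Input (3×3):
  16  24  32
  47  18  26
  68  12   9

Filters: W1 = [0 1; -1 0], W2 = [2 3; 4 5]

Input ∗ [W1 W2] is computed as Im2col(input) × [W1 W2].
- Each row of Im2col(input) is one 2×2 patch read column by column. The patches themselves are also taken column by column.
- Each column of [W1 W2] is one filter, rotated by 180° and read column by column.

Because the filters are rotated, the product is a true convolution.

  Im2col(input)          [W1 W2]         Result
  16  47  24  18          0   5          23  353
  47  68  18  12    x     1   3     =    50  535
  24  18  32  26         -1   4         -14  354
  18  12  26   9          0   2         -14  248

Rearrange each column of the result, filled column by column, into one 2×2 output channel:
  Channel 1 = [23 -14; 50 -14]
  Channel 2 = [353 354; 535 248]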
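The sketch below makes the ordering concrete. It is pure Python with no NumPy, so it runs anywhere, and it rebuilds Im2col(input), the rotated filter matrix and the product above. The helper names (im2col, flat_cm, unflat_cm, rot180, matmul, xcorr_valid) are illustrative and do not come from any library. The column-by-column ordering is the one read off the slide.

```python
def im2col(x, k):
    # One row per k x k patch. Patches are visited column by column and
    # each patch is also read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    # Read a matrix column by column.
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    # Inverse of flat_cm: fill a rows x cols matrix column by column.
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def rot180(m):
    return [row[::-1] for row in m[::-1]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def xcorr_valid(x, k):
    # Direct 'valid' cross-correlation, used only as a cross-check.
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

X = [[16, 24, 32], [47, 18, 26], [68, 12, 9]]
W1 = [[0, 1], [-1, 0]]
W2 = [[2, 3], [4, 5]]

cols = im2col(X, 2)  # 4 x 4
# Column f of Wmat is filter f rotated by 180 degrees and read column by column.
Wmat = [list(p) for p in zip(flat_cm(rot180(W1)), flat_cm(rot180(W2)))]  # 4 x 2
Y = matmul(cols, Wmat)  # 4 x 2
out1 = unflat_cm([r[0] for r in Y], 2, 2)
out2 = unflat_cm([r[1] for r in Y], 2, 2)

print(Y)     # [[23, 353], [50, 535], [-14, 354], [-14, 248]]
print(out1)  # [[23, -14], [50, -14]]
# A true convolution equals correlation with the rotated kernel:
print(out1 == xcorr_valid(X, rot180(W1)))  # True
```

The last line compares the im2col result with a direct correlation against the 180°-rotated kernel. That comparison is what "true convolution" means on this slide.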
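FeedForward: now with bias

Input and filters are the same as before. The bias is folded into the product in two steps:
- append a column of ones to Im2col(input);
- append the bias row [b1 b2] = [1 0] to [W1 W2].

  Im2col(input) | 1          [W1 W2] + bias     Result
  16  47  24  18  1           0   5             24  353
  47  68  18  12  1     x     1   3      =      51  535
  24  18  32  26  1          -1   4            -13  354
  18  12  26   9  1           0   2            -13  248
                              1   0

Rearrange:
  Channel 1 = [24 -13; 51 -13]
  Channel 2 = [353 354; 535 248]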
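The bias trick can be checked with the same kind of pure-Python sketch. The biases b = [1, 0] are read off the slide, and the helper names are illustrative only.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def rot180(m):
    return [row[::-1] for row in m[::-1]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[16, 24, 32], [47, 18, 26], [68, 12, 9]]
W1 = [[0, 1], [-1, 0]]
W2 = [[2, 3], [4, 5]]
b = [1, 0]  # bias of filter 1 and filter 2

# Append a column of ones to the patches and the bias row to the filter matrix.
cols_aug = [row + [1] for row in im2col(X, 2)]  # 4 x 5
W_aug = [list(p) for p in zip(flat_cm(rot180(W1)), flat_cm(rot180(W2)))] + [b]  # 5 x 2
Y = matmul(cols_aug, W_aug)  # 4 x 2
out1 = unflat_cm([r[0] for r in Y], 2, 2)
out2 = unflat_cm([r[1] for r in Y], 2, 2)
print(Y)  # [[24, 353], [51, 535], [-13, 354], [-13, 248]]
```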
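Backpropagation: d_w = input * d_y

Suppose the error at the output, d_y, has one 2×2 channel per filter:
  d_y channel 1 = [0 0; -2.94504954e-05 0]
  d_y channel 2 = [0 0; 6.39539432e-06 0]

d_w is computed as Im2col(input)^T × Im2col(d_y), where each column of Im2col(d_y) is one d_y channel read column by column. Here Im2col(input) is symmetric, so its transpose is the same matrix:

  Im2col(input)       Im2col(d_y)                          Result
  16  47  24  18      0                0                   -1.38417328e-03  3.00583533e-04
  47  68  18  12   x  -2.94504954e-05  6.39539432e-06   =  -2.00263369e-03  4.34886814e-04
  24  18  32  26      0                0                   -5.30108917e-04  1.15117098e-04
  18  12  26   9      0                0                   -3.53405945e-04  7.67447318e-05

Rearrange the first column, filled column by column:
  d_w for filter 1 = [-1.38417328e-03 -5.30108917e-04; -2.00263369e-03 -3.53405945e-04]

The update corresponds to the rotated kernel. It must therefore be rotated by 180° before it is applied to W1.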
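The weight-gradient step can be reproduced with a short pure-Python sketch under the same column-by-column convention. Helper names are illustrative. The symmetry check shows why the slide can reuse Im2col(input) in place of its transpose.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def rot180(m):
    return [row[::-1] for row in m[::-1]]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[16, 24, 32], [47, 18, 26], [68, 12, 9]]
dy1 = [[0.0, 0.0], [-2.94504954e-05, 0.0]]  # error of output channel 1
dy2 = [[0.0, 0.0], [6.39539432e-06, 0.0]]   # error of output channel 2

cols = im2col(X, 2)
print(cols == transpose(cols))  # True: symmetric for a 3x3 input and a 2x2 kernel

dY = [list(p) for p in zip(flat_cm(dy1), flat_cm(dy2))]  # 4 x 2, Im2col(d_y)
dWmat = matmul(transpose(cols), dY)  # 4 x 2, same layout as the forward filter matrix

# Column f is the gradient of rot180(W_f), read column by column.
dR1 = unflat_cm([r[0] for r in dWmat], 2, 2)
dR2 = unflat_cm([r[1] for r in dWmat], 2, 2)
# Rotate back to get the update for the unrotated kernels.
dW1, dW2 = rot180(dR1), rot180(dR2)
```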
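Backpropagation: d_x = d_y * w (without rotation)

  d_y channel 1 = [0 0; -2.94504954e-05 0]
  d_y channel 2 = [0 0; 6.39539432e-06 0]

We need a full convolution, and we must keep the kernels unrotated. For the full convolution, each d_y channel is padded with one ring of zeros (kernel size - 1 = 1):

  Padded d_y channel 1          Padded d_y channel 2
  0  0                0  0      0  0               0  0
  0  0                0  0      0  0               0  0
  0  -2.94504954e-05  0  0      0  6.39539432e-06  0  0
  0  0                0  0      0  0               0  0

Each padded channel is then combined with the filter that produced it, used as is: padded channel 1 ∗ W1 and padded channel 2 ∗ W2, with W1 = [0 1; -1 0] and W2 = [2 3; 4 5].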
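Here is a direct, loop-based sketch of this full, unrotated step. It pads each d_y channel, correlates it with its filter and sums the channels. To check the result independently, it assumes the loss sum(d_y * y): any loss whose gradient with respect to y is d_y yields the same d_x. That loss is linear in the input, so each gradient entry equals the loss evaluated on a unit input. Helper names are illustrative.

```python
def rot180(m):
    return [row[::-1] for row in m[::-1]]

def xcorr_valid(x, k):
    # 'valid' cross-correlation: the kernel is used as is (not rotated).
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def pad(m, p):
    # Surround m with p rings of zeros; p = kernel size - 1 gives a 'full' result.
    width = len(m[0]) + 2 * p
    body = [[0.0] * p + list(r) + [0.0] * p for r in m]
    return [[0.0] * width for _ in range(p)] + body + [[0.0] * width for _ in range(p)]

W1 = [[0, 1], [-1, 0]]
W2 = [[2, 3], [4, 5]]
dy1 = [[0.0, 0.0], [-2.94504954e-05, 0.0]]
dy2 = [[0.0, 0.0], [6.39539432e-06, 0.0]]

# d_x = sum over channels of (padded d_y channel) correlated with its unrotated filter.
dX = [[0.0] * 3 for _ in range(3)]
for dy, W in ((dy1, W1), (dy2, W2)):
    part = xcorr_valid(pad(dy, 1), W)
    dX = [[s + t for s, t in zip(r1, r2)] for r1, r2 in zip(dX, part)]

# Cross-check (assumed loss = sum(d_y * y)): the loss is linear in the input,
# so dL/dx[m][n] is the loss of a unit input at (m, n).
def loss(x):
    total = 0.0
    for d, W in ((dy1, W1), (dy2, W2)):
        y = xcorr_valid(x, rot180(W))  # forward pass: true convolution
        total += sum(p * q for dr, yr in zip(d, y) for p, q in zip(dr, yr))
    return total

dX_check = [[loss([[1.0 if (i, j) == (m, n) else 0.0 for j in range(3)] for i in range(3)])
             for n in range(3)] for m in range(3)]
```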
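Backpropagation: d_x = d_y * w (without rotation), with im2col

The im2col of each padded d_y channel is shown transposed, with one row per kernel position and one column per position of the 3×3 result:

  Im2col(padded d_y channel 1)^T
  0  0  0  0  0  -2.94e-05  0  0  0
  0  0  0  0  -2.94e-05  0  0  0  0
  0  0  -2.94e-05  0  0  0  0  0  0
  0  -2.94e-05  0  0  0  0  0  0  0

  Im2col(padded d_y channel 2)^T
  0  0  0  0  0  6.395e-06  0  0  0
  0  0  0  0  6.395e-06  0  0  0  0
  0  0  6.395e-06  0  0  0  0  0  0
  0  6.395e-06  0  0  0  0  0  0  0

Each matrix is transposed back and multiplied by its filter, read column by column and not rotated:
- channel 1 by W1 = [0 -1 1 0]^T;
- channel 2 by W2 = [2 4 3 5]^T.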
Backpropagation: d_x = d_y * w (without rotation), products

  Im2col(padded d_y channel 1) × [0 -1 1 0]^T = [0  0  -0.2945e-04  0  0.2945e-04  0  0  0  0]^T
  Im2col(padded d_y channel 2) × [2 4 3 5]^T = [0  0.3198e-04  0.1919e-04  0  0.2558e-04  0.1279e-04  0  0  0]^T
Backpropagation: d_x = d_y * w (without rotation), sum and reshape

Adding the two products:
    [0  0  -0.2945e-04  0  0.2945e-04  0  0  0  0]^T
  + [0  0.3198e-04  0.1919e-04  0  0.2558e-04  0.1279e-04  0  0  0]^T
  = [0  0.3198e-04  -0.1026e-04  0  0.5503e-04  0.1279e-04  0  0  0]^T

Reshaping column by column gives d_x (3×3):
   0            0            0
   0.3198e-04   0.5503e-04   0
  -0.1026e-04   0.1279e-04   0
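The per-channel im2col version of the same step is sketched below in pure Python. Helper names are illustrative. It builds the 9×4 im2col of each padded d_y channel (the slide shows the 4×9 transpose), multiplies each by its unrotated filter, sums the two vectors and reshapes column by column.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad(m, p):
    # Surround m with p rings of zeros.
    width = len(m[0]) + 2 * p
    body = [[0.0] * p + list(r) + [0.0] * p for r in m]
    return [[0.0] * width for _ in range(p)] + body + [[0.0] * width for _ in range(p)]

W1 = [[0, 1], [-1, 0]]
W2 = [[2, 3], [4, 5]]
dy1 = [[0.0, 0.0], [-2.94504954e-05, 0.0]]
dy2 = [[0.0, 0.0], [6.39539432e-06, 0.0]]

cols1 = im2col(pad(dy1, 1), 2)  # 9 x 4
cols2 = im2col(pad(dy2, 1), 2)

# Filters read column by column WITHOUT rotation: [0 -1 1 0] and [2 4 3 5].
part1 = [r[0] for r in matmul(cols1, [[w] for w in flat_cm(W1)])]
part2 = [r[0] for r in matmul(cols2, [[w] for w in flat_cm(W2)])]
total = [a + b for a, b in zip(part1, part2)]
dX = unflat_cm(total, 3, 3)  # reshape column by column
```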
Backpropagation: d_x = d_y * w (without rotation), in one operation

In fact, we can do it in just one operation:
1. Stack Im2col(padded d_y channel 1)^T on top of Im2col(padded d_y channel 2)^T, giving an 8×9 matrix.
2. Transpose it.
3. Multiply it by both filters, read column by column and stacked: [0 -1 1 0 2 4 3 5]^T.

  = [0  0.3198e-04  -0.1026e-04  0  0.5503e-04  0.1279e-04  0  0  0]^T

Notice that every channel of the delta is multiplied by the filter that generated it.
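The single stacked multiplication can be sketched the same way, in pure Python with illustrative helper names. Each row of the 9×8 matrix pairs a channel-1 patch with the matching channel-2 patch, so one product performs both channel multiplications and their sum.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad(m, p):
    width = len(m[0]) + 2 * p
    body = [[0.0] * p + list(r) + [0.0] * p for r in m]
    return [[0.0] * width for _ in range(p)] + body + [[0.0] * width for _ in range(p)]

W1 = [[0, 1], [-1, 0]]
W2 = [[2, 3], [4, 5]]
dy1 = [[0.0, 0.0], [-2.94504954e-05, 0.0]]
dy2 = [[0.0, 0.0], [6.39539432e-06, 0.0]]

# 9 x 8: each row is a channel-1 patch followed by the same patch of channel 2
# (the transpose of the 8 x 9 stack described on the slide).
stacked = [r1 + r2 for r1, r2 in zip(im2col(pad(dy1, 1), 2), im2col(pad(dy2, 1), 2))]
w_stacked = flat_cm(W1) + flat_cm(W2)  # [0, -1, 1, 0, 2, 4, 3, 5], not rotated
dX_vec = [r[0] for r in matmul(stacked, [[w] for w in w_stacked])]
dX = unflat_cm(dX_vec, 3, 3)
```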
A multi-channel example

Input (3,3,3):
  Channel 1      Channel 2      Channel 3
  16  24  32     26  57  43     18  47  21
  47  18  26     24  21  12      4   6  12
  68  12   9      2  11  19     81  22  13

Filters (2,3,2,2), one 2×2 kernel per input channel:
  Filter 1: [0 1; -1 0], [2 3; 4 5], [-2 68; 24 16]
  Filter 2: [18 32; 22 60], [23 7; 46 35], [42 20; 81 78]

Output (2,2,2):
  Channel 1 = [2171 2170; 5954 2064]
  Channel 2 = [13042 13575; 11023 6425]

This output comes from Theano's convolution, which automatically rotates the filters.
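A direct, loop-based version of this forward pass is sketched below as a reference for the vectorized slides. For each filter, it rotates each channel kernel by 180°, correlates it with the matching input channel and sums over channels. It is pure Python and makes no Theano calls. The grouping of the six kernels into Filter 1 and Filter 2 is taken from the columns of the vectorized filter matrix, and the helper names are illustrative.

```python
def rot180(m):
    return [row[::-1] for row in m[::-1]]

def xcorr_valid(x, k):
    # 'valid' cross-correlation (kernel used as is).
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

X = [
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],  # channel 1
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],  # channel 2
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],   # channel 3
]
F = [  # (2, 3, 2, 2): filter, input channel, kernel rows, kernel cols
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
]

def conv_forward(X, F):
    # True convolution (kernels rotated by 180 degrees), summed over input channels.
    out = []
    for filt in F:
        acc = None
        for x_c, w_c in zip(X, filt):
            y = xcorr_valid(x_c, rot180(w_c))
            acc = y if acc is None else add(acc, y)
        out.append(acc)
    return out

Y = conv_forward(X, F)
print(Y)  # [[[2171, 2170], [5954, 2064]], [[13042, 13575], [11023, 6425]]]
```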
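A multi-channel example (vectorized)

The two matrices are built as follows:
- Stacked Im2col(input), 12×4: the Im2col blocks of the three input channels, stacked with one 4-row block per channel and one patch per column.
- Filters, 12×2: one filter per column, holding its three kernels, each rotated by 180° and read column by column.

The output is the transpose of the first matrix times the second:

  Stacked Im2col(input)       Filters
  16  47  24  18               0  60
  47  68  18  12               1  32
  24  18  32  26              -1  22
  18  12  26   9               0  18
  26  24  57  21               5  35
  24   2  21  11               3   7
  57  21  43  12               4  46
  21  11  12  19               2  23
  18   4  47   6              16  78
   4  81   6  22              68  20
  47   6  21  12              24  81
   6  22  12  13              -2  42

  Output = (Stacked Im2col(input))^T × Filters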
A multi-channel example (vectorized), result

  (Stacked Im2col(input))^T × Filters =
   2171  13042
   5954  11023
   2170  13575
   2064   6425
  (column 1: output channel 1, column 2: output channel 2)

Rearrange:
  Channel 1 = [2171 2170; 5954 2064]
  Channel 2 = [13042 13575; 11023 6425]
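The vectorized forward pass can be sketched in pure Python with illustrative helper names. Each row of the 4×12 im2col matrix concatenates one patch position from the three channels. Each column of the 12×2 filter matrix stacks one filter's three rotated kernels.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def rot180(m):
    return [row[::-1] for row in m[::-1]]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
]
F = [
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
]

# 4 x 12: each row concatenates the same patch position from the three channels.
cols = [sum(rows, []) for rows in zip(*(im2col(x_c, 2) for x_c in X))]
# 12 x 2: column f stacks filter f's kernels, rotated and read column by column.
Wmat = transpose([sum((flat_cm(rot180(w_c)) for w_c in filt), []) for filt in F])
Y = matmul(cols, Wmat)  # 4 x 2
out = [unflat_cm([r[f] for r in Y], 2, 2) for f in range(2)]
print(Y)  # [[2171, 13042], [5954, 11023], [2170, 13575], [2064, 6425]]
```

transpose(cols) is the 12×4 stacked matrix shown on the slide, so cols itself is the (Stacked Im2col(input))^T factor.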
Backpropagation

Imagine we received the following error d_y (2,2,2) from the layer above:
  Channel 1 = [.1678 .098; .002 .246]
  Channel 2 = [0.5 0.67; 0.21 0.487]

We want to propagate it to the corresponding layer, the input of the convolution. We need to compute d_y * w (without rotation). Because it is a 'full' convolution, we add one ring of zero padding to d_y:

  Padded channel 1       Padded channel 2
  0  0      0     0      0  0    0    0
  0  .1678  .098  0      0  0.5  .67  0
  0  .002   .246  0      0  .21  .487 0
  0  0      0     0      0  0    0    0
Backpropagation: d_y-1 = d_y * w (without rotation)

Here d_y-1 is the error passed on to the input of the convolution. The im2col of the padded d_y is shown transposed:
- rows 1-4 come from channel 1;
- rows 5-8 come from channel 2;
- there is one column per position of the 3×3 result.

  0  0  0  0  .1678  .002  0  .098  .246
  0  0  0  .1678  .002  0  .098  .246  0
  0  .1678  .002  0  .098  .246  0  0  0
  .1678  .002  0  .098  .246  0  0  0  0
  0  0  0  0  .5  .21  0  .67  .487
  0  0  0  .5  .21  0  .67  .487  0
  0  .5  .21  0  .67  .487  0  0  0
  .5  .21  0  .67  .487  0  0  0  0
Backpropagation: d_y-1 = d_y * w (without rotation), product

The 8×9 im2col matrix of the padded d_y is transposed and multiplied by the unrotated filters, read column by column. Column c of the filter matrix holds the kernel of filter 1 for input channel c, followed by the kernel of filter 2 for input channel c. Notice that every channel of the delta is multiplied by the filter that generated it.

  Filters (8×3)
   0   2  -2
  -1   4  24
   1   3  68
   0   5  16
  18  23  42
  22  46  81
  32   7  20
  60  35  78

  Result (9×3)
  30       18.339   41.6848
  28.7678  11.3634  37.8224
  6.722    1.476    4.336
  51.0322  47.6112  98.3552
  64.376   44.7626  99.7084
  19.61    8.981    35.284
  14.642   31.212   56.622
  22.528   38.992   73.295
  8.766    11.693   19.962
Backpropagation: d_y-1 = d_y * w (without rotation), rearranged

Each column of the 9×3 result, filled column by column, gives one channel of d_y-1 (3,3,3):

  Channel 1
  30       51.0322  14.642
  28.7678  64.376   22.528
  6.722    19.61    8.766

  Channel 2
  18.339   47.6112  31.212
  11.3634  44.7626  38.992
  1.476    8.981    11.693

  Channel 3
  41.6848  98.3552  56.622
  37.8224  99.7084  73.295
  4.336    35.284   19.962
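The vectorized backward pass to the input can be sketched in pure Python with illustrative helper names. It pads both d_y channels, builds the 9×8 im2col matrix (the transpose of the 8×9 matrix on the slide), multiplies it by the 8×3 unrotated filter matrix and reshapes each column into a 3×3 channel.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def pad(m, p):
    width = len(m[0]) + 2 * p
    body = [[0.0] * p + list(r) + [0.0] * p for r in m]
    return [[0.0] * width for _ in range(p)] + body + [[0.0] * width for _ in range(p)]

dy = [[[0.1678, 0.098], [0.002, 0.246]], [[0.5, 0.67], [0.21, 0.487]]]
F = [
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
]

# 9 x 8: one row per position of the 3 x 3 result; channel-1 patch then channel-2 patch.
rows = [sum(r, []) for r in zip(*(im2col(pad(d, 1), 2) for d in dy))]
# 8 x 3: column c = kernel (filter 1, channel c) then kernel (filter 2, channel c),
# read column by column and NOT rotated.
Wb = transpose([flat_cm(F[0][c]) + flat_cm(F[1][c]) for c in range(3)])
R = matmul(rows, Wb)  # 9 x 3
dX = [unflat_cm([r[c] for r in R], 3, 3) for c in range(3)]  # (3, 3, 3)
for ch in dX:
    print([[round(v, 4) for v in row] for row in ch])
```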
Backpropagation (not vectorized): d_y-1 = d_y * w (without rotation)

d_y (2,2,2): Channel 1 = [.1678 .098; .002 .246], Channel 2 = [0.5 0.67; 0.21 0.487]

Filters (2,3,2,2):
  Filter 1: [0 1; -1 0], [2 3; 4 5], [-2 68; 24 16]
  Filter 2: [18 32; 22 60], [23 7; 46 35], [42 20; 81 78]

Transposing dimensions 0 and 1 gives filters of shape (3,2,2,2): one filter per input channel, with one kernel per d_y channel:
  Filter 1: [0 1; -1 0], [18 32; 22 60]
  Filter 2: [2 3; 4 5], [23 7; 46 35]
  Filter 3: [-2 68; 24 16], [42 20; 81 78]
Backpropagation (not vectorized, full convolution): d_y-1 = d_y * w (without rotation)

A full convolution of d_y (2,2,2) with the transposed filters (3,2,2,2), which are not rotated, gives the same (3,3,3) result as the vectorized version:

  Channel 1
  30       51.0322  14.642
  28.7678  64.376   22.528
  6.722    19.61    8.766

  Channel 2
  18.339   47.6112  31.212
  11.3634  44.7626  38.992
  1.476    8.981    11.693

  Channel 3
  41.6848  98.3552  56.622
  37.8224  99.7084  73.295
  4.336    35.284   19.962
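The non-vectorized route is sketched below in pure Python with illustrative helper names. It swaps the first two filter dimensions to get shape (3,2,2,2). Then, for each input channel, it sums the full correlations of the padded d_y channels with the unrotated kernels.

```python
def xcorr_valid(x, k):
    # 'valid' cross-correlation (kernel used as is).
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def pad(m, p):
    width = len(m[0]) + 2 * p
    body = [[0.0] * p + list(r) + [0.0] * p for r in m]
    return [[0.0] * width for _ in range(p)] + body + [[0.0] * width for _ in range(p)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

dy = [[[0.1678, 0.098], [0.002, 0.246]], [[0.5, 0.67], [0.21, 0.487]]]
F = [
    [[[0, 1], [-1, 0]], [[2, 3], [4, 5]], [[-2, 68], [24, 16]]],
    [[[18, 32], [22, 60]], [[23, 7], [46, 35]], [[42, 20], [81, 78]]],
]

# Swap dimensions 0 and 1: (2, 3, 2, 2) -> (3, 2, 2, 2).
Ft = [[F[f][c] for f in range(len(F))] for c in range(len(F[0]))]

dX = []
for filt in Ft:  # one filter per input channel
    acc = None
    for d, w in zip(dy, filt):
        y = xcorr_valid(pad(d, 1), w)  # full correlation: padded d_y, kernel not rotated
        acc = y if acc is None else add(acc, y)
    dX.append(acc)
```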
Backpropagation: d_w = input * d_y

Input (3,3,3) ∗ d_y (2,2,2): the dimensions do not match. This tells us that both d_y channels (one per filter) must be applied to every channel of the input.

Each product is a valid correlation in which d_y is not rotated, with d_y channel 1 = [.1678 .098; .002 .246] and d_y channel 2 = [0.5 0.67; 0.21 0.487]:

  Input channel 1 = [16 24 32; 47 18 26; 68 12 9]
    ∗ d_y channel 1 = [9.5588 13.5952; 12.7386 7.8064]
    ∗ d_y channel 2 = [42.716 49.882; 55.684 33.323]
  Input channel 2 = [26 57 43; 24 21 12; 2 11 19]
    ∗ d_y channel 1 = [15.1628 16.7726; 8.7952 9.3958]
    ∗ d_y channel 2 = [66.457 67.564; 31.847 30.103]
  Input channel 3 = [18 47 21; 4 6 12; 81 22 13]
    ∗ d_y channel 1 = [9.1104 12.9086; 6.8332 5.4248]
    ∗ d_y channel 2 = [44.252 44.674; 33.744 21.991]
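Backpropagation: d_w = input * d_y, result

The six 2×2 maps of the previous slide are the weight gradients, one per (filter, input channel) pair. d_w therefore has shape (2,3,2,2), the same as the filters:
  d_w, filter 1: [9.5588 13.5952; 12.7386 7.8064], [15.1628 16.7726; 8.7952 9.3958], [9.1104 12.9086; 6.8332 5.4248]
  d_w, filter 2: [42.716 49.882; 55.684 33.323], [66.457 67.564; 31.847 30.103], [44.252 44.674; 33.744 21.991]

Adding the three input-channel maps of each filter would give:
- [33.832 43.2764; 28.367 22.627] for filter 1;
- [153.425 162.12; 121.275 85.417] for filter 2.

That is a (2,2,2) array, which no longer matches the (2,3,2,2) filters. Each kernel of a filter has its own gradient, so the maps are kept separate.

These gradients are associated with the rotated kernels. Each map must therefore be rotated by 180° before it is used to update the corresponding unrotated kernel.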
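The per-pair weight gradients can be sketched in pure Python with illustrative helper names. Each d_w entry is input channel c correlated with d_y channel f, then rotated by 180°. For an independent check, the sketch assumes the loss sum(d_y * y), which is linear in the weights. Evaluating that loss with a single weight set to 1 and all others set to 0 gives that weight's gradient.

```python
def rot180(m):
    return [row[::-1] for row in m[::-1]]

def xcorr_valid(x, k):
    # 'valid' cross-correlation (kernel used as is).
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

X = [
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
]
dy = [[[0.1678, 0.098], [0.002, 0.246]], [[0.5, 0.67], [0.21, 0.487]]]

# Gradient of rotated kernel (f, c) = input channel c correlated with d_y channel f.
dR = [[xcorr_valid(x_c, d) for x_c in X] for d in dy]  # (2, 3, 2, 2)
dW = [[rot180(g) for g in row] for row in dR]          # update for the unrotated kernels

# Independent check with the assumed loss sum(d_y * y).
def loss(filters):
    total = 0.0
    for d, filt in zip(dy, filters):
        for x_c, w_c in zip(X, filt):
            y = xcorr_valid(x_c, rot180(w_c))  # forward pass: true convolution
            total += sum(p * q for dr, yr in zip(d, y) for p, q in zip(dr, yr))
    return total

def unit(f, c, a, b):
    # Filter bank of shape (2, 3, 2, 2) with a single 1 at (f, c, a, b).
    return [[[[1.0 if (fi, ci, ai, bi) == (f, c, a, b) else 0.0 for bi in range(2)]
              for ai in range(2)] for ci in range(3)] for fi in range(2)]

dW_check = [[[[loss(unit(f, c, a, b)) for b in range(2)] for a in range(2)]
             for c in range(3)] for f in range(2)]
```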
Backpropagation vectorized: d_w = input * d_y (without rotating d_y)

Input (3,3,3) ∗ d_y (2,2,2): the dimensions do not match, so both d_y channels must be applied to every input channel. In matrix form, multiply two matrices:
- the stacked Im2col(input) of the forward pass (12×4, one patch per column);
- Im2col(d_y) (4×2, one d_y channel per column, read column by column).

  Stacked Im2col(input)         Im2col(d_y)
  16  47  24  18                .1678  0.5
  47  68  18  12        x       .002   0.21
  24  18  32  26                .098   0.67
  18  12  26   9                .246   0.487
  26  24  57  21
  24   2  21  11
  57  21  43  12
  21  11  12  19
  18   4  47   6
   4  81   6  22
  47   6  21  12
   6  22  12  13

  =
   9.5588   42.716
  12.7386   55.684
  13.5952   49.882
   7.8064   33.323
  15.1628   66.457
   8.7952   31.847
  16.7726   67.564
   9.3958   30.103
   9.1104   44.252
   6.8332   33.744
  12.9086   44.674
   5.4248   21.991

The result has the same 12×2 layout as the forward filter matrix. Column f holds the gradients of filter f, one block of four values per input channel. Rearranging each block column by column gives the six 2×2 maps of the non-vectorized computation. Each map is then rotated by 180° to update its unrotated kernel.

Transposing the stacked matrix and multiplying it by Im2col(d_y) repeated three times gives a different result: it adds the three input channels together, producing 33.832 43.2764 / 28.367 22.627 and 153.425 162.12 / 121.275 85.417, which do not match the (2,3,2,2) filter shape.
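The vectorized weight gradient can be sketched in pure Python with illustrative helper names. It computes d_w = (forward im2col)^T × Im2col(d_y), which returns a 12×2 matrix in the same layout as the forward filter matrix. Each 4-value block is then unpacked and rotated back to the unrotated kernel.

```python
def im2col(x, k):
    # One row per k x k patch; patches and patch entries read column by column.
    oh, ow = len(x) - k + 1, len(x[0]) - k + 1
    return [[x[m + a][n + b] for b in range(k) for a in range(k)]
            for n in range(ow) for m in range(oh)]

def flat_cm(m):
    return [m[a][b] for b in range(len(m[0])) for a in range(len(m))]

def unflat_cm(v, rows, cols):
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]

def rot180(m):
    return [row[::-1] for row in m[::-1]]

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [
    [[16, 24, 32], [47, 18, 26], [68, 12, 9]],
    [[26, 57, 43], [24, 21, 12], [2, 11, 19]],
    [[18, 47, 21], [4, 6, 12], [81, 22, 13]],
]
dy = [[[0.1678, 0.098], [0.002, 0.246]], [[0.5, 0.67], [0.21, 0.487]]]

cols = [sum(rows, []) for rows in zip(*(im2col(x_c, 2) for x_c in X))]  # 4 x 12 (forward)
dY = transpose([flat_cm(d) for d in dy])  # 4 x 2: Im2col(d_y), one channel per column
dWmat = matmul(transpose(cols), dY)       # 12 x 2, same layout as the forward filters

# Rows 4c..4c+3 of column f: rotated kernel (f, c), read column by column.
dW = [[rot180(unflat_cm([r[f] for r in dWmat[4 * c:4 * c + 4]], 2, 2)) for c in range(3)]
      for f in range(2)]
for r in dWmat:
    print([round(v, 4) for v in r])
```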
