THE UNIVERSITY OF DANANG – UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY OF TRANSPORTATION MECHANICAL ENGINEERING
DEPARTMENT OF AUTOMOTIVE ENGINEERING
Lecturer: Dr. Hoàng Thắng
Email: hthang@dut.udn.vn
Tel: 070.250.9826
Teaching assistant: Monica
Course: Applied Artificial Intelligence
Knowledge:
• Understand the basic techniques of artificial intelligence (AI)
• Classify machine learning algorithms and the basic applications of AI in the field of power mechanical engineering
• Explain neural network diagrams in deep learning
Skills:
• Set up the data training steps for a neural network model
• Use hardware for AI
CHAPTER 4
Artificial Neural Networks
OVERVIEW OF ARTIFICIAL NEURAL NETWORKS
How do our brains work?
• The brain is a massively parallel information processing system.
• Our brains are a huge network of processing elements. A typical human brain contains a network of roughly 86 billion neurons.
How do our brains work?
• A neuron acts as a processing element:
Dendrites: input
Cell body: processor
Synapse: link
Axon: output
How do ANNs work?
An artificial neuron is an imitation of a human neuron
How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
How do ANNs work?
Input: x1, x2, …, xm → Processing: ∑ → Output: y
$y = \sum_{i=1}^{m} x_i = x_1 + x_2 + \dots + x_m$
How do ANNs work?
Not all inputs are equal
Input: x1, x2, …, xm, with weights w1, w2, …, wm → Processing: ∑ → Output: y
$y = \sum_{i=1}^{m} x_i w_i = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$
How do ANNs work?
The signal is not passed down to the
next neuron verbatim
Transfer Function
(Activation Function)
Input: x1, x2, …, xm, with weights w1, w2, …, wm → Processing: ∑ → Transfer function: f → Output: y
$v_k = \sum_{i=1}^{m} x_i w_i, \qquad y = f(v_k)$
The output is a function of the inputs, shaped by the weights and the transfer function.
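The neuron model described above can be sketched in a few lines of plain Python. This is only an illustration: the function names `neuron_output` and `step`, and the sample inputs and weights, are assumptions made for the example and do not come from the slides.

```python
import math

def neuron_output(inputs, weights, transfer=math.tanh):
    """Weighted sum of the inputs followed by a transfer (activation) function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    v = sum(x * w for x, w in zip(inputs, weights))  # v_k = sum_i x_i * w_i
    return transfer(v)                                # y = f(v_k)

def step(v):
    """Hypothetical threshold transfer function: fires only for positive sums."""
    return 1.0 if v > 0 else 0.0

x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]   # weighted sum: 0.5 - 2.0 + 1.5 = 0.0
print(neuron_output(x, w, transfer=step))                # 0.0
print(neuron_output(x, [1.0, 1.0, 1.0], transfer=step))  # 1.0
```

Changing only the weights changes the output for the same inputs, which is the sense in which "not all inputs are equal".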
Convolutional Neural Network
Convolutional neural networks
were inspired by the layered
architecture of the human
visual cortex, and below are
some key similarities and
differences:
The importance of CNNs
Key Components of a CNN
A convolutional neural network is made of four main parts.
But how do CNNs learn with those parts?
They help the CNN mimic how the human brain recognizes patterns and features in images:
• Convolutional layers
• Rectified Linear Unit (ReLU for short)
• Pooling layers
• Fully connected layers
This section defines each of these components through the example of classifying a handwritten digit.
Digital Images
• Input array: an image’s height × width × 3 (RGB)
• Value of each pixel: 0 - 255
Digital Images
How to convert ?
Classification, Localization,
Detection, Segmentation
Convolution Theorem
• Fourier transform of a convolution of two signals is the
pointwise product of their Fourier transforms
• Convolution is usually introduced with its formal definition:
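The formula shown on the original slide did not survive extraction. As an assumption about what was intended, the standard textbook definitions of continuous and discrete convolution, together with the convolution theorem stated above, read:

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\qquad
(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]

\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
```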
Convolution layers
Example: A Curve Filter
Scan the Image to Detect an Edge
Edge Detected!
Continue Scanning (No edge)
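The scanning procedure can be sketched in plain Python as follows. The image and the vertical-edge filter are made-up values for illustration, and `conv2d_valid` is a hypothetical helper name. Like most CNN libraries, the sketch slides the filter without flipping it (strictly speaking, a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the patch under the filter element-wise and sum the result.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map

# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical vertical-edge filter: responds where brightness rises from left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
for row in conv2d_valid(image, kernel):
    print(row)   # [3, 3, 0] for both rows
```

The large values mark the positions where the filter overlaps the edge; the zero marks a uniform region with no edge.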
Spatial Hierarchy of Features
Activation function
A ReLU activation function is applied after each convolution
operation. This function lets the network learn non-linear
relationships between the features in the image, so it can
identify a wider range of patterns. It also
helps to mitigate the vanishing gradient problem.
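As a minimal sketch, ReLU and its gradient can be written in plain Python. The function names `relu` and `relu_grad` are chosen for this example only.

```python
def relu(values):
    """Rectified Linear Unit: keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]

def relu_grad(values):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return [1.0 if v > 0 else 0.0 for v in values]

print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))       # [0.0, 0.0, 0.0, 1.5, 3.0]
print(relu_grad([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the gradient is exactly 1 for every positive input, it does not shrink as it is passed back through many layers, which is one reason ReLU is associated with milder vanishing-gradient behavior.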
Types & Use Cases
Pooling layer
The goal of the pooling layer is to extract the most significant features from the
convolved feature map. It does so by applying an aggregation operation (such as the
maximum or the average) over small windows, which reduces the dimensions of the
feature map and hence the memory used while training the network. Pooling also helps
to mitigate overfitting.
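A minimal sketch of the aggregation step, assuming max pooling over non-overlapping 2x2 windows. The helper name `max_pool2d` and the sample feature map are illustrative only.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [max(feature_map[i * size + a][j * size + b]
             for a in range(size) for b in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]

fm = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]
print(max_pool2d(fm))  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2, so later layers process a quarter of the values.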
Create First ConvNet
• Create a CNN to classify MNIST digits
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Model Summary
• model.summary()
Layer (type)                    Output Shape          Param #
================================================================
conv2d_1 (Conv2D)               (None, 26, 26, 32)    320
max_pooling2d_1 (MaxPooling2D)  (None, 13, 13, 32)    0
conv2d_2 (Conv2D)               (None, 11, 11, 64)    18496
max_pooling2d_2 (MaxPooling2D)  (None, 5, 5, 64)      0
conv2d_3 (Conv2D)               (None, 3, 3, 64)      36928
================================================================
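The output shapes and parameter counts in the summary can be checked by hand. The sketch below redoes that arithmetic in plain Python, assuming 3x3 filters, no padding, stride 1 and 2x2 pooling as in the model above. The helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output(size, window=2):
    """Spatial size after non-overlapping pooling (a leftover row/column is dropped)."""
    return size // window

def conv_params(in_channels, filters, kernel=3):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 28, 1
for name, filters in [("conv2d_1", 32), ("conv2d_2", 64), ("conv2d_3", 64)]:
    params = conv_params(channels, filters)
    size, channels = conv_output(size), filters
    print(name, (size, size, channels), params)
    if name != "conv2d_3":   # the model has no pooling after the last convolution
        size = pool_output(size)
        print("max_pooling", (size, size, channels), 0)
```

The printed shapes (26, 13, 11, 5, 3) and parameter counts (320, 18496, 36928) match the summary.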
Feature Map
• The output of a convolution layer is also called a feature map
=>layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))
• Receives a 28x28 input image and computes 32 filters over it
• Each filter has size 3x3
Add a Classifier on Top of ConvNet
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 26, 26, 32) 320
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0
conv2d_2 (Conv2D) (None, 11, 11, 64) 18496
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64) 0
conv2d_3 (Conv2D) (None, 3, 3, 64) 36928
flatten_1 (Flatten) (None, 576) 0
dense_1 (Dense) (None, 64) 36928
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
Padding
• Padding a 5x5 input to extract 25 3x3 patches
Stride=1
Stride=2
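How padding and stride change the number of patches can be checked with the usual output-size formula. A minimal sketch, with `output_size` as a hypothetical helper name:

```python
def output_size(n, kernel, stride=1, padding=0):
    """Number of filter positions along one axis of length n."""
    return (n + 2 * padding - kernel) // stride + 1

print(output_size(5, 3) ** 2)             # 9 patches without padding
print(output_size(5, 3, padding=1) ** 2)  # 25 patches with one pixel of padding
print(output_size(5, 3, stride=2) ** 2)   # 4 patches with stride 2, no padding
```

With one pixel of padding on each side, a 3x3 filter can be centered on every one of the 25 input pixels, so the output keeps the 5x5 size of the input.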
Max Pooling
• Downsampling an image
• In practice it tends to work better than average pooling or strided convolutions for downsampling
Train a Model to Classify Cats & Dogs
• www.kaggle.com/c/dogs-vs-cats/data
• 2000 cat and 2000 dog images
Create a CNN Model for Binary Classification
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Image Generator
1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Rescale the pixel values (between 0 and 255) to the [0, 1] interval.
from keras.preprocessing.image import ImageDataGenerator

# Rescale all pixel values from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,                # directory with one subfolder per class
    target_size=(150, 150),   # resize all images to 150x150
    batch_size=20,
    class_mode='binary')      # binary labels for the cat/dog problem

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')
Python Generator
• Use the yield keyword
• Note that the generator loops endlessly
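A minimal sketch of such a generator in plain Python. The name `batch_generator` and the toy data are assumptions for illustration; the Keras generators above behave similarly in that they never stop on their own.

```python
from itertools import islice

def batch_generator(samples, batch_size):
    """Yield batches forever, wrapping around when the data runs out."""
    i = 0
    while True:   # endless loop: the caller decides when to stop
        batch = [samples[(i + k) % len(samples)] for k in range(batch_size)]
        i = (i + batch_size) % len(samples)
        yield batch

gen = batch_generator(list(range(5)), batch_size=2)
print(list(islice(gen, 4)))  # [[0, 1], [2, 3], [4, 0], [1, 2]]
```

Because the loop never ends, whoever consumes the generator has to say how many batches make up one epoch, which is the role of `steps_per_epoch` during training.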
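Fitting the Model using a Batch Generator
# Note: fit_generator is deprecated in recent Keras releases, where model.fit accepts generators directly
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,   # 100 batches of 20 images = 2000 images per epoch
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)   # 50 batches of 20 images = 1000 validation images
# Save the model
model.save('cats_and_dogs_small_1.h5')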
Data Augmentation
Data Augmentation via ImageDataGenerator
• rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
• width_shift_range and height_shift_range are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
• shear_range is for randomly applying shearing transformations.
• zoom_range is for randomly zooming inside pictures.
• horizontal_flip is for randomly flipping half the images horizontally.
• fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
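To make two of these transformations concrete, here is a plain-Python sketch of a random horizontal flip and a shift whose gap is filled with the nearest pixel. It only illustrates the idea on a tiny 2-D list and is not how ImageDataGenerator is implemented; all function names are hypothetical.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in image]

def shift_right_nearest(image, pixels):
    """Shift columns to the right; fill the gap by repeating the nearest edge pixel."""
    return [[row[0]] * pixels + row[:len(row) - pixels] for row in image]

def augment(image, rng):
    """Randomly flip (probability 0.5) and shift by 0 or 1 pixel."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return shift_right_nearest(image, rng.randint(0, 1))

image = [[1, 2, 3], [4, 5, 6]]
print(horizontal_flip(image))         # [[3, 2, 1], [6, 5, 4]]
print(shift_right_nearest(image, 1))  # [[1, 1, 2], [4, 4, 5]]
print(augment(image, random.Random(0)))
```

Each call to `augment` can return a slightly different picture of the same object, which is why augmentation helps against overfitting on a small dataset.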
Use Pre-trained Models
• Xception
• VGG16
• VGG19
• ResNet, ResNetV2, ResNeXt
• InceptionV3
• InceptionResNetV2
• MobileNet
• MobileNetV2
• DenseNet
• NASNet
Example: Using Pre-trained VGG16
• weights specifies the weight checkpoint from which to initialize the model.
• include_top refers to including (or not) the densely connected classifier on top of the network (1,000 classes output).
• input_shape is the shape of the image tensors fed to the network; if the argument is omitted, the network can process inputs of any size.
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
Adding a Classifier on Top of a Pre-trained Model
from keras import models
from keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Layer (type) Output Shape Param #
================================================================
vgg16 (Model) (None, 4, 4, 512) 14714688
flatten_1 (Flatten) (None, 8192) 0
dense_1 (Dense) (None, 256) 2097408
dense_2 (Dense) (None, 1) 257
================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
Freeze Trainable Parameters
• conv_base.trainable = False
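The effect of freezing can be estimated from the parameter counts in the model summary. The sketch below redoes that arithmetic in plain Python; `dense_params` is a hypothetical helper, and the result assumes that only the two dense layers stay trainable once `conv_base` is frozen.

```python
def dense_params(inputs, units):
    """A dense layer has one weight per input per unit, plus one bias per unit."""
    return inputs * units + units

conv_base_params = 14_714_688            # VGG16 convolutional base, from the summary
flattened = 4 * 4 * 512                  # 8192 values after Flatten
dense_1 = dense_params(flattened, 256)   # 2,097,408
dense_2 = dense_params(256, 1)           # 257

total = conv_base_params + dense_1 + dense_2
trainable_when_frozen = dense_1 + dense_2
print(total)                   # 16812353
print(trainable_when_frozen)   # 2097665
```

Freezing therefore leaves roughly 2.1 million of the 16.8 million parameters to be updated, which protects the pre-trained features from large gradient updates caused by the randomly initialized classifier.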
Fine-Tuning Top Few Layers
• Freezing all layers up to a specific one
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
Summary
• Convnets are the standard choice for computer vision (and may suit other tasks as well)
• Data augmentation is a powerful way to fight overfitting
• We can use a pre-trained model for feature extraction
• We can further improve the pre-trained model on our dataset by fine-tuning
Visualizing What Convnets Learn
1. Visualizing Intermediate ConvNet Outputs (Intermediate Activations)
• Understand how successive convnet layers transform their input
• Get a first idea of the meaning of individual convnet filters
2. Visualizing ConvNet Filters
• Understand precisely what visual pattern or concept each filter in a convnet is receptive to
3. Visualizing Heatmaps of Class Activation in an Image
• See which parts of an image were identified as belonging to a given class
• Can localize objects in images
1. Visualizing Intermediate Activations
• Show the feature maps that are output by various
convolution and pooling layers in a network
from keras.preprocessing import image
from keras import models
import numpy as np
import matplotlib.pyplot as plt

# Load one test image and turn it into a (1, 150, 150, 3) tensor scaled to [0, 1]
img = image.load_img('./test1/1700.jpg', target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0) / 255.

model = models.load_model('cats_and_dogs_small_1.h5')
# Outputs of the first eight layers (the convolution and pooling layers)
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)

# Show channel 3 of the first layer's activation
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')
Visualizing Every Channel in Every Intermediate Activation
Things to Note
• The first layer acts as a collection of various edge detectors
• As you go deeper, the activations become increasingly
abstract and less visually interpretable
• The sparsity of the activations increases with the depth of
the layer: more and more filters are blank
2. Visualizing ConvNet Filters
• Gradient ascent in input space: repeatedly adjust the values of the input
image of a convnet, following the gradient, so as to maximize the response of a specific filter
Loss Maximization via Stochastic Gradient Ascent
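The update rule is easier to see on a one-dimensional toy problem. In the sketch below, the objective, step size and iteration count are illustrative assumptions; in the filter visualization, the objective would be the mean activation of the chosen filter and `x` would be the input image.

```python
def gradient_ascent(grad, x, step=0.1, iterations=40):
    """Repeatedly move x in the direction that increases the objective."""
    for _ in range(iterations):
        x += step * grad(x)   # plus sign: ascent, not descent
    return x

# Hypothetical objective f(x) = -(x - 3)^2, largest at x = 3; its gradient is -2(x - 3).
x_best = gradient_ascent(lambda x: -2.0 * (x - 3.0), x=0.0)
print(x_best)   # close to 3
```

The only difference from gradient descent is the sign of the update: the input moves along the gradient instead of against it.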
Convert a Tensor into a Valid Image
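The code on this slide was not preserved. A common way to turn arbitrary float values into displayable pixel values is to normalize them, squeeze them around 0.5, clip to [0, 1] and scale to 0..255. The sketch below applies those steps to a flat list of floats in plain Python. The function name `deprocess_values` and the constants 0.1 and 0.5 are assumptions for illustration, not necessarily what the original slide used.

```python
from statistics import fmean, pstdev

def deprocess_values(values):
    """Map arbitrary floats to integers in [0, 255] (sketch for a flat list of pixels)."""
    mean, std = fmean(values), pstdev(values)
    out = []
    for v in values:
        v = (v - mean) / (std + 1e-5)   # center on 0 with unit standard deviation
        v = v * 0.1 + 0.5               # squeeze to std 0.1 around 0.5
        v = min(max(v, 0.0), 1.0)       # clip to [0, 1]
        out.append(int(v * 255))        # convert to the 0..255 range
    return out

print(deprocess_values([-300.0, 0.0, 300.0]))
```

An array-based version would apply the same four steps to the whole image tensor at once.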
Visualizing ConvNet Filters
from keras.applications import VGG16
from keras import backend as K
import numpy as np

model = VGG16(weights='imagenet', include_top=False)
layer_name = 'block3_conv1'
filter_index = 0

# Note: K.gradients needs graph mode (e.g. Keras on a TensorFlow 1.x backend)
def generate_pattern(layer_name, filter_index, size=150):
    # Loss: mean activation of the chosen filter in the chosen layer
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])
    # Gradient of the loss with respect to the input image
    grads = K.gradients(loss, model.input)[0]  # Keep only the first tensor
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)  # 1e-5 avoids division by zero
    # Fetch NumPy output values given NumPy input values
    iterate = K.function([model.input], [loss, grads])
    loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])
    # Loss maximization via gradient ascent, starting from a noisy gray image
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
    img = input_img_data[0]
    return deprocess_image(img)
Filter Patterns for Each Layer
3. Visualizing Heatmaps of Class Activation
• Ramprasaath R. Selvaraju et al., “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.” arXiv (2017), https://arxiv.org/abs/1610.02391.
Evolution of CNN
Convolutional Neural Network (LeNet-5)
• https://medium.com/@sh.tsang/paper-brief-review-of-lenet-1-lenet-4-lenet-5-boosted-lenet-4-image-classification-1f5f809dbf17
Error Rate on ImageNet Challenge
AlexNet (2012)
• AlexNet significantly outperformed previous models (e.g. SVM)
• Includes convolutions, max-pooling, dropout, ReLU, SGD with momentum
• Trained on two Nvidia GeForce GTX 580 GPUs
ZF Net (2013)
• Parameter tuning of AlexNet
GoogLeNet (2014)
• Achieved a top-5 error rate of 6.67%, which was very close to human-level performance
• Proposed the inception module and used image distortions during training; batch normalization and RMSprop came with later Inception versions
• 22 layers, yet the number of parameters dropped from 60 million (AlexNet) to 4 million
Inception Module
VGG Net (2014)
• Very uniform architecture
• Preferred choice in the
community for extracting
features from images
ResNet (2015)
• Residual Neural Network
• Proposed “skip connection”
• 152 layers with a 3.57% top-5 error rate
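The skip connection can be sketched in plain Python as adding a block's input back to its output. The `transform` argument stands in for the block's convolution layers, and both example transforms are hypothetical.

```python
def residual_block(x, transform):
    """Skip connection: add the block's input back to its transformed output."""
    fx = transform(x)
    return [a + b for a, b in zip(fx, x)]   # y = F(x) + x

# Hypothetical transforms standing in for the block's convolution layers.
halve = lambda v: [0.5 * a for a in v]
zero = lambda v: [0.0 for _ in v]
print(residual_block([2.0, 4.0], halve))  # [3.0, 6.0]
print(residual_block([2.0, 4.0], zero))   # [2.0, 4.0]
```

When the transform outputs zeros, the block reduces to the identity, so adding more such blocks does not have to make the network worse. This is the usual intuition for why very deep residual networks remain trainable.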
Statistics
Summary Table
DenseNet (2016)
