dfdshofdifhdifhdfhgfoighfgofgfgfgfgdfdfdfdf

UNIVERSITY OF SCIENCE AND TECHNOLOGY – THE UNIVERSITY OF DANANG
FACULTY OF TRANSPORTATION MECHANICAL ENGINEERING
DEPARTMENT OF AUTOMOTIVE ENGINEERING
Lecturer: Dr. Hoàng Thắng
Email: hthang@dut.udn.vn
Tel: 070.250.9826
Teaching assistant: Monica
Course: Applied Artificial Intelligence
Knowledge:
• Understand the basic techniques of artificial intelligence (AI)
• Classify machine learning algorithms and the basic applications of AI in mechanical power engineering
• Explain neural network diagrams in deep learning
Skills:
• Set up the data training steps for building a neural network model
• Use hardware for AI

CHAPTER 4
Artificial Neural Networks

OVERVIEW OF ARTIFICIAL NEURAL NETWORKS

How do our brains work?
• The brain is a massively parallel information processing system.
• Our brains are a huge network of processing elements. A typical human brain contains a network of roughly 86 billion neurons.

How do our brains work?
• A processing element (a neuron):
Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output

How do ANNs work?
An artificial neuron is an imitation of a human neuron

How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.

How do ANNs work?
[Figure: inputs x1, x2, …, xm feed a processing unit (∑) that produces the output y]
$y = x_1 + x_2 + \dots + x_m$

How do ANNs work?
Not all inputs are equal
[Figure: inputs x1, x2, …, xm, each scaled by a weight w1, w2, …, wm, feed a processing unit (∑) that produces the output y]
$y = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$

How do ANNs work?
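The signal is not passed down to the next neuron verbatim: it first goes through a transfer function (activation function).
[Figure: inputs x1, x2, …, xm with weights w1, w2, …, wm feed a processing unit (∑); the result v_k passes through the transfer function f(v_k) to produce the output y]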

The output is a function of the inputs, shaped by the weights and the transfer function.
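
The neuron model described on these slides can be sketched in a few lines of Python. This is only an illustration: the logistic (sigmoid) transfer function, the function names, and the sample weights are assumptions chosen for the example, not values from the slides.

```python
import math


def sigmoid(v):
    """Logistic transfer function (one common choice, assumed here)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(inputs, weights, transfer=sigmoid):
    """Weighted sum of the inputs, passed through a transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    v = sum(x * w for x, w in zip(inputs, weights))  # x1*w1 + ... + xm*wm
    return transfer(v)


x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]  # illustrative weights; the weighted sum is 0.0
y = neuron_output(x, w)
print(y)  # 0.5, because sigmoid(0) = 0.5
```

Passing a different `transfer` (for example the identity) reproduces the plain weighted sum from the earlier slide.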

The importance of CNNs
Convolutional neural networks were inspired by the layered architecture of the human visual cortex.
[Figure: key similarities and differences between CNNs and the human visual cortex]

Key Components of a CNN
A convolutional neural network is made of four main parts:
• Convolutional layers
• Rectified Linear Unit (ReLU for short)
• Pooling layers
• Fully connected layers
How do CNNs learn with these parts? Together they help the CNN mimic how the human brain recognizes patterns and features in images.

This section defines each of these components through the example of classifying a handwritten digit.

Digital Images
• Input array: an image’s height × width × 3 (RGB)
• Value of each pixel: 0 - 255

Digital Images
How to convert?

Classification, Localization,
Detection, Segmentation

Convolution Theorem
• The Fourier transform of a convolution of two signals is the pointwise product of their Fourier transforms
• Convolution is usually introduced with its formal definition:
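
For reference, the standard continuous and discrete definitions, together with the theorem stated in the first bullet, can be written as follows. The symbols f, g, τ, k and the transform notation are the usual textbook conventions, assumed here rather than taken from the slide.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
\qquad
(f * g)[n] = \sum_{k=-\infty}^{\infty} f[k]\, g[n - k]

\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
```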


Scan the Image to Detect an Edge
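
A minimal sketch of scanning an image with a small kernel, assuming a toy 5x6 image and an illustrative vertical-edge kernel (neither comes from the slides). As in most deep learning libraries, the kernel is applied without flipping, which is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark (0) on the left half, bright (1) on the right half
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Illustrative kernel that responds to left-to-right brightness changes
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 3, 3, 0]
```

The non-zero responses sit exactly where the window straddles the dark-to-bright boundary, which is what "detecting an edge" means here.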

Activation function
A ReLU activation function is applied after each convolution operation. It lets the network learn non-linear relationships between the features in the image, which makes the network more robust at identifying different patterns. It also helps to mitigate the vanishing gradient problem.
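
As a small illustration, ReLU simply replaces negative values with zero. The helper names and the sample feature map below are assumptions made for this sketch.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, clamp the rest to zero."""
    return x if x > 0 else 0.0


def relu_map(feature_map):
    """Apply ReLU elementwise to a 2D feature map (list of rows)."""
    return [[relu(v) for v in row] for row in feature_map]


feature_map = [[-3.0, 1.5],
               [0.0, -0.5]]
activated = relu_map(feature_map)
print(activated)  # [[0.0, 1.5], [0.0, 0.0]]
```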

Pooling layer
The goal of the pooling layer is to pull the most significant features from the convolved feature map. This is done by applying an aggregation operation (such as the maximum or the average) over small windows, which reduces the dimensions of the feature map and hence the memory used while training the network. Pooling also helps mitigate overfitting.
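
A minimal sketch of max pooling with non-overlapping 2x2 windows, assuming a small hand-written feature map. The function name and the sample values are illustrative only.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling. Trailing rows or columns that do not
    fill a complete window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
pooled = max_pool2d(fm)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, so later layers handle a quarter of the values while keeping the strongest response in each window.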

Create First ConvNet
• Create a CNN to classify MNIST digits
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Model Summary
• model.summary()
Layer (type)                    Output Shape          Param #
================================================================
conv2d_1 (Conv2D)               (None, 26, 26, 32)    320
maxpooling2d_1 (MaxPooling2D)   (None, 13, 13, 32)    0
conv2d_2 (Conv2D)               (None, 11, 11, 64)    18496
maxpooling2d_2 (MaxPooling2D)   (None, 5, 5, 64)      0
conv2d_3 (Conv2D)               (None, 3, 3, 64)      36928
================================================================

Feature Map
• The output of a convolution layer is also called a feature map
=>layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))
• Receives a 28x28 input image and computes 32 filters over it, producing a 26x26x32 feature map
• Each filter has size 3x3
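
The output sizes and parameter counts reported by the model summary can be checked by hand. The sketch below assumes the usual formulas for a convolution without padding (each filter has one weight per kernel cell per input channel, plus one bias); the helper names are made up for this example.

```python
def conv2d_output_size(in_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


print(conv2d_output_size(28, 3))      # 26
print(conv2d_param_count(3, 1, 32))   # 320
print(conv2d_param_count(3, 32, 64))  # 18496
print(conv2d_param_count(3, 64, 64))  # 36928
```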

Add a Classifier on Top of ConvNet
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
=================================================================
conv2d_1 (Conv2D)             (None, 26, 26, 32)   320
max_pooling2d_1 (MaxPooling2  (None, 13, 13, 32)   0
conv2d_2 (Conv2D)             (None, 11, 11, 64)   18496
max_pooling2d_2 (MaxPooling2  (None, 5, 5, 64)     0
conv2d_3 (Conv2D)             (None, 3, 3, 64)     36928
flatten_1 (Flatten)           (None, 576)          0
dense_1 (Dense)               (None, 64)           36928
dense_2 (Dense)               (None, 10)           650
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0

Padding
• Padding a 5x5 input to extract 25 3x3 patches
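
A short sketch of why padding matters for the patch count, assuming zero padding of one pixel on every side. The helper names are illustrative.

```python
def pad2d(image, pad=1, value=0):
    """Surround a 2D image (list of rows) with `pad` rows/columns of `value`."""
    width = len(image[0]) + 2 * pad
    top = [[value] * width for _ in range(pad)]
    bottom = [[value] * width for _ in range(pad)]
    middle = [[value] * pad + list(row) + [value] * pad for row in image]
    return top + middle + bottom


def count_patches(height, width, k=3):
    """Number of k x k windows that fit at stride 1."""
    return (height - k + 1) * (width - k + 1)


image = [[1] * 5 for _ in range(5)]
padded = pad2d(image, pad=1)

print(count_patches(5, 5))                         # 9 patches without padding
print(len(padded), len(padded[0]))                 # 7 7
print(count_patches(len(padded), len(padded[0])))  # 25 patches with padding
```

With the padded 7x7 input there is one 3x3 patch centred on every original pixel, so the output keeps the 5x5 size of the input.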

Max Pooling
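• Downsamples an image (feature map) by keeping the maximum value in each window
• Tends to work better for downsampling than average pooling or strided convolutions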

Train a Model to Classify Cats & Dogs
• www.kaggle.com/c/dogs-vs-cats/data
• 2000 cat and 2000 dog images

Create a CNN Model for Binary Classification
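from keras import layers
from keras import models

# NOTE: the intermediate layers were lost from the slide. The filter counts and the Dense(512)
# size are assumptions following Chollet, "Deep Learning with Python", Chapter 5.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))   # assumed
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))  # assumed
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))  # assumed
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))           # assumed
model.add(layers.Dense(1, activation='sigmoid'))          # one output unit for binary classification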

Image Generator
1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Rescale the pixel values (between 0 and 255) to the [0, 1] interval.
from keras.preprocessing.image import ImageDataGenerator

# train_dir and validation_dir are paths to the image folders, defined elsewhere
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),   # resize every image to 150x150
    batch_size=20,
    class_mode='binary')      # binary labels for the two classes

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

Python Generator
• Use yield operator
• Note that the generator loops endlessly
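
A minimal sketch of such an endless generator, assuming a plain Python list as the data source. The function name and the wrap-around batching are illustrative and are not how Keras implements its own generators.

```python
def batch_generator(samples, batch_size):
    """Yield successive batches forever, wrapping around at the end of the data."""
    i = 0
    while True:  # never finishes on its own; the caller decides when to stop
        batch = [samples[(i + k) % len(samples)] for k in range(batch_size)]
        i = (i + batch_size) % len(samples)
        yield batch


gen = batch_generator([1, 2, 3, 4, 5], batch_size=2)
first_three = [next(gen) for _ in range(3)]
print(first_three)  # [[1, 2], [3, 4], [5, 1]]
```

Because the loop never ends, whoever consumes the generator has to say how many batches to draw.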

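Fitting the Model using a Batch Generator
# fit_generator is deprecated in recent Keras releases, where model.fit accepts generators directly
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,   # batches drawn from the generator per epoch
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)   # batches drawn for each validation pass
# Save the model
model.save('cats_and_dogs_small_1.h5')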

Data Augmentation via ImageDataGenerator
• rotation_range is a range in degrees (0–180) within which to randomly rotate pictures.
• width_shift_range and height_shift_range are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
• shear_range is for randomly applying shearing transformations.
• zoom_range is for randomly zooming inside pictures.
• horizontal_flip is for randomly flipping half the images horizontally.
• fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

Use Pre-trained Models
• Xception
• VGG16
• VGG19
• ResNet, ResNetV2, ResNeXt
• InceptionV3
• InceptionResNetV2
• MobileNet
• MobileNetV2
• DenseNet
• NASNet

Example: Using Pre-trained VGG16
• weights specifies the weight checkpoint from which to initialize the model.
• include_top refers to including (or not) the densely connected classifier on top of the network (1,000-class output).
• input_shape is the shape of the image tensors fed to the network. If the argument is omitted, the network will be able to process inputs of any size.
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

Adding a Classifier on Top of a Pre-trained Model
from keras import models
from keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))  # 256 units matches the parameter total below
model.add(layers.Dense(1, activation='sigmoid'))
================================================================
vgg16 (Model)         (None, 4, 4, 512)   14714688
flatten_1 (Flatten)   (None, 8192)        0
dense_1 (Dense)       (None, 256)         2097408
dense_2 (Dense)       (None, 1)           257
================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0

Freeze Trainable Parameters
• conv_base.trainable = False

Fine-Tuning Top Few Layers
• Freezing all layers up to a specific one
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
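
To see which layers the loop above leaves trainable without loading Keras, the same logic can be run on stand-in objects. The `SimpleNamespace` layers and the `freeze_up_to` helper are hypothetical; the names follow VGG16's convolutional base (the input layer is omitted).

```python
from types import SimpleNamespace

# Stand-ins for the layers of VGG16's convolutional base, in order
names = (
    ["block1_conv1", "block1_conv2", "block1_pool"]
    + ["block2_conv1", "block2_conv2", "block2_pool"]
    + [f"block{b}_{kind}"
       for b in (3, 4, 5)
       for kind in ("conv1", "conv2", "conv3", "pool")]
)
layers = [SimpleNamespace(name=n, trainable=True) for n in names]


def freeze_up_to(layers, first_trainable):
    """Freeze every layer before `first_trainable`; unfreeze it and all later ones."""
    unfreeze = False
    for layer in layers:
        if layer.name == first_trainable:
            unfreeze = True
        layer.trainable = unfreeze


freeze_up_to(layers, "block5_conv1")
trainable = [layer.name for layer in layers if layer.trainable]
print(trainable)
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Only the last block is updated during fine-tuning, while the earlier, more generic feature extractors stay fixed.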

Summary
• Convnets are a strong default choice for computer vision (and are useful for some other tasks as well)
• Data augmentation is a powerful way to fight overfitting
• We can use a pre-trained model for feature extraction
• We can further improve the pre-trained model on our dataset by fine-tuning

Visualizing What Convnets Learn
1. Visualizing Intermediate ConvNet Outputs (Intermediate Activations)
• Understand how successive convnet layers transform their input
• Get a first idea of the meaning of individual convnet filters
2. Visualizing ConvNet Filters
• Understand precisely what visual pattern or concept each filter in a convnet is receptive to
3. Visualizing Heatmaps of Class Activation in an Image
• See which parts of an image were identified as belonging to a given class
• Can localize objects in images

1. Visualizing Intermediate Activations
• Show the feature maps that are output by various
convolution and pooling layers in a network
from keras.preprocessing import image
from keras.models import load_model
from keras import models
import numpy as np
import matplotlib.pyplot as plt

# Load one image and turn it into a (1, 150, 150, 3) tensor scaled to [0, 1]
img = image.load_img('./test1/1700.jpg', target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0) / 255.

model = load_model('cats_and_dogs_small_1.h5')
# Collect the outputs of the first eight layers
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)
first_layer_activation = activations[0]
# Plot the channel at index 3 of the first layer's activation
plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')

Visualizing Every Channel in
Every Intermediate Activation

Things to Note
• The first layer acts as a collection of various edge detectors
• As you go deeper, the activations become increasingly
abstract and less visually interpretable
• The sparsity of the activations increases with the depth of the layer: more and more filters are blank

2. Visualizing ConvNet Filters
• Gradient ascent in input space: applying gradient updates to the value of the input image of a convnet so as to maximize the response of a specific filter
• Loss maximization via stochastic gradient descent
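
The idea can be illustrated on a one-dimensional toy problem before applying it to images. In this sketch the "filter response" is an assumed function, -(x - 3)^2, whose maximum is at x = 3; the function names, step size, and iteration count are illustrative.

```python
def gradient_ascent(grad_fn, x0, step=0.1, iterations=100):
    """Repeatedly move x in the direction of the gradient to increase the objective."""
    x = x0
    for _ in range(iterations):
        x += step * grad_fn(x)
    return x


def response_grad(x):
    """Derivative of the toy response -(x - 3)**2."""
    return -2.0 * (x - 3.0)


x_best = gradient_ascent(response_grad, x0=0.0)
print(round(x_best, 4))  # 3.0
```

For filter visualization, x is the whole input image and the gradient is computed by the deep learning framework instead of by hand.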

Convert a Tensor into a Valid
Image
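
The filter-visualization code in this deck calls a `deprocess_image` helper that is not shown. A minimal sketch of such a helper, assuming NumPy and following the approach in Chollet's "Deep Learning with Python" (Chapter 5), could look like this. The 0.1 and 0.5 constants are that book's choices, not requirements.

```python
import numpy as np


def deprocess_image(x):
    """Turn an arbitrary float tensor into a displayable uint8 image."""
    x = x.astype("float64")  # astype returns a copy, so the input is not modified
    x -= x.mean()            # center on 0
    x /= (x.std() + 1e-5)    # unit standard deviation (1e-5 avoids division by zero)
    x *= 0.1                 # shrink the spread so most values survive clipping
    x += 0.5                 # center on 0.5
    x = np.clip(x, 0, 1)     # clip to [0, 1]
    x *= 255                 # scale to [0, 255]
    return np.clip(x, 0, 255).astype("uint8")


tensor = np.random.default_rng(0).normal(size=(150, 150, 3)) * 50
img = deprocess_image(tensor)
print(img.shape, img.dtype)  # (150, 150, 3) uint8
```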

Visualizing ConvNet Filters
from keras.applications import VGG16
from keras import backend as K
import numpy as np

model = VGG16(weights='imagenet', include_top=False)
layer_name = 'block3_conv1'
filter_index = 0

def generate_pattern(layer_name, filter_index, size=150):
    # Loss: mean activation of the chosen filter
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])
    # K.gradients requires graph (non-eager) mode
    grads = K.gradients(loss, model.input)[0]  # Keep only the first tensor
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)  # 1e-5 avoids division by zero
    # Fetch NumPy output values given NumPy input values
    iterate = K.function([model.input], [loss, grads])
    loss_value, grads_value = iterate([np.zeros((1, size, size, 3))])
    # Loss maximization via gradient ascent, starting from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
    img = input_img_data[0]
    return deprocess_image(img)

Filter Patterns for Each Layer

3. Visualizing Heatmaps of Class
Activation
• Ramprasaath R. Selvaraju et al., “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.” arXiv (2017), https://arxiv.org/abs/1610.02391.

Convolutional Neural Network
(LeNet-5)
• https://medium.com/@sh.tsang/paper-brief-review-of-lenet-1-lenet-4-lenet-5-boosted-lenet-4-image-classification-1f5f809dbf17

Error Rate on ImageNet
Challenge

AlexNet (2012)
• AlexNet significantly outperformed previous models (e.g. SVM)
• Includes convolutions, max-pooling, dropout, ReLU, and SGD with momentum
• Trained on two NVIDIA GeForce GTX 580 GPUs

ZF Net (2013)
• Parameter tuning of AlexNet

GoogLeNet (2014)
• Achieved a top-5 error rate of 6.67%, which was very close to human-level performance
• Uses the inception module, batch normalization, image distortions, and RMSprop
• 22 layers, but with the number of parameters reduced from 60 million (AlexNet) to 4 million

VGG Net (2014)
• Very uniform architecture
• Preferred choice in the community for extracting features from images

ResNet (2015)
• Residual Neural Network
• Introduced the “skip connection”
• 152 layers with a 3.57% top-5 error rate
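
A skip connection adds a block's input back onto its output, so the block only has to learn a residual. The sketch below is a toy illustration on plain lists; the function names and the doubling transform are assumptions, and a real residual block would use convolutional layers.

```python
def residual_block(x, transform):
    """Return transform(x) + x, elementwise (the skip connection)."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input length")
    return [f + xi for f, xi in zip(fx, x)]


def toy_transform(x):
    """Hypothetical stand-in for the block's learned layers."""
    return [2 * v for v in x]


print(residual_block([1, 2, 3], toy_transform))  # [3, 6, 9]
```

If the transform outputs zeros, the block passes its input through unchanged, which is one reason very deep stacks of such blocks remain trainable.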

References
• Francois Chollet, “Deep Learning with Python,” Chapter 5.
• Adit Deshpande, “A Beginner's Guide To Understanding Convolutional Neural Networks.”
• Machine Learning Guru, “Understanding Convolutional Layers in Convolutional Neural Networks (CNNs).”
• “CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more ….”
• Wikipedia, “Convolution.”
• https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/
• http://neuralnetworksanddeeplearning.com/
• Stanford’s CS231N
