1. THE UNIVERSITY OF DANANG – UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY OF TRANSPORTATION MECHANICAL ENGINEERING
DEPARTMENT OF AUTOMOTIVE ENGINEERING
Lecturer: Dr. Hoàng Thắng
Email: [email protected]
Tel: 070.250.9826
Teaching assistant: Monica
Course: Applied Artificial Intelligence
Knowledge:
Understand the basic techniques of artificial intelligence (AI)
Classify machine learning algorithms and the basic applications of AI in the field of mechanical power engineering
Explain neural network diagrams in Deep Learning
Skills:
Set up the data training steps for a neural network model
Use hardware for AI
4. How do our brains work?
The brain is a massively parallel information processing system.
Our brains are a huge network of processing elements. A typical human brain contains a
network of roughly 86 billion neurons.
5. How do our brains work?
A processing element
Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output
6. How do ANNs work?
An artificial neuron is an imitation of a human neuron
7. How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
8. How do ANNs work?
[Figure: inputs x1, x2, ..., xm enter a processing node (∑) that sums them into the output y]
$y = x_1 + x_2 + \dots + x_m$
9. How do ANNs work?
Not all inputs are equal
[Figure: inputs x1, x2, ..., xm, each scaled by its weight w1, w2, ..., wm, enter a processing node (∑) that sums them into the output y]
$y = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$
10. How do ANNs work?
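The signal is not passed down to the next neuron verbatim: a transfer function (activation function) is applied first.
[Figure: the weighted inputs x1 w1, x2 w2, ..., xm wm are summed in the processing node (∑) into v_k, and the transfer function f(v_k) produces the output y]
$y = f(v_k)$, where $v_k = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$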
11. The output is a function of the input, shaped by the weights and the transfer function
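The neuron model built up over the previous slides can be sketched in a few lines of Python. This is only an illustration: the function names, the example inputs and weights, and the choice of sigmoid and step transfer functions are assumptions made here, not part of the slides.

```python
import math

def neuron_output(inputs, weights, activation):
    """Weighted sum of the inputs passed through a transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # v = x1*w1 + x2*w2 + ... + xm*wm
    v = sum(x * w for x, w in zip(inputs, weights))
    return activation(v)

def sigmoid(v):
    """Smooth transfer function that squashes v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def step(v):
    """Hard threshold transfer function."""
    return 1.0 if v >= 0 else 0.0

# Example values chosen only for illustration
x = [1.0, 0.5, -1.0]
w = [0.2, 0.4, 0.1]

y_identity = neuron_output(x, w, lambda v: v)  # plain weighted sum
y_sigmoid = neuron_output(x, w, sigmoid)       # weighted sum, then sigmoid
print(y_identity, y_sigmoid)
```

Swapping the `activation` argument changes how the same weighted sum is passed on to the next neuron, which is the point of the transfer function.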
13. The importance of CNNs
Convolutional neural networks were inspired by the layered architecture of the human visual cortex, and below are some key similarities and differences:
14. Key Components of a CNN
A convolutional neural network is made of four main parts.
They help the CNN mimic how the human brain operates to
recognize patterns and features in images:
•Convolutional layers
•Rectified Linear Unit (ReLU for short)
•Pooling layers
•Fully connected layers
But how do CNNs learn with those parts?
15. This section defines each of these components through the following example:
the classification of a handwritten digit.
16. Digital Images
• Input array: an image’s height × width × 3 (RGB)
• Value of each pixel: 0 - 255
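As a rough illustration of this layout, an RGB image can be held as a nested list of shape height × width × 3 with integer values from 0 to 255. The helper names below are hypothetical, and real code would normally use a NumPy array instead of nested lists.

```python
def make_rgb_image(height, width, color=(0, 0, 0)):
    """Build a height x width x 3 image filled with one RGB color."""
    if any(not 0 <= c <= 255 for c in color):
        raise ValueError("each channel value must be in 0..255")
    return [[list(color) for _ in range(width)] for _ in range(height)]

def image_shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

def normalize(image):
    """Rescale pixel values from 0..255 to 0..1, as is common before training."""
    return [[[c / 255.0 for c in pixel] for pixel in row] for row in image]

image = make_rgb_image(4, 6, color=(255, 128, 0))  # small orange image
scaled = normalize(image)
print(image_shape(image), scaled[0][0])
```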
19. Convolution Theorem
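• The Fourier transform of a convolution of two signals is the
pointwise product of their Fourier transforms: $\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}$
• Convolution is usually introduced with its formal definition: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$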
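For discrete signals the definition becomes a sum, (f * g)[n] = Σ_k f[k] g[n − k], and the theorem can be checked numerically. The sketch below uses a naive discrete Fourier transform written from scratch so that it needs no external library. The function names and the two example signals are assumptions for illustration, and the signals are zero-padded so that the product of transforms corresponds to linear rather than circular convolution.

```python
import cmath

def convolve(f, g):
    """Direct linear convolution: out[n] = sum_k f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result

def convolve_via_dft(f, g):
    """Convolution computed as the inverse DFT of the pointwise product."""
    n = len(f) + len(g) - 1
    F = dft(list(f) + [0.0] * (n - len(f)))  # zero-pad to the output length
    G = dft(list(g) + [0.0] * (n - len(g)))
    product = [a * b for a, b in zip(F, G)]
    return [value.real for value in dft(product, inverse=True)]

direct = convolve([1, 2, 3], [0, 1, 0.5])
via_dft = convolve_via_dft([1, 2, 3], [0, 1, 0.5])
print(direct)
print([round(v, 6) for v in via_dft])
```

Both routes give the same sequence up to floating-point rounding.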
29. Activation function
A ReLU activation function is applied after each convolution
operation. It lets the network learn non-linear relationships
between the features in the image, which makes the network more
robust at identifying different patterns. It also helps to
mitigate the vanishing gradient problem.
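A minimal sketch of ReLU, together with a simplified view of why it helps with vanishing gradients. The comparison multiplies the activation derivative across 10 layers and ignores the weights entirely, which is an assumption made here to keep the illustration short. The sigmoid derivative is at most 0.25, so its product shrinks quickly, while the ReLU derivative is exactly 1 for positive inputs.

```python
import math

def relu(values):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return [max(0.0, v) for v in values]

def relu_derivative(v):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if v > 0 else 0.0

def sigmoid_derivative(v):
    """Slope of the sigmoid, which peaks at 0.25 when v = 0."""
    s = 1.0 / (1.0 + math.exp(-v))
    return s * (1.0 - s)

activated = relu([-2.0, 0.0, 3.5])

# Product of activation derivatives across a chain of layers (weights ignored)
depth = 10
sigmoid_chain = sigmoid_derivative(0.0) ** depth  # best case for sigmoid
relu_chain = relu_derivative(2.0) ** depth        # active ReLU units
print(activated, sigmoid_chain, relu_chain)
```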
31. Pooling layer
The goal of the pooling layer is to pull the most significant features from the
convolved feature map. This is done by applying an aggregation operation (such as the
maximum or the average over small windows), which reduces the dimensions of the feature
map and hence the memory used while training the network. Pooling also helps to
mitigate overfitting.
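As an illustration, max pooling with a 2×2 window and stride 2 keeps only the largest value in each window, halving the height and width. The function name and the example feature map are assumptions. The sketch handles a single 2-D channel and drops any leftover rows or columns that do not fill a whole window.

```python
def max_pool_2d(feature_map, size=2):
    """Max pooling over non-overlapping size x size windows of a 2-D feature map."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # 4x4 input reduced to 2x2
```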
32. Create First ConvNet
• Create a CNN to classify MNIST digits
from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
34. Feature Map
• The output of a convolution layer is also called a feature map
=> layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1))
Receives a 28×28 input image and computes 32 filters over it
Each filter has size 3×3
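The size of the resulting feature map follows from simple arithmetic. The sketch below assumes the Keras defaults for Conv2D (stride 1, no padding) and a 2×2 max pooling window. The helper names are hypothetical.

```python
def conv2d_output_shape(height, width, filters, kernel=3, stride=1, padding=0):
    """Output (height, width, channels) of a square-kernel convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return (out_h, out_w, filters)

def conv2d_param_count(in_channels, filters, kernel=3):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

def max_pool_output_shape(height, width, channels, size=2):
    """Non-overlapping pooling divides height and width by the window size."""
    return (height // size, width // size, channels)

# First layers of the MNIST model: 28x28x1 input, 32 filters of size 3x3
first_conv = conv2d_output_shape(28, 28, filters=32)
first_conv_params = conv2d_param_count(in_channels=1, filters=32)
pooled = max_pool_output_shape(*first_conv)
second_conv = conv2d_output_shape(pooled[0], pooled[1], filters=64)
print(first_conv, first_conv_params, pooled, second_conv)
```

Under these assumptions the first layer turns the 28×28×1 image into a 26×26×32 feature map, one 26×26 response map per filter.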
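45. Fitting the Model using a Batch Generator
# Train from the batch generators. Recent Keras versions deprecate
# fit_generator; there, model.fit accepts generators directly.
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)
# Save the model
model.save('cats_and_dogs_small_1.h5')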
47. Data Augmentation via ImageDataGenerator
• rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
• width_shift_range and height_shift_range are ranges (as a fraction of total width or height) within
which to randomly translate pictures horizontally or vertically.
• shear_range is for randomly applying shearing transformations.
• zoom_range is for randomly zooming inside pictures.
• horizontal_flip is for randomly flipping half the images horizontally.
• fill_mode is the strategy used for filling in newly created pixels, which can appear
after a rotation or a width/height shift.
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
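To make two of these transformations concrete, the sketch below implements a horizontal flip and a rightward shift on a tiny single-channel image stored as a list of rows. It is not how ImageDataGenerator is implemented: the function names are hypothetical, only rightward shifts are covered, and repeating the edge value is meant to resemble `fill_mode='nearest'`.

```python
import random

def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels):
    """Shift every row right by `pixels` (0 <= pixels < width).

    The gap on the left is filled by repeating the edge value.
    """
    if pixels == 0:
        return [row[:] for row in image]
    return [[row[0]] * pixels + row[:-pixels] for row in image]

def augment(image, max_shift_fraction=0.2, rng=random):
    """Randomly flip (probability 0.5) and shift an image."""
    width = len(image[0])
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    max_shift = int(max_shift_fraction * width)
    return shift_right(image, rng.randint(0, max_shift))

image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10]]
flipped = horizontal_flip(image)
shifted = shift_right(image, 2)
augmented = augment(image, rng=random.Random(0))  # seeded for repeatability
print(flipped, shifted, augmented)
```

Each call to `augment` yields a slightly different version of the same picture, so the model rarely sees exactly the same input twice.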
49. Example: Using Pre-trained VGG16
• weights specifies the weight checkpoint from which to initialize the model.
• include_top refers to including (or not) the densely connected classifier on
top of the network (1,000 classes output).
• input_shape is the shape of the image tensors fed to the network; if the
argument is omitted, the network will be able to process inputs of any size.
from keras.applications import VGG16
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))
50. Adding a Classifier on Top of a Pre-trained Model
from keras import models
from keras import layers
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Layer (type)          Output Shape         Param #
================================================================
vgg16 (Model)         (None, 4, 4, 512)    14714688
flatten_1 (Flatten)   (None, 8192)         0
dense_1 (Dense)       (None, 256)          2097408
dense_2 (Dense)       (None, 1)            257
================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
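The parameter counts in this summary can be reproduced by hand. A dense layer has one weight per input-output pair plus one bias per unit. The VGG16 figure is simply copied from the summary, and the helper name is hypothetical.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

flatten_size = 4 * 4 * 512           # the (4, 4, 512) conv_base output, flattened
conv_base_params = 14714688          # taken from the summary above
dense_1 = dense_params(flatten_size, 256)
dense_2 = dense_params(256, 1)
total = conv_base_params + dense_1 + dense_2
print(flatten_size, dense_1, dense_2, total)
```

The numbers show that the added classifier contributes about 2 million parameters on top of the roughly 14.7 million in the convolutional base.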
52. Fine-Tuning Top Few Layers
• Freezing all layers up to a specific one
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
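The freezing logic can be tried without loading VGG16 by running it on stand-in objects. The `Layer` class below is a hypothetical substitute for a Keras layer with only a name and a trainable flag. The list of names is a short excerpt chosen for illustration, not the full network.

```python
class Layer:
    """Hypothetical stand-in for a Keras layer: a name and a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_until(layers, first_trainable_name):
    """Freeze every layer before `first_trainable_name`; unfreeze it and the rest."""
    set_trainable = False
    for layer in layers:
        if layer.name == first_trainable_name:
            set_trainable = True
        layer.trainable = set_trainable

names = ['block4_conv3', 'block4_pool',
         'block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
layers = [Layer(name) for name in names]
freeze_until(layers, 'block5_conv1')

trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)
```

Only the layers from `block5_conv1` onward remain trainable, so fine-tuning adjusts the most specialized features and leaves the generic early filters untouched.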
53. Summary
• Convnets are the best for Computer Vision (and maybe all
the other tasks)
• Data augmentation is a powerful way to fight overfitting
• We can use a pre-trained model for feature extraction
• We can further improve the pre-trained model on our
dataset by fine-tuning
54. Visualizing What Convnets Learn
1. Visualizing Intermediate ConvNet Outputs (Intermediate Activations)
Understand how successive convnet layers transform their input
Get a first idea of the meaning of individual convnet filters
2. Visualizing ConvNets Filters
Understand precisely what visual pattern or concept each filter in a convnet is
receptive to
3. Visualizing Heatmaps of Class Activation in an Image
See which parts of an image were identified as belonging to a given class
Can localize objects in images.
55. 1. Visualizing Intermediate Activations
• Show the feature maps that are output by various
convolution and pooling layers in a network
from keras.preprocessing import image
from keras import models
from keras.models import load_model
import numpy as np
import matplotlib.pyplot as plt

# Load one test image as a (1, 150, 150, 3) tensor scaled to [0, 1]
img = image.load_img('./test1/1700.jpg', target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0) / 255.

# Build a model that returns the outputs of the first eight layers
model = load_model('cats_and_dogs_small_1.h5')
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)

# Display channel 3 of the first layer's activation
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')
58. Things to Note
• The first layer acts as a collection of various edge detectors
• As you go deeper, the activations become increasingly
abstract and less visually interpretable
• The sparsity of the activations increases with the depth of
the layer: more and more filters are blank
59. 2. Visualizing ConvNet Filters
• Gradient ascent in input space: apply gradient steps to the values of the input
image of a convnet so as to maximize the response of a specific filter
Loss maximization via gradient ascent on the input image
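The idea can be illustrated on a toy problem without a convnet. The "filter response" below is a made-up function that is largest when the input matches a fixed pattern, and the names, learning rate, and step count are all assumptions. The update rule is the same as in the convnet case: move the input in the direction of the gradient of the response.

```python
def response(x, w):
    """Toy filter response: rewards matching the pattern w, penalizes large inputs."""
    return sum(wi * xi for wi, xi in zip(w, x)) - 0.5 * sum(xi * xi for xi in x)

def response_gradient(x, w):
    """Gradient of the toy response with respect to the input x."""
    return [wi - xi for wi, xi in zip(w, x)]

def gradient_ascent(w, steps=200, learning_rate=0.1):
    """Start from a zero input and repeatedly step up the gradient."""
    x = [0.0] * len(w)
    for _ in range(steps):
        grad = response_gradient(x, w)
        # Ascent: add the gradient (descent would subtract it)
        x = [xi + learning_rate * gi for xi, gi in zip(x, grad)]
    return x

pattern = [1.0, -2.0, 0.5]
best_input = gradient_ascent(pattern)
print([round(v, 4) for v in best_input], response(best_input, pattern))
```

The input converges to the pattern the toy filter responds to most strongly, which mirrors how the generated images reveal what each convnet filter is receptive to.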
63. 3. Visualizing Heatmaps of Class Activation
• Ramprasaath R. Selvaraju et al., “Grad-CAM: Visual Explanations from Deep Networks via
Gradient-based Localization.” arXiv (2017), https://arxiv.org/abs/1610.02391.
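The core computation in Grad-CAM is small. Each feature map of the last convolutional layer is weighted by the average gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped to zero. The sketch below applies these steps to tiny hand-written arrays. In practice the activations and gradients come from a trained network, so the numbers and the function name here are purely illustrative.

```python
def grad_cam(activations, gradients):
    """Class activation heatmap from K feature maps and their gradients.

    Both arguments are lists of K maps, each an H x W nested list.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weights: global average of each gradient map
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU
    return [
        [
            max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

activations = [
    [[1.0, 0.0], [0.0, 2.0]],   # feature map 1
    [[0.0, 3.0], [1.0, 0.0]],   # feature map 2
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],       # the class score rises with map 1
    [[-1.0, -1.0], [-1.0, -1.0]],   # the class score falls with map 2
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Locations where positively weighted feature maps are active stay bright in the heatmap, which is what allows the method to highlight the image regions that support a given class.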
71. GoogLeNet (2014)
• Achieved a top-5 error rate of 6.67%!
This was very close to human level
performance
• Introduced the inception module; used batch
normalization, image distortions, and
RMSprop
• 22 layers deep, yet reduced the number of parameters
from 60 million (AlexNet) to 4 million