Intro to Neural Networks
Eng. Abdallah Bashir
Session topics:
1. Introduction to Neural Networks.
2. Neural Networks Basics.
3. Shallow neural networks.
4. Deep Neural Networks.
1. Introduction to Neural Networks
1.1 What is a neuron ?
The input is the size of the house (x).
The output is the price (y).
• It is a linear regression problem because the price as a function
of size is a continuous output.
• We know prices can never be negative, so we use a function called the Rectified Linear Unit (ReLU), which starts at zero (a sketch follows after this list).
• Single neuron = linear regression
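As a minimal sketch (illustrative numpy, not from the slides), the ReLU function used above can be written as:

import numpy as np

def relu(z):
    return np.maximum(0, z)   # 0 for negative inputs, z itself otherwise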
1.2 Neural Network Architecture
• The price of a house can be affected by other features such as size, number of bedrooms, zip code, and wealth.
• The role of the neural network is to predict the price, and it will automatically generate the hidden units. We only need to give it the inputs x and the output y.
(Diagram: input layer → hidden layers → output layer.)
Each input will be connected to the hidden layer, and the NN will decide the connections.
Supervised learning means we have the (X, Y) pairs and we need to find the function that maps X to Y.
1.3 Supervised Learning with Neural Networks
There are different types of neural networks for supervised learning, including:
• Standard NN (Useful for Structured data)
• CNN or convolutional neural networks (Useful in computer vision)
• RNN or Recurrent neural networks (Useful in Speech recognition or NLP)
• Hybrid/custom NNs, or a collection of NN types
1.4 Structured vs Unstructured Data
• Structured data is data like databases and tables.
• Unstructured data is data like images, video, audio, and text.
1.5 Why is deep learning taking off?
Deep learning is taking off for 3 reasons:
1. Data
•For small data, a NN can perform about as well as linear regression or an SVM (Support Vector Machine).
•For big data, a small NN is better than an SVM.
•For big data, a big NN is better than a medium NN, which is better than a small NN.
2. Computation:
•GPUs.
•Powerful CPUs.
•Distributed computing.
3. Algorithms:
Creative algorithms have appeared that changed the way NNs work.
2. Neural Networks Basics
2.1 Binary Classification
In a binary classification problem, the
result is a discrete value output.
For example:
• account hacked (1) or not hacked (0)
•Object is a cat (1) or no cat (0)
Example: Cat vs Non-Cat
The goal is to train a classifier whose input is an image represented by a feature vector, x, and which predicts whether the corresponding label y is 1 or 0. In this case, whether this is a cat image (1) or a non-cat image (0).
The value in a cell represents the pixel intensity, which will be used to create a feature vector of n dimensions. In pattern recognition and machine learning, a feature vector represents an object, in this case, a cat or no cat.
To create a feature vector x, the pixel intensity values are unrolled (reshaped) for each color channel. The dimension of the input feature vector x is Nx = 64 x 64 x 3 = 12,288 (see the sketch below).
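A minimal sketch of this unrolling step (illustrative numpy; a random array stands in for a real image):

import numpy as np

image = np.random.randint(0, 256, size=(64, 64, 3))   # hypothetical 64x64 RGB image
x = image.reshape(64 * 64 * 3, 1)                      # feature vector, shape (12288, 1) = (Nx, 1)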
2.1.1 Neural Networks Notations
Here are some of the notations:
• M is the number of examples in the dataset.
• Nx is the size of the input vector
• Ny is the size of the output vector
• X(1) is the first input vector
• Y(1) is the first output vector
• X = [x(1) x(2) .. x(M)]
• Y = [y(1) y(2) .. y(M)]
• L is the number of layers.
2.2 Logistic Regression
Logistic regression is a learning algorithm used in a supervised learning problem when the output labels y are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training data.
Example: Cat vs. non-cat
Given an image represented by a feature vector x, the algorithm will evaluate the probability of a cat being in that image.
The parameters used in logistic regression are the weight vector w (of dimension Nx) and the bias b (a real number); the model is given below.
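For reference, this is the standard logistic regression model (reconstructed here in plain form, since the slide shows the formula as an image):

y' = sigmoid(w^T x + b), where sigmoid(z) = 1 / (1 + e^(-z))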
2.2.1 Cost Function
To train the parameters w and b we need to define a cost function.
Loss Function:
The loss function measures the discrepancy between the prediction y' and the desired output y (the formula is given below).
To explain the loss function, let's look at the two cases:
• if y = 1 ==> L(y', 1) = -log(y'), so we want y' to be as close to 1 as possible
• if y = 0 ==> L(y', 0) = -log(1 - y'), so we want y' to be as close to 0 as possible
• The cost function is then the following (see the formulas below):
• The loss function computes the error for a single training example.
• The cost function is the average of the loss function over the entire training set.
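Reconstructed in plain form (the slides show these as images; this is the standard cross-entropy loss and its average over the training set):

L(y', y) = -( y * log(y') + (1 - y) * log(1 - y') )
J(w, b) = (1 / m) * sum over i = 1..m of L(y'(i), y(i))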
2.2.2 Gradient Descent
• The goal is to find w, b that minimize the cost function J(w, b).
• First we initialize w and b to 0,0 or to random values, and then try to improve the values iteratively.
• In logistic regression people usually use 0,0 instead of random initialization.
•The gradient descent algorithm repeats:
• w = w - alpha * dw, where alpha is the learning rate and dw is the derivative of the cost with respect to w (the change to w). The derivative is also the slope in the w direction.
• w = w - alpha * dJ(w,b)/dw (how much the function slopes in the w direction)
• b = b - alpha * dJ(w,b)/db (how much the function slopes in the b direction)
(Figure: gradient descent on the cost function plotted over w, stepping toward the minimum.)
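A minimal numpy sketch of one gradient descent update for logistic regression (illustrative code, not from the slides; it assumes X has shape (Nx, m) and Y has shape (1, m)):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent_step(w, b, X, Y, alpha):
    # one update of w and b over the whole training set
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)   # predictions, shape (1, m)
    dZ = A - Y                        # shape (1, m)
    dw = np.dot(X, dZ.T) / m          # shape (Nx, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw                # alpha is the learning rate
    b = b - alpha * db
    return w, b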
Computing derivatives
Consider the computation graph J = 3v, where v = a + u and u = b*c.
With a = 5, b = 3, c = 2: u = 6, v = 11, J = 33.
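Working backward through this graph with the chain rule gives the derivatives (a worked reconstruction of what the slide shows graphically):

dJ/dv = 3
dJ/da = dJ/dv * dv/da = 3 * 1 = 3
dJ/du = dJ/dv * dv/du = 3 * 1 = 3
dJ/db = dJ/du * du/db = 3 * c = 6
dJ/dc = dJ/du * du/dc = 3 * b = 9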
2.2.3 Vectorizing Logistic Regression
• As input we have a matrix X of shape [Nx, m] and a matrix Y of shape [Ny, m].
• We then compute all of [z1, z2, ..., zm] = W' * X + [b, b, ..., b] in one step.
This can be written in python as:
Z = np.dot(W.T,X) + b #Z shape is (1, m)
A = 1 / (1 + np.exp(-Z)) # A shape is (1, m)
Vectorizing Logistic Regression's Gradient Output:
• dz = A - Y # dz shape is (1, m)
• dw = np.dot(X, dz.T) / m #dw shape is (Nx, 1)
• db = dz.sum() / m # db shape is (1, 1)
Side Notes
The main steps for building a Neural Network are (see the sketch after this list):
•Define the model structure (such as number of
input features and outputs)
•Initialize the model's parameters.
•Loop.
• Calculate current loss (forward propagation)
• Calculate current gradient (backward propagation)
• Update parameters (gradient descent)
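A minimal sketch of these steps for logistic regression (illustrative numpy; the function and argument names are made up for this example):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, Y, num_iterations=1000, learning_rate=0.01):
    # X: (Nx, m) inputs, Y: (1, m) labels
    Nx, m = X.shape                     # 1. model structure from the data
    w = np.zeros((Nx, 1))               # 2. initialize parameters
    b = 0.0
    for _ in range(num_iterations):     # 3. optimization loop
        A = sigmoid(np.dot(w.T, X) + b)                                # forward propagation
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m    # current loss
        dw = np.dot(X, (A - Y).T) / m                                  # backward propagation
        db = np.sum(A - Y) / m
        w -= learning_rate * dw                                        # gradient descent update
        b -= learning_rate * db
    return w, b, cost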
Side Notes
•Preprocessing the dataset is important.
Tuning the learning rate (which is an example of a "hyperparameter")
can make a big difference to the algorithm.
kaggle.com is a good place for datasets and competitions.
3. Shallow Neural Networks
3.1 Neural Networks Overview
• In logistic regression we had:
(Diagram: inputs x1, x2, x3 feeding a single sigmoid unit that outputs ŷ.)
z = w^T x + b,  a = σ(z),  ℒ(a, y)
• In a neural network with one hidden layer we will have:
(Diagram: inputs x1, x2, x3 feeding a hidden layer, whose activations feed a single output unit ŷ.)
z[1] = W[1] x + b[1],  a[1] = σ(z[1])
z[2] = W[2] a[1] + b[2],  a[2] = σ(z[2]),  ℒ(a[2], y)
3.2 Shallow Neural Network Representation
• We will define a neural network that has one hidden layer.
• A NN consists of an input layer, hidden layers, and an output layer.
• "Hidden" layer means we don't see the values of those layers in the training set.
• a[0] = x (the input layer)
• a[1] will represent the activations of the hidden neurons.
• a[2] will represent the output layer.
• We are talking about a 2-layer NN; the input layer isn't counted.
(Diagram: a 2-layer NN with inputs x1, x2, x3, one hidden layer, and output ŷ.)
3.3 Forward Propagation
(Diagram: the same 2-layer NN, now processing all m training examples stacked as columns.)
X = [x(1) x(2) … x(m)],  A[1] = [a[1](1) a[1](2) … a[1](m)]
Z[1] = W[1] X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2] A[1] + b[2]
A[2] = σ(Z[2])
Here is some information about the last slide:
1) Nh = 4
2) Nx = 3
3) Shapes of the variables:
I. W1 is the weight matrix of the first hidden layer; it has a shape of (noOfHiddenNeurons, nx)
II. b1 is the bias vector of the first hidden layer; it has a shape of (noOfHiddenNeurons, 1)
III. z1 is the result of the equation z1 = W1*x + b1; it has a shape of (noOfHiddenNeurons, 1)
IV. a1 is the result of the equation a1 = sigmoid(z1); it has a shape of (noOfHiddenNeurons, 1)
V. W2 is the weight matrix of the second layer; it has a shape of (1, noOfHiddenNeurons)
VI. b2 is the bias of the second layer; it has a shape of (1, 1)
VII. z2 is the result of the equation z2 = W2*a1 + b2; it has a shape of (1, 1)
VIII. a2 is the result of the equation a2 = sigmoid(z2); it has a shape of (1, 1)
•Pseudocode for forward propagation for the 2-layer NN. Let's say we have X of shape (Nx, m):
Z1 = np.dot(W1, X) + b1 # shape of Z1 is (noOfHiddenNeurons, m)
A1 = sigmoid(Z1) # shape of A1 is (noOfHiddenNeurons, m)
Z2 = np.dot(W2, A1) + b2 # shape of Z2 is (1, m)
A2 = sigmoid(Z2) # shape of A2 is (1, m)
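A runnable version with made-up sizes (Nx = 3, Nh = 4, m = 5 are purely illustrative, just to check the shapes):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Nx, Nh, m = 3, 4, 5
X = np.random.randn(Nx, m)
W1 = np.random.randn(Nh, Nx) * 0.01
b1 = np.zeros((Nh, 1))
W2 = np.random.randn(1, Nh) * 0.01
b2 = np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1   # (Nh, m)
A1 = sigmoid(Z1)          # (Nh, m)
Z2 = np.dot(W2, A1) + b2  # (1, m)
A2 = sigmoid(Z2)          # (1, m)
print(A2.shape)           # prints (1, 5)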
3.4 Activation Functions
• In computational networks, the activation function of a node defines
the output of that node given an input or set of inputs. A standard
computer chip circuit can be seen as a digital network of activation
functions that can be "ON" (1) or "OFF" (0)
• So far we are using sigmoid, but in some cases other functions can be
a lot better.
• Sigmoid can lead us to a gradient descent problem where the updates become very small.
• Sigmoid activation function range is [0,1]: A = 1 / (1 + np.exp(-z)) # where z is the input matrix
• Tanh activation function range is [-1,1] (a shifted and scaled version of the sigmoid function)
• It turns out that the tanh activation usually works better than
sigmoid activation function for hidden units.
• A disadvantage of the sigmoid and tanh functions is that if the input is very small or very large, the slope will be near zero, which slows down gradient descent.
• One of the popular activation functions that solved the slow gradient descent is the ReLU function: RELU = max(0, z) # so if z is negative the slope is 0, and if z is positive the slope is 1.
• A basic rule for choosing activation functions: if your output is a binary classification between 0 and 1, use sigmoid for the output layer and ReLU for the hidden units (see the implementations below).
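For reference, minimal numpy implementations of the activations discussed above (illustrative code, not from the slides):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # range (0, 1), good for binary outputs

def tanh(z):
    return np.tanh(z)             # range (-1, 1), usually better than sigmoid for hidden units

def relu(z):
    return np.maximum(0, z)       # slope 0 for z < 0, slope 1 for z > 0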
Side Notes
• In NNs you will have to make a lot of choices, such as:
• Number of hidden layers.
• Number of neurons in each hidden layer.
• Learning rate. (The most important parameter)
• Activation functions.
• And others..
3.5 Backpropagation
•This is where all the magic happens!
NN parameters:
o n[0] = Nx
o n[1] = NoOfHiddenNeurons
o n[2] = NoOfOutputNeurons = 1
o W1 shape is (n[1],n[0])
o b1 shape is (n[1],1)
o W2 shape is (n[2],n[1])
o b2 shape is (n[2],1)
Then Gradient descent:
Repeat:
Compute predictions (y'[i], i = 0,...m)
Get derivatives: dW1, db1, dW2, db2
Update: W1 = W1 - LearningRate * dW1
b1 = b1 - LearningRate * db1
W2 = W2 - LearningRate * dW2
b2 = b2 - LearningRate * db2
Forward propagation:
o Z1 = W1A0 + b1 # A0 is X
o A1 = g1(Z1)
o Z2 = W2A1 + b2
o A2 = Sigmoid(Z2) # Sigmoid because the output is between 0 and 1
Back propagation (see the numpy sketch after this list):
o dZ2 = A2 - Y
o dW2 = np.dot(dZ2, A1.T) / m
o db2 = np.sum(dZ2) / m
o dZ1 = np.dot(W2.T, dZ2) * g'1(Z1) # element-wise product (*)
o dW1 = np.dot(dZ1, A0.T) / m # A0 = X
o db1 = np.sum(dZ1) / m
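A minimal numpy sketch of these gradients, assuming tanh hidden units (so g'1(Z1) = 1 - A1^2); the function name and arguments are illustrative:

import numpy as np

def backward_propagation(X, Y, A1, A2, W2):
    # X: (Nx, m), Y: (1, m), A1: (Nh, m), A2: (1, m), W2: (1, Nh)
    m = X.shape[1]
    dZ2 = A2 - Y                                  # (1, m)
    dW2 = np.dot(dZ2, A1.T) / m                   # (1, Nh)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m  # (1, 1)
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)       # tanh'(Z1) = 1 - A1^2
    dW1 = np.dot(dZ1, X.T) / m                    # (Nh, Nx)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m  # (Nh, 1)
    return dW1, db1, dW2, db2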
3.6 Random Initialization
• In logistic regression it wasn't important to initialize the
weights randomly, while in NN we have to initialize them
randomly.
• If we initialize all the weights with zeros in NN it won't
work (initializing bias with zero is OK):
• All hidden units will be completely identical (symmetric) -
compute exactly the same function.
• On each gradient descent iteration all the hidden units will
always update the same.
• To solve this, we initialize the W's with small random numbers (see the helper below):
• W1 = np.random.randn(2,2) * 0.01 # multiply by 0.01 to keep the weights small enough
• b1 = np.zeros((2,1)) # it's OK to have b as zero; it doesn't cause the symmetry problem
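A small illustrative helper that applies this initialization for a 2-layer NN (the function name and default sizes are assumptions for the example):

import numpy as np

def initialize_parameters(n_x, n_h, n_y=1):
    W1 = np.random.randn(n_h, n_x) * 0.01  # small random weights break symmetry
    b1 = np.zeros((n_h, 1))                # zero biases are fine
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return W1, b1, W2, b2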
(Diagram: a small NN with inputs x1, x2, hidden units a1[1] and a2[1], and output unit a1[2].)
4. Deep Neural Networks
4.1 Deep L-layer neural network
•Shallow NN is a NN with one or two layers.
•Deep NN is a NN with three or more layers.
•We will use the notation L to denote the number
of layers in a NN.
•n[l] is the number of neurons in a specific layer l.
•n[0] denotes the number of neurons in the input layer; n[L] denotes the number of neurons in the output layer.
•g[l] is the activation function of layer l.
4.2 Forward Propagation in a Deep Network
Forward propagation general rule for m inputs (see the sketch below):
•Z[l] = W[l]A[l-1] + b[l]
•A[l] = g[l](Z[l])
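A minimal sketch of this rule written as a loop (illustrative code; it assumes a parameters dict holding W1..WL and b1..bL, ReLU hidden layers, and a sigmoid output):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters, L):
    A = X                                        # A[0] = X
    for l in range(1, L + 1):
        W = parameters["W" + str(l)]
        b = parameters["b" + str(l)]
        Z = np.dot(W, A) + b                     # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L else relu(Z)    # A[l] = g[l](Z[l])
    return A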
4.2.1 Matrix Dimensions
•Dimension of W[l] is (n[l], n[l-1]). It can be read from right to left: columns match the previous layer, rows match the current layer.
•Dimension of b is (n[l],1)
•dw has the same shape as W, while db is the
same shape as b
•Dimension of Z[l], A[l], dZ[l], and dA[l] is (n[l],m)
4.3 Intuition about deep representation
• Face recognition application: Image ==> Edges ==> Face parts ==> Faces ==> desired face
• Audio recognition application: Audio ==> Low-level sound features (like "sss", "bb") ==> Phonemes ==> Words ==> Sentences
4.4 Parameters vs Hyperparameters
• The main parameters of the NN are W and b.
• Hyperparameters (parameters that control the algorithm) include:
• Learning rate.
• Number of iterations.
• Number of hidden layers L.
• Number of hidden units n.
• Choice of activation functions.
• You have to try out hyperparameter values yourself.
4.5 NN and The Human Brain !
•The analogy that "it is like the brain" has become a really oversimplified explanation.
•There is a very simplistic analogy between a single logistic unit and a single neuron in the brain.
•No one today fully understands how a human brain neuron works.
•No one today knows exactly how many neurons are in the brain.