Multimedia Data Mining
using deep learning
Peter Wlodarczak
wlodarczak@gmail.com
Agenda
 Aims
 Multimedia Data Mining
 Artificial Neural Networks
 Deep learning
 Challenges
 Discussion
Aims
 Analyze multimedia data for:
 Object/face recognition
 Voice commands
 Natural Language Processing
 Classification
 Automatic caption generation
 Record linkage (entity resolution)
Multimedia Data Mining I
 Multimedia data mining:
 Unprecedented amount of Multimedia data
since Web 2.0 and Social Media
 Prosumer data
 Uses algorithms to extract useful patterns
and relations from image, audio and video
data
 Traditional methods often not satisfactory
 Unsuitable for high dimensionality
Multimedia Data Mining II
 Multimedia data mining has been
improved using deep learning in:
 Visual data mining
 Natural Language Processing
 Deep learners are:
 Machine Learning schemes
 Usually multi-layered artificial neural
networks
Artificial Neural Networks I
 Artificial Neural Networks:
 Suitable to give good approximations for
complex problems
 Consist of perceptrons (the neurons) and
weighted connections (the axons)
Artificial Neural Networks II
 Perceptron (Neuron)
 Linear classifier
 Data linearly separable using a hyperplane
 Where w = weights, a = real-valued feature
vector, a0 = bias input
 Binary classifier f(a) that maps its input
vector a to a single, binary output value
 Separating hyperplane:
$w_0 a_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k = 0$
Artificial Neural Networks III
[Figure: perceptron with a constant bias input 1 (weight w0) and attribute inputs a1, a2, a3 (weights w1, w2, w3)]
$f(a) = \sum_k w_k a_k + b$
Output class: f(a) > 0 or f(a) < 0
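The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `perceptron_output` and `classify` and the example weights are hypothetical and not taken from the presentation.

```python
def perceptron_output(weights, bias, attributes):
    """Return f(a) = sum_k w_k * a_k + b for one feature vector."""
    return sum(w * a for w, a in zip(weights, attributes)) + bias


def classify(weights, bias, attributes):
    """Binary decision: 1 if f(a) > 0, otherwise 0."""
    return 1 if perceptron_output(weights, bias, attributes) > 0 else 0


# Hypothetical weights for three attributes a1..a3 (illustration only).
w = [0.5, -1.0, 2.0]
b = -0.25  # bias term, i.e. weight w0 applied to the constant input 1

print(classify(w, b, [1, 0, 0]))  # f(a) = 0.5 - 0.25 = 0.25
print(classify(w, b, [0, 1, 0]))  # f(a) = -1.0 - 0.25 = -1.25
```

The bias is kept as a separate argument here; folding it into the weight vector as w0 with a constant input of 1 gives the same result.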
Artificial Neural Networks III
Training data
Name      sex     mask  cape  tie  ears  smokes  class
Batman    male    yes   yes   no   yes   no      Good
Robin     male    yes   yes   no   no    no      Good
Alfred    male    no    no    yes  no    no      Good
Penguin   male    no    no    yes  no    yes     Bad
Catwoman  female  yes   no    no   yes   no      Bad
Joker     male    no    no    no   no    no      Bad
Test data
Name      sex     mask  cape  tie  ears  smokes  class
Batgirl   female  yes   yes   no   yes   no      ?
Riddler   male    yes   no    no   no    no      ?
 Supervised learning
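As a rough sketch of supervised learning on this table, the classic perceptron learning rule can be applied to the six labelled characters. Several things here are assumptions made for the example and not part of the slides: the 0/1 encoding of the attributes (male = 1, yes = 1), the +1/-1 encoding of Good/Bad, and the choice of the perceptron rule itself as the training method.

```python
# Attribute order: sex (male=1), mask, cape, tie, ears, smokes (yes=1).
TRAIN = {
    "Batman":   ([1, 1, 1, 0, 1, 0], +1),  # +1 = Good
    "Robin":    ([1, 1, 1, 0, 0, 0], +1),
    "Alfred":   ([1, 0, 0, 1, 0, 0], +1),
    "Penguin":  ([1, 0, 0, 1, 0, 1], -1),  # -1 = Bad
    "Catwoman": ([0, 1, 0, 0, 1, 0], -1),
    "Joker":    ([1, 0, 0, 0, 0, 0], -1),
}
TEST = {
    "Batgirl": [0, 1, 1, 0, 1, 0],
    "Riddler": [1, 1, 0, 0, 0, 0],
}


def score(w, b, x):
    """Weighted sum of the attributes plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(examples, max_epochs=1000):
    """Perceptron rule: add y * x to the weights whenever an example is misclassified."""
    w, b = [0.0] * 6, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if y * score(w, b, x) <= 0:  # wrong side of (or on) the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


w, b = train_perceptron(list(TRAIN.values()))
for name, x in TEST.items():
    print(name, "Good" if score(w, b, x) > 0 else "Bad")
```

With this encoding the training set is linearly separable, so the loop stops once all six training rows are classified correctly. The labels printed for Batgirl and Riddler depend on the hyperplane the rule happens to find.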
Artificial Neural Networks IV
 Not all data is linearly separable
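XOR is the standard example of data that no single hyperplane can separate. The sketch below searches a small grid of weights for a perceptron that reproduces a truth table. The grid, the helper name `separable_on_grid` and the AND/XOR tables are assumptions chosen for illustration. A failed grid search is not a proof in itself, but for XOR it is known that no real-valued weights exist.

```python
from itertools import product

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

GRID = [i / 2 for i in range(-8, 9)]  # candidate values -4.0, -3.5, ..., 4.0


def separable_on_grid(truth_table):
    """Return (w1, w2, b) with w1*a1 + w2*a2 + b > 0 exactly for label 1, or None."""
    for w1, w2, b in product(GRID, repeat=3):
        if all((w1 * a1 + w2 * a2 + b > 0) == bool(label)
               for (a1, a2), label in truth_table.items()):
            return (w1, w2, b)
    return None


print("AND:", separable_on_grid(AND))  # some weight triple is found
print("XOR:", separable_on_grid(XOR))  # no triple on the grid works
```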
Artificial Neural Networks V
 Multilayer Perceptron
 Perceptrons organized in several layers
 A layer is fully interconnected with the next
layer
 All nodes except the input nodes are perceptrons
 Feedforward neural network
 Uses backpropagation for training
 Error propagated back to minimize loss function
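A minimal sketch of backpropagation for a tiny feedforward network follows. The architecture (2 inputs, 2 hidden units, 1 output), the sigmoid activation, the squared-error loss, the starting weights and the learning rate are all assumptions picked to keep the example small. The slides do not specify any of them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(2)) + b2)
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss, propagated from the output layer back to the hidden layer."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1 - y)  # error signal at the output unit
    grad_W2 = [delta_out * h[j] for j in range(2)]
    grad_b2 = delta_out
    delta_hid = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
    grad_W1 = [[delta_hid[j] * x[i] for i in range(2)] for j in range(2)]
    grad_b1 = delta_hid
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_step(params, grads, lr=0.5):
    """Move every weight a small step against its gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[W1[j][i] - lr * gW1[j][i] for i in range(2)] for j in range(2)],
        [b1[j] - lr * gb1[j] for j in range(2)],
        [W2[j] - lr * gW2[j] for j in range(2)],
        b2 - lr * gb2,
    )


# Arbitrary starting weights and one training example (illustration only).
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.05)
x, target = [1.0, 0.0], 1.0

before = loss(params, x, target)
params = sgd_step(params, backprop(params, x, target))
after = loss(params, x, target)
print(before, after)  # the loss is lower after one update
```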
Artificial Neural Networks VI
 Multilayer perceptron can be used for
non-linear, multiclass classification
Artificial Neural Networks VII
 Gradient descent optimization method
for learning weights
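Gradient descent itself fits in a few lines. The toy loss L(w) = (w - 3)^2, the learning rate and the step count below are assumptions chosen so the behaviour is easy to follow. A real network applies the same update to every weight, using gradients obtained by backpropagation.

```python
def gradient_descent(grad, w0, lr=0.1, steps=200):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # very close to 3
```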
Artificial Neural Networks VIII
 Model complexity has to be appropriate:
as simple as possible, as complex as
necessary (Occam’s razor)
Schapire 2004
Artificial Neural Networks IX
Schapire 2004
Artificial Neural Networks X
 For building an accurate classifier:
 Enough training examples
 Good performance on training set
 Classifier that is not too complex,
to avoid overfitting
 ANNs allow approximate solutions for
very complex problems
 Support Vector Machines (SVM) are
often a simpler alternative to ANNs
Deep learning I
 Deep learning
 No clear distinction from shallow learners
 Multiple layers of non-linear processing
units
 Each layer represents features at a higher
level
 Forms a hierarchical representation
 Majority of deep learners are ANNs
Deep learning II
 Deep learning neural networks
 Use Rectified Linear Units (ReLU)
 Learn faster
 Half-wave rectifier
f(z) = max(z, 0)
 Use backpropagation for adjusting the
weights
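The rectifier and the gradient that backpropagation passes through it can be written directly. The function names are hypothetical, and returning a gradient of 0 at exactly z = 0 is a common convention assumed here, not something stated on the slide.

```python
def relu(z):
    """Half-wave rectifier: f(z) = max(z, 0)."""
    return max(z, 0.0)


def relu_grad(z):
    """Derivative used during backpropagation (0 chosen at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0


print([relu(z) for z in (-2.0, 0.0, 3.0)])
print([relu_grad(z) for z in (-2.0, 0.0, 3.0)])
```

Because the gradient is exactly 1 for positive inputs, the error signal is not shrunk on its way back through active units.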
Deep learning III - ConvNet
LeNet 2015
Deep learning IV - ConvNet
 Convolutional neural networks
 Inspired by the animal visual cortex
 Visual cortex is the most powerful visual
processing system in existence
 Typically two stages:
 Convolutional stage
 Pooling stage
 Characterized by
 sparse connectivity
 shared weights
Deep learning V - ConvNet
 Shared weights
 Subsets share weights and bias to form
feature map
 Replicated across entire visual field
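Weight sharing can be illustrated by sliding one small kernel over an image: the same weights and bias are applied at every position, and the results form one feature map. The sketch below is a plain-Python illustration with assumed names (`conv2d`) and a made-up 3x4 image. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, bias=0.0):
    """Apply one shared kernel at every position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Made-up image with a dark left half and a bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright vertical edge

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map is non-zero only in the column where the edge lies, although the kernel has just four weights in total.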
Deep learning VI - ConvNet
 Each layer accepts a 3D input volume and
transforms it into a 3D output volume
 Filters activate when a specific feature is
detected
CS231n 2015
Deep learning VII - ConvNet
 Receptive field spans all feature maps
LeNet 2015
Deep learning VIII - ConvNet
 MaxPooling
 Non-linear down-sampling
 Partitions input into non-overlapping
rectangles
 Outputs maximum value for each sub-region
 Reduces computation for the next layer
 Reduces dimensionality of intermediate
representations
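Max-pooling over non-overlapping rectangles is short enough to write out. The function name `max_pool`, the 2x2 window and the example values are assumptions for this sketch.

```python
def max_pool(feature_map, size=2):
    """Split the map into non-overlapping size x size blocks and keep each block's maximum."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool(fm)
print(pooled)  # [[4, 5], [6, 7]]
```

A 4x4 map becomes a 2x2 map, so the next layer sees a quarter of the values.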
Deep learning IX - ConvNet
 Convolutional and sampling sublayers
UFLDL 2015
Deep learning X - ConvNet
 Image after cascading a convolutional
layer with max-pooling
 Similar to edge detector
Deep learning XI - RNN
 Recurrent neural networks
 Contain directed cycles
 Take sequences as input; input and output
vectors have no fixed size, e.g. natural
speech
Deep learning XII - RNN
 No fixed size of computations
 Much simpler than ConvNets
 Maintain inner state exhibiting dynamic
temporal behavior
 Optimized through backpropagation
 Can be extended with long-term memory
(e.g. LSTM units)
 Don’t necessarily need sequences as input
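The inner state of a recurrent network can be sketched with a single scalar unit. The update rule h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) is the usual simple recurrent form. The weights and input sequences are assumptions for illustration.

```python
import math


def rnn_run(inputs, w_x=1.0, w_h=0.5, b=0.0, h0=0.0):
    """Scalar recurrent unit: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state depends on input and old state
        states.append(h)
    return states


print(rnn_run([1.0, 0.0]))            # same values as the next call, different order
print(rnn_run([0.0, 1.0]))
print(rnn_run([1.0, 0.0, 0.0, 1.0]))  # longer sequence, same weights
```

The first two calls see the same inputs in a different order and end in different states, which is the temporal behaviour the slide refers to. The third call shows that the same weights handle a sequence of any length.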
Deep learning XIII - RNN
 Training RNN is a non-linear global
optimization problem
 Trained using stochastic gradient descent
 Non-linear, differentiable activation
function, e.g. rectifier
 Trained through backpropagation through
time (BPTT)
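Backpropagation through time unrolls the recurrence and walks the error backwards over the time steps. The sketch below does this for a scalar recurrent unit and compares the result with a finite-difference estimate. The tanh activation, the loss on the final state only, and all numeric values are assumptions made for the example.

```python
import math


def forward(xs, w_x, w_h):
    """h_t = tanh(w_x * x_t + w_h * h_{t-1}), starting from h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1]))
    return hs


def final_loss(xs, target, w_x, w_h):
    return 0.5 * (forward(xs, w_x, w_h)[-1] - target) ** 2


def bptt(xs, target, w_x, w_h):
    """Gradients of the final-state loss with respect to w_x and w_h."""
    hs = forward(xs, w_x, w_h)
    grad_wx = grad_wh = 0.0
    dh = hs[-1] - target                 # dL/dh_T
    for t in range(len(xs), 0, -1):      # walk backwards through time
        dz = dh * (1.0 - hs[t] ** 2)     # back through tanh at step t
        grad_wx += dz * xs[t - 1]        # shared weights: contributions add up
        grad_wh += dz * hs[t - 1]
        dh = dz * w_h                    # pass the error on to h_{t-1}
    return grad_wx, grad_wh


xs, target = [1.0, 0.5, -1.0], 0.3
g_x, g_h = bptt(xs, target, w_x=0.8, w_h=0.5)

# Finite-difference estimate of dL/dw_h as a sanity check.
eps = 1e-6
numeric_g_h = (final_loss(xs, target, 0.8, 0.5 + eps)
               - final_loss(xs, target, 0.8, 0.5 - eps)) / (2 * eps)
print(g_h, numeric_g_h)  # the two values agree closely
```

Stochastic gradient descent would then subtract a small multiple of these gradients from `w_x` and `w_h` after each training sequence.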
 Genetic algorithms can be used for training
Deep learning XIV - RNN
 Many different architectures for RNN
[Figures: Elman simple recurrent network (SRN); spiking neural network]
Deep learning XV - RNN
RNN learns to read house numbers
RNN learns to paint house numbers
Karpathy 2015
Deep learning XVI - RNN
 RNN used for
 Transcribing speech to text
 Speech synthesis
 Machine translation
Deep learning XVII
 Combining ConvNets and RNN for
image descriptions
 Regions described using language as
label space, using a ConvNet
 Language synthesis using an RNN
Karpathy & Fei-Fei 2014
Deep learning XVIII
 ConvNet and RNN can be combined
 Automated caption generation
Deep learning XIX
 Automatic feature extraction
 No closed vocabulary set
 Alignment of segments of sentences to
regions in the image
Karpathy & Fei-Fei 2014
Deep learning XX
 Other applications
 Object recognition
 Movie classification
 Handwriting recognition
 Record linkage
Challenges I
 Main disadvantage: large volumes of
training data needed
 Overfitting if not enough training data
 Optimization difficult
 Finding relevant information
 Privacy-preserving data mining
Challenges II
 Describing actions
Discussion
 Future research in
 Attention based models
 Finding relevant information
 Data democratization and Internet of
Things
 Unsupervised learning
 Semantic data modeling
 Reasoning
Thank you for your attention
 Questions?
References
 Zhao, X, Li, X & Zhang, Z 2015, 'Multimedia Retrieval via Deep Learning to Rank', IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1487-91, <http://ieeexplore.ieee.org.ezproxy.usq.edu.au/xpls/abs_all.jsp?arnumber=7054452>.
 Yu, W, Zhuang, F, He, Q & Shi, Z 2015, 'Learning deep representations via extreme learning machines', Neurocomputing, vol. 149, Part A, pp. 308-15, <http://www.sciencedirect.com/science/article/pii/S0925231214011461>.
 Xu, K, Ba, J, Kiros, R, Cho, K, Courville, A, Salakhutdinov, R, Zemel, R & Bengio, Y 2015, 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention', Proceedings of the 32nd International Conference on Machine Learning from Data: Artificial Intelligence and Statistics, vol. 37.
 Xin, J, Wang, Z, Qu, L & Wang, G 2015, 'Elastic extreme learning machine for big data classification', Neurocomputing, vol. 149, Part A, pp. 464-71, <http://www.sciencedirect.com/science/article/pii/S0925231214011503>.
 Weston, J, Chopra, S & Bordes, A 2015, 'Memory Networks', in 3rd International Conference on Learning Representations: proceedings of the 3rd International Conference on Learning Representations, San Diego, viewed <http://arxiv.org/pdf/1410.3916v10.pdf>.
 Weilong, H, Xinbo, G, Dacheng, T & Xuelong, L 2015, 'Blind Image Quality Assessment via Deep Learning', Neural Networks and Learning Systems, IEEE Transactions on, vol. 26, no. 6, pp. 1275-86.
 Wang, Y, Li, D, Du, Y & Pan, Z 2015, 'Anomaly detection in traffic using L1-norm minimization extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 415-25, <http://www.sciencedirect.com/science/article/pii/S0925231214011382>.
 Vinyals, O, Toshev, A, Bengio, S & Erhan, D 2015, 'Show and Tell: A Neural Image Caption Generator', Google, <http://arxiv.org/pdf/1411.4555v1.pdf>.
 Noda, K, Yamaguchi, Y, Nakadai, K, Okuno, H & Ogata, T 2015, 'Audio-visual speech recognition using deep learning', Applied Intelligence, vol. 42, no. 4, pp. 722-37, <http://dx.doi.org/10.1007/s10489-014-0629-7>.
 Mao, W, Zhao, S, Mu, X & Wang, H 2015, 'Multi-dimensional extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 160-70, <http://www.sciencedirect.com/science/article/pii/S0925231214011540>.
 Liu, X, Wang, L, Huang, G-B, Zhang, J & Yin, J 2015, 'Multiple kernel extreme learning machine', Neurocomputing, vol. 149, Part A, pp. 253-64, <http://www.sciencedirect.com/science/article/pii/S0925231214011199>.
 LeCun, Y, Bengio, Y & Hinton, G 2015, 'Deep learning', Nature, vol. 521, no. 7553, pp. 436-44, <http://dx.doi.org/10.1038/nature14539>.
 Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I & Salakhutdinov, R 2014, 'Dropout: a simple way to prevent neural networks from overfitting', J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929-58.
 Karpathy, A & Fei-Fei, L 2014, 'Deep visual-semantic alignments for generating image descriptions', arXiv preprint arXiv:1412.2306.
