THE ART OF INTELLIGENCE – A PRACTICAL INTRODUCTION MACHINE LEARNING FOR ORACLE PROFESSIONALS
Lucas Jellema (CTO AMIS & Oracle ACE Director)
15th June 2017, TechExperience 2017, Amersfoort, The Netherlands
AGENDA
• What is Machine Learning?
• Why could it be relevant [to you]?
• What does it entail?
• With which algorithms, tools and technologies?
• Oracle and Machine Learning?
• How do you embark on Machine Learning?
LEARNING
• How do we learn?
• Try something (else) => get feedback => learn
• Eventually:
• We get it (understanding) so we can predict the outcome
of a certain action in a new situation
• Or we have experienced enough situations to predict
the outcome in most situations with high confidence
• Through interpolation, extrapolation, etc.
• We remain clueless
MACHINE LEARNING
• Analyze Historical Data (input and result – training set) to discover
Patterns & Models
• Iteratively apply Models to [additional] Input (test set) and compare
model outcome with known actual result to improve the model
• Use Model to predict outcome for entirely new data
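The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not material from the talk: the data set, the feature names and the choice of a 1-nearest-neighbour "model" are assumptions made for brevity.

```python
import math

# Hypothetical historical data: (hours_of_use, error_count) -> known outcome.
training_set = [
    ((1.0, 0.0), "ok"),
    ((2.0, 1.0), "ok"),
    ((8.0, 7.0), "fails"),
    ((9.0, 6.0), "fails"),
]
test_set = [
    ((1.5, 0.5), "ok"),
    ((8.5, 6.5), "fails"),
]


def predict(model, features):
    """Return the label of the training example closest to `features`."""
    nearest = min(model, key=lambda example: math.dist(example[0], features))
    return nearest[1]


def accuracy(model, labelled_data):
    """Share of labelled examples whose prediction matches the known result."""
    hits = sum(predict(model, x) == y for x, y in labelled_data)
    return hits / len(labelled_data)


# 1. "Training": a nearest-neighbour model simply memorises the training set.
model = list(training_set)
# 2. Compare the model outcome with the known actual result on the test set.
test_accuracy = accuracy(model, test_set)
# 3. Use the model to predict the outcome for entirely new data.
new_prediction = predict(model, (7.0, 8.0))
print(test_accuracy, new_prediction)
```

A real project would iterate on step 2 (more data, other algorithms, other parameters) until the test accuracy is good enough to trust step 3.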
WHY IS IT RELEVANT (NOW)?
• Data
• big, fast, open
• Machine Learning has become feasible
and accessible
• Available
• Affordable (software & hardware)
• Doable (Citizen Data Scientist)
• Fast enough
• Business Cases & Opportunities => Demands
• End users, Consumers, Competitive pressure, Society
EXAMPLE USE CASES
• Speech recognition
• Identify churn candidates
• Intent & Sentiment analysis on social media
• Upsell & Cross Sell
• Target Marketing
• Customer Service
• Chat bots & voice response systems
• Predictive Maintenance
• Gaming
• Captcha
• Medical Diagnosis
• Anomaly Detection (find the odd one out)
• Autonomous Cars
• Voter Segment Analysis
• Customer Recommendations
• Smart Data Capture
• Face Detection
• Fraud Prevention
• (really good) OCR
• Traffic light control
• Navigation
• Should we investigate | do lab test?
• Spam filtering
• Propose friends | contacts
• Troll detection
• Auto correct
• Photo Tagging and Album organization
THE DATA SCIENCE WORKFLOW
• Set Business Goal – research scope, objectives
• Gather data
• Prepare data
• Cleanse, transform (wrangle), combine (merge, enrich)
• Explore data
• Model Data
• Select model, train model, test model
• Present findings and recommend next steps
• Apply:
• Make use of insights in business decisions
• Automate Data Gathering & Preparation, Deploy Model, Embed Model in
operational systems
DATA DISCOVERY
A        B    C     D       E  F   G
1104534  ZTR  0.1   anijs   2  36  T
631148   ESE  132   rivier  0  21  S
-3       WGN  71    appel   0  1   -
1262300  ZTR  56    zes     2  41  T
315529   HVN  1290  hamer   0  11  -
788914   ASM  676   zwaluw  0  26  T
157762   HVN  9482  wie     0  6   -
946681   DHG  42    rond    1  31  T
-31539   WGN  2423  bruin   0  0   -
47338    HVN  54    hamer   0  16  P
SCATTER PLOT
ATTRIBUTE F (Y-AXIS) VS ATTRIBUTE A
[Figure: scatter plot; y-axis from 0 to 45, x-axis from -200000 to 1400000]
SCATTER PLOT
ATTRIBUTE F (Y-AXIS) VS ATTRIBUTE A
[Figure: scatter plot titled "Age of Lucas Jellema vs Year"; y-axis from 0 to 45, x-axis from 1965 to 2015]
DATA DISCOVERY – ATTRIBUTES IDENTIFIED
Time     City  -     -       #Kids  Age  Level of Education
1104534  ZTR   0.1   anijs   2      36   T
631148   ESE   132   rivier  0      21   S
-3       WGN   71    appel   0      1    -
1262300  ZTR   56    zes     2      41   T
315529   HVN   1290  hamer   0      11   -
788914   ASM   676   zwaluw  0      26   T
157762   HVN   9482  wie     0      6    -
946681   DHG   42    rond    1      31   T
-31539   WGN   2423  bruin   0      0    -
47338    HVN   54    hamer   0      16   P
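One way to arrive at a year axis like the one in the second scatter plot is to rescale the "Time" attribute. The sketch below assumes that the values are thousands of seconds counted from 1970. The slides do not state the unit, so treat that interpretation, and the helper name `approximate_year`, as a guess that happens to produce years in the 1965 to 2015 range.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average length of a year in seconds

# (Time, Age) pairs copied from the table above.
rows = [
    (1104534, 36), (631148, 21), (-3, 1), (1262300, 41), (315529, 11),
    (788914, 26), (157762, 6), (946681, 31), (-31539, 0), (47338, 16),
]


def approximate_year(time_value):
    """Interpret the value as thousands of seconds since 1970 (an assumption)."""
    return 1970 + round(time_value * 1000 / SECONDS_PER_YEAR)


# Print (year, age) pairs in chronological order, ready for plotting.
for time_value, age in sorted(rows):
    print(approximate_year(time_value), age)
```

Recognising what a raw column represents, and converting it to a unit people can read, is a typical data discovery step.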
TYPES OF MACHINE LEARNING
• Supervised
• Train and test model from known data (both features and target)
• Unsupervised
• Analyze unlabeled data – see if you can find anything
• Semi-Supervised
• Interactive flow, for example human identifying clusters
• Reinforcement
• Continuously improve algorithm (model) as time progresses, based on new
experience
MACHINE LEARNING ALGORITHMS
• Clustering
• Hierarchical k-means, Orthogonal Partitioning Clustering, Expectation-Maximization
• Feature Extraction/Attribute Importance/Principal Component Analysis
• Classification
• Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machine
• Regression
• Multiple Regression, Support Vector Machine, Linear Model, LASSO,
Random Forest, Ridge Regression, Generalized Linear Model,
Stepwise Linear Regression
• Association & Collaborative Filtering (market basket analysis, apriori)
• Neural network and Deep Learning with Deep Neural Network
• Can be used for many different use cases
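To make one of the listed families concrete, here is a minimal sketch of plain k-means clustering (not the hierarchical variant named above). The sample points and starting centroids are made up for illustration, and production libraries add refinements such as smarter initialisation and convergence checks.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid  # an empty cluster keeps its old centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No labels are supplied anywhere: the algorithm discovers the two groups from the distances alone, which is what makes clustering an unsupervised technique.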
MODELING PHASE
• Select a model to try to create a fit with (predict target well)
• Set configuration parameters for model
• Divide data into training set and test set
• Train model with training set
• Evaluate performance of trained model on the test set
• Confusion matrix, mean square error, support, lift, false positives, false negatives
• Optionally: tweak model parameters, add attributes, feed in more training
data, choose different model
• Eventually (hopefully): pick model plus parameters plus attributes
that will reliably predict the target variable given new data
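The evaluation measures named above are simple to compute once the test-set predictions sit next to the known results. The sketch below uses invented spam/ham labels; only the arithmetic is the point.

```python
from collections import Counter

# Hypothetical test-set results: known actual class vs. the model's prediction.
actual = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "ham"]

# Confusion matrix stored as counts keyed by (actual, predicted).
confusion = Counter(zip(actual, predicted))

true_positives = confusion[("spam", "spam")]
false_negatives = confusion[("spam", "ham")]   # spam that slipped through
false_positives = confusion[("ham", "spam")]   # ham wrongly flagged as spam
true_negatives = confusion[("ham", "ham")]

accuracy = (true_positives + true_negatives) / len(actual)
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)


def mean_squared_error(actual_values, predicted_values):
    """Average squared difference, a common yardstick for regression models."""
    squared = [(a - p) ** 2 for a, p in zip(actual_values, predicted_values)]
    return sum(squared) / len(squared)


print(confusion, accuracy, precision, recall)
```

Which measure matters most depends on the business goal: a fraud model may accept many false positives to keep false negatives low, while a spam filter usually prefers the opposite trade-off.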
OPTICAL DIGIT RECOGNITION
[Figure: actual versus predicted digit (0 to 9) grids for Naïve Bayes and Decision Tree]
CLASSIFICATION GONE WRONG
• Machine learning applied to millions of drawings
on QuickDraw
• to classify drawings
• For example: drawings of beds
• See for example:
• https://aiexperiments.withgoogle.com/quick-draw
MACHINE LEARNING => OPERATIONAL SYSTEMS
• “We have a model that will choose the best chess move based on
certain input”
MACHINE LEARNING => OPERATIONAL SYSTEMS
• Discovery => Model => Deploy
• “We have a model that will predict a class (classification) or value
(regression) based on certain input with a meaningful degree of
accuracy” – how can we make use of that model?
DEPLOY MODEL AND EXPOSE
• Model is usually created on Big Data in Data Science environment using the
Data Scientist’s tools
• Model itself is typically fairly small
• Model will be applied in operational systems against single data items (not
huge collections nor the entire Big Data set)
• Running the model online may not require extensive resources
• Implementing the model at production run time
• Export model (from Data Scientist environment) and import (into production
environment)
• Reimplement the model in the development technology and deploy (in the regular
way) to the production environment
• Expose model through API
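The export, import and expose steps can be sketched as follows. Everything here is hypothetical: the model is a toy linear scorer, JSON stands in for whatever export format the data science tool offers, and `handle_request` only shows what a REST endpoint would do with a request body, with the HTTP plumbing left out.

```python
import json

# Hypothetical trained model: a linear scorer with a decision threshold.
trained_model = {"version": "1.0", "weights": [0.8, -0.5], "bias": 0.1, "threshold": 0.5}

# Export in the data science environment ...
exported = json.dumps(trained_model)

# ... and import in the production environment.
model = json.loads(exported)


def predict(model, features):
    """Apply the (small) model to a single data item."""
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return {
        "score": score,
        "positive": score >= model["threshold"],
        "model_version": model["version"],
    }


def handle_request(body):
    """Turn a JSON request body such as '{"features": [1.0, 0.5]}' into a
    JSON response, as an API endpoint in front of the model would."""
    features = json.loads(body)["features"]
    return json.dumps(predict(model, features))


print(handle_request('{"features": [1.0, 0.5]}'))
```

Returning the model version with every response is a small design choice that pays off later, when several versions run side by side.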
DEPLOY MODEL AND EXPOSE
[Figure: model exposed through a REST API]
MODEL MANAGEMENT
• Governance (new versions, testing and approval)
• A/B testing
• Auditing (what did the model decide and why? notifying humans?)
• Evaluation (how well did the model’s output match the reality)
to help evolve the model
• for example recommendations followed
• Monitor self-learning models (to detect rogue models)
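Auditing, evaluation and A/B testing all depend on recording what each model version decided and, later, what actually happened. A minimal sketch of such a record keeper is shown below; the class name `AuditLog`, its methods and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    """Records every decision so it can be audited and evaluated later."""
    entries: list = field(default_factory=list)

    def record(self, model_version, inputs, prediction):
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "inputs": inputs,
            "prediction": prediction,
            "actual": None,  # filled in once reality is known
        })
        return len(self.entries) - 1  # position of the entry, used as its id

    def register_outcome(self, entry_id, actual):
        self.entries[entry_id]["actual"] = actual

    def accuracy_by_version(self):
        """Share of correct predictions per model version (useful for A/B tests)."""
        totals, hits = {}, {}
        for entry in self.entries:
            if entry["actual"] is None:
                continue  # outcome not known yet
            version = entry["model_version"]
            totals[version] = totals.get(version, 0) + 1
            hits[version] = hits.get(version, 0) + (entry["prediction"] == entry["actual"])
        return {version: hits[version] / totals[version] for version in totals}


log = AuditLog()
first = log.record("A", {"customer": 1}, "churn")
second = log.record("B", {"customer": 2}, "stay")
third = log.record("A", {"customer": 3}, "stay")
for entry_id in (first, second, third):
    log.register_outcome(entry_id, "churn")
print(log.accuracy_by_version())
```

A sudden drop in the per-version accuracy is one simple signal that a self-learning model is drifting.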
DEPLOYMENT CAN ALSO BE:
LOAD RESULTS FROM MODEL INTO PRODUCTION
WHAT TO DO IT WITH?
• Mathematics (Statistics)
• Gauss (normal distribution)
• Bayes’ Theorem
• Euclidean Distance
• Perceptron
• Mean Square Error
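Each of these building blocks fits in a few lines of code. The functions below are textbook definitions written out in plain Python; the function and parameter names are chosen here for illustration.

```python
import math


def gaussian_pdf(x, mean=0.0, std_dev=1.0):
    """Density of the normal (Gauss) distribution at x."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))


def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def perceptron_output(weights, bias, inputs):
    """A perceptron fires (1) when the weighted sum of its inputs exceeds 0."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


def mean_square_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# With weights (1, 1) and bias -1.5 the perceptron behaves like a logical AND.
print(perceptron_output([1, 1], -1.5, [1, 1]), perceptron_output([1, 1], -1.5, [1, 0]))
```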
HOW TO PICK TOOLS FOR THE JOB
• What are the jobs?
• Gather data
• Prepare data
• Explore and (hopefully) Discover
• Present
• Embed & Deploy Model
• What are considerations?
• Volume
• Speed and Time
• Skills
• Platform
POPULAR TOOLS
NOTEBOOK –
THE LAB JOURNAL FROM THE DATALAB
• Common format for data exploration and presentation
• User friendly interface on top of powerful technologies
• Most popular implementations
• Jupyter (fka IPython)
• Apache Zeppelin
• Spark Notebook
• Beaker
• SageMath (SageMathCloud => CoCalc)
• Oracle Machine Learning Notebook UI
EXAMPLE NOTEBOOK EXPLORATION
OPEN DATA
• Governments and NGOs, scientific and even commercial
organizations are publishing data
• Inviting anyone who wants to join in to help make
sense of the data – understand driving factors,
identify categories, help predict
• Many areas
• Economy, health, public safety, sports, traffic &
transportation, games, environment, maps, …
OPEN DATA – SOME EXAMPLES
• Kaggle - Data Sets and [Samples of] Data Discovery: www.kaggle.com
• US, EU and UK Government Data: data.gov, open-data.europa.eu and data.gov.uk
• Dutch Government Data: data.overheid.nl (plus CBS, RDW, individual cities, …)
• Open Images Data Set: www.image-net.org
• Open Data From World Bank: data.worldbank.org
• Historic Football Data: api.football-data.org
• Detroit Open Data Portal: data.detroitmi.gov
• Airports, Airlines, Flight Routes: openflights.org
• Open Database – machine counterpart to Wikipedia: www.wikidata.org
• Google Audio Set (manually annotated audio events) - research.google.com/audioset/
• Movielens - Movies, viewers and ratings: files.grouplens.org/datasets/movielens/
WHAT IS HADOOP?
• Big Data means Big Computing and Big Storage
• Big requires scalable => horizontal scale out
• Moving data is very expensive (network, disk IO)
• Rather than move data to processor – move processing to data: distributed
processing
• Horizontal scale out => Hadoop:
distributed data & distributed processing
• HDFS – Hadoop Distributed File System
• Map Reduce – parallel, distributed processing
• Map-Reduce operates on data locally, then
persists and aggregates results
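The Map-Reduce idea can be imitated on a single machine with the classic word count. In this sketch the two strings stand in for data blocks stored on different nodes; on a real cluster Hadoop would run the map step next to each block and handle the grouping and aggregation across the network.

```python
from collections import defaultdict

# Hypothetical data blocks, each living on a different node.
blocks = [
    "big data needs big storage",
    "big data needs distributed processing",
]


def map_phase(block):
    """Runs locally on the node that holds the block: emit (word, 1) pairs."""
    return [(word, 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


word_counts = reduce_phase(shuffle(map_phase(block) for block in blocks))
print(word_counts)
```

Only the small (word, count) pairs travel between the phases; the bulky input blocks never move, which is the point of bringing the processing to the data.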
WHAT IS SPARK?
• Developing and orchestrating Map-Reduce on Hadoop is not simple
• Running jobs can be slow due to frequent disk writing
• Spark is for managing and orchestrating distributed processing on a
variety of cluster systems
• with Hadoop as the most obvious target
• through APIs in Java, Python, R, Scala
• Spark uses lazy operations and distributed in-memory data
structures – offering much better performance
• Through Spark – cluster based processing can be used interactively
• Spark has additional modules that leverage distributed
processing for running prepackaged jobs (SQL, Graph, ML, …)
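The effect of lazy operations can be shown with plain Python generators. This is an analogy only, not Spark's API: the helper names `lazy_map`, `lazy_filter` and `take` are invented here, and nothing is distributed. The point is that describing the pipeline costs nothing, and that the final action pulls through only as much data as it needs.

```python
evaluated = []  # records which input items were actually processed


def lazy_map(function, items):
    """Transformation: describes work but performs none until consumed."""
    for item in items:
        evaluated.append(item)
        yield function(item)


def lazy_filter(predicate, items):
    """Transformation: passes on only the items that satisfy the predicate."""
    for item in items:
        if predicate(item):
            yield item


def take(count, items):
    """Action: pulls results through the pipeline until `count` are collected."""
    results = []
    for item in items:
        results.append(item)
        if len(results) == count:
            break
    return results


# Describe the pipeline: square a million numbers, keep the even squares.
pipeline = lazy_filter(lambda x: x % 2 == 0,
                       lazy_map(lambda x: x * x, range(1_000_000)))
work_done_before_action = len(evaluated)  # nothing has run yet
first_three = take(3, pipeline)           # triggers just enough work
print(work_done_before_action, first_three, len(evaluated))
```

Spark applies the same principle to data sets spread over a cluster, which is part of what makes interactive exploration of large data feasible.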
APACHE SPARK OVERVIEW
EXAMPLE RUNNING AGAINST SPARK
• https://github.com/jadianes/spark-movie-lens/blob/master/notebooks/building-recommender.ipynb
WHAT IS ORACLE DOING AROUND
MACHINE LEARNING?
• Oracle Advanced Analytics in Oracle Database
• Data Mining, Enterprise R
• Text (ESA), Spatial, Graph
• SQL
DEMONSTRATION OF ORACLE ADVANCED
ANALYTICS
• Using Text Mining and Naive Bayes Data Mining Classification
• Train model for classifying conference abstracts into tracks
• Use model to propose a track for new abstracts
• Steps
• Gather data
• Import, cleanse, enrich, …
• Prepare training set and test set
• Select and configure model
• Combining Text and Mining using Naive Bayes
• Train model
• Test and apply model
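The idea behind the demonstration can be sketched outside the database in plain Python. This is not Oracle Advanced Analytics code: the abstracts, the track names and the function names are invented, and the classifier is a bare-bones multinomial Naive Bayes with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set: (abstract text, track).
training_set = [
    ("tuning sql queries and indexes in the database", "Database"),
    ("partitioning and in-memory options for the database", "Database"),
    ("building rest apis with node and microservices", "Development"),
    ("javascript frameworks for web application development", "Development"),
]


def tokenize(text):
    """Very crude text preparation: lower-case and split on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per track and abstracts per track."""
    word_counts = defaultdict(Counter)
    track_counts = Counter()
    for text, track in examples:
        track_counts[track] += 1
        word_counts[track].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, track_counts, vocabulary


def propose_track(model, text):
    """Pick the track with the highest log-probability (Laplace smoothing)."""
    word_counts, track_counts, vocabulary = model
    total_abstracts = sum(track_counts.values())
    scores = {}
    for track in track_counts:
        total_words = sum(word_counts[track].values())
        score = math.log(track_counts[track] / total_abstracts)
        for word in tokenize(text):
            smoothed = (word_counts[track][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[track] = score
    return max(scores, key=scores.get)


model = train(training_set)
print(propose_track(model, "sql indexes for the database"))
```

With real conference data the preparation step (stop words, stemming, enrichment) would matter far more than it does in this toy example.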
BIG DATA SQL
ORACLE DATABASE AS SINGLE POINT OF ENTRY
MANY CLOUD SERVICES AROUND BIG DATA &
[PREDICTIVE] ANALYTICS & MACHINE LEARNING
WHAT IS ORACLE DOING AROUND
MACHINE LEARNING?
• Big Data Discovery (fka Endeca) and BDD CS
• Big Data Appliance
• Data Visualization Cloud
• Analytics Clouds (Sales, Marketing, HCM) on top of SaaS
• RTD – Real Time Decisions
• DaaS
• Oracle Labs (labs.oracle.com)
• Machine Learning Research Group (link)
• Machine Learning CS – “Oracle Notebook”
HUMANS LEARNING MACHINE LEARNING:
YOUR FIRST STEPS
• Jupyter Notebooks and Python – tmpnb.org
• HortonWorks Sandbox VM – Hadoop & Spark & Hive, Ambari
• DataBricks Cloud Environment with Apache Spark (free trial)
• Oracle Big Data Lite – Prebuilt Virtual Machine
• Tutorials, Courses (Udacity, Coursera, edX)
• Books
• Introducing Data Science
• Learning Apache Spark 2
• Python Machine Learning
SUMMARY
• Machine Learning by computers helps us(ers) understand historic data and
apply that insight to new data
• Through smart algorithms, advanced software and cheap and powerful compute
and storage resources, Machine Learning is accessible to all
• R and Python are the most popular technologies for data exploration and ML model
discovery
• Apache Spark (on Hadoop) is frequently used to powercrunch data (wrangling)
and run ML models on Big Data sets
• Notebooks are a popular vehicle in the Data Science lab
• To explore and report
• Oracle researches, applies and exposes ML (Big Data SQL, OAA, OPC)
• Getting started on Machine Learning is fun, smart and well supported
• Blog: technology.amis.nl
• Email: lucas.jellema@amis.nl
• : lucasjellema
• : lucas-jellema
• : www.amis.nl, info@amis.nl
+31 306016000
Edisonbaan 15,
Nieuwegein

More Related Content

What's hot (20)

PPTX
Systems on the edge - your stepping stones into Oracle Public PaaS Cloud - AM...
Lucas Jellema
 
PDF
Agile infrastructure
Tarun Rajput
 
PDF
Cloud Native Camel Riding
Christian Posta
 
PDF
Stay productive_while_slicing_up_the_monolith
Markus Eisele
 
PDF
Chicago Microservices Integration Talk
Christian Posta
 
PPTX
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
Lucas Jellema
 
PDF
Oracle WebLogic 12c New Multitenancy features
Michel Schildmeijer
 
PPTX
Why real integration developers ride Camels
Christian Posta
 
PDF
REST - Why, When and How? at AMIS25
Jon Petter Hjulstad
 
PPTX
Essential Camel Components
Christian Posta
 
PPTX
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Lucas Jellema
 
PPTX
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Lucas Jellema
 
PDF
Fuse integration-services
Christian Posta
 
PDF
Gradual migration to MicroProfile
Rudy De Busscher
 
PPT
Flying to clouds - can it be easy? Cloud Native Applications
Jacek Bukowski
 
PPTX
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Lucas Jellema
 
PDF
Docker in the Enterprise
Saul Caganoff
 
PPT
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
PROIDEA
 
PPTX
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
Legacy Typesafe (now Lightbend)
 
PPTX
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
Getting value from IoT, Integration and Data Analytics
 
Systems on the edge - your stepping stones into Oracle Public PaaS Cloud - AM...
Lucas Jellema
 
Agile infrastructure
Tarun Rajput
 
Cloud Native Camel Riding
Christian Posta
 
Stay productive_while_slicing_up_the_monolith
Markus Eisele
 
Chicago Microservices Integration Talk
Christian Posta
 
50 Shades of Data - how, when and why Big,Relational,NoSQL,Elastic,Event,CQRS...
Lucas Jellema
 
Oracle WebLogic 12c New Multitenancy features
Michel Schildmeijer
 
Why real integration developers ride Camels
Christian Posta
 
REST - Why, When and How? at AMIS25
Jon Petter Hjulstad
 
Essential Camel Components
Christian Posta
 
Oracle Management Cloud - introduction, overview and getting started (AMIS, 2...
Lucas Jellema
 
Live Application and Infrastructure Monitoring and Root Cause Log Analysis wi...
Lucas Jellema
 
Fuse integration-services
Christian Posta
 
Gradual migration to MicroProfile
Rudy De Busscher
 
Flying to clouds - can it be easy? Cloud Native Applications
Jacek Bukowski
 
Soaring through the Clouds –Live Demo of Setting a World Record in Integratin...
Lucas Jellema
 
Docker in the Enterprise
Saul Caganoff
 
JDD 2016 - Jacek Bukowski - "Flying To Clouds" - Can It Be Easy?
PROIDEA
 
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
Legacy Typesafe (now Lightbend)
 
Oracle OpenWorld 2014 Review Part Four - PaaS Middleware
Getting value from IoT, Integration and Data Analytics
 

Similar to The Art of Intelligence – A Practical Introduction Machine Learning for Oracle professionals (NLOUG Tech Experience 2017) (20)

PPTX
Introduction overviewmachinelearning sig Door Lucas Jellema
Getting value from IoT, Integration and Data Analytics
 
PPTX
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
Lucas Jellema
 
PPTX
Introduction to Machine Learning - An overview and first step for candidate d...
Lucas Jellema
 
PPTX
The Art of Intelligence – Introduction Machine Learning for Oracle profession...
Lucas Jellema
 
PPTX
The Art of Intelligence – Introduction Machine Learning for Java professional...
Lucas Jellema
 
PDF
Introduction to Mahout and Machine Learning
Varad Meru
 
PDF
Data Scientist Toolbox
Andrei Savu
 
PPTX
Azure machine learning tech mela
Yogendra Tamang
 
PPTX
machine learning
soundaryasarya
 
PPTX
Introduction to Machine Learning
Knowledge And Skill Forum
 
PDF
Machine Learning for (JVM) Developers
Mateusz Dymczyk
 
PPTX
AzureML – zero to hero
Govind Kanshi
 
PPTX
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Robert Williams
 
PDF
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Ali Alkan
 
PPTX
InfoEducatie - Face Recognition Architecture
Bogdan Bocse
 
PPTX
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
VisageCloud
 
PPTX
Introduction to Machine learning
NEEVEE Technologies
 
PPT
Intro.ppt
Anonymous9etQKwW
 
PPT
Intro_2.ppt
MumitAhmed1
 
PPT
Intro.ppt
SharabiNaif
 
Introduction overviewmachinelearning sig Door Lucas Jellema
Getting value from IoT, Integration and Data Analytics
 
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
Lucas Jellema
 
Introduction to Machine Learning - An overview and first step for candidate d...
Lucas Jellema
 
The Art of Intelligence – Introduction Machine Learning for Oracle profession...
Lucas Jellema
 
The Art of Intelligence – Introduction Machine Learning for Java professional...
Lucas Jellema
 
Introduction to Mahout and Machine Learning
Varad Meru
 
Data Scientist Toolbox
Andrei Savu
 
Azure machine learning tech mela
Yogendra Tamang
 
machine learning
soundaryasarya
 
Introduction to Machine Learning
Knowledge And Skill Forum
 
Machine Learning for (JVM) Developers
Mateusz Dymczyk
 
AzureML – zero to hero
Govind Kanshi
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Robert Williams
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Ali Alkan
 
InfoEducatie - Face Recognition Architecture
Bogdan Bocse
 
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
VisageCloud
 
Introduction to Machine learning
NEEVEE Technologies
 
Intro.ppt
Anonymous9etQKwW
 
Intro_2.ppt
MumitAhmed1
 
Intro.ppt
SharabiNaif
 
Ad

More from Lucas Jellema (20)

PPTX
Introduction to web application development with Vue (for absolute beginners)...
Lucas Jellema
 
PPTX
Making the Shift Left - Bringing Ops to Dev before bringing applications to p...
Lucas Jellema
 
PPTX
Lightweight coding in powerful Cloud Development Environments (DigitalXchange...
Lucas Jellema
 
PPTX
Apache Superset - open source data exploration and visualization (Conclusion ...
Lucas Jellema
 
PPTX
CONNECTING THE REAL WORLD TO ENTERPRISE IT – HOW IoT DRIVES OUR ENERGY TRANSI...
Lucas Jellema
 
PPTX
Help me move away from Oracle - or not?! (Oracle Community Tour EMEA - LVOUG...
Lucas Jellema
 
PPTX
Op je vingers tellen... tot 1000!
Lucas Jellema
 
PPTX
IoT - from prototype to enterprise platform (DigitalXchange 2022)
Lucas Jellema
 
PPTX
Who Wants to Become an IT Architect-A Look at the Bigger Picture - DigitalXch...
Lucas Jellema
 
PPTX
Steampipe - use SQL to retrieve data from cloud, platforms and files (Code Ca...
Lucas Jellema
 
PPTX
Automation of Software Engineering with OCI DevOps Build and Deployment Pipel...
Lucas Jellema
 
PPTX
Introducing Dapr.io - the open source personal assistant to microservices and...
Lucas Jellema
 
PPTX
How and Why you can and should Participate in Open Source Projects (AMIS, Sof...
Lucas Jellema
 
PPTX
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Lucas Jellema
 
PPTX
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Lucas Jellema
 
PPTX
6Reinventing Oracle Systems in a Cloudy World (RMOUG Trainingdays, February 2...
Lucas Jellema
 
PPTX
Help me move away from Oracle! (RMOUG Training Days 2022, February 2022)
Lucas Jellema
 
PPTX
Tech Talks 101 - DevOps (jan 2022)
Lucas Jellema
 
PPTX
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Lucas Jellema
 
PPTX
Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Lucas Jellema
 
Introduction to web application development with Vue (for absolute beginners)...
Lucas Jellema
 
Making the Shift Left - Bringing Ops to Dev before bringing applications to p...
Lucas Jellema
 
Lightweight coding in powerful Cloud Development Environments (DigitalXchange...
Lucas Jellema
 
Apache Superset - open source data exploration and visualization (Conclusion ...
Lucas Jellema
 
CONNECTING THE REAL WORLD TO ENTERPRISE IT – HOW IoT DRIVES OUR ENERGY TRANSI...
Lucas Jellema
 
Help me move away from Oracle - or not?! (Oracle Community Tour EMEA - LVOUG...
Lucas Jellema
 
Op je vingers tellen... tot 1000!
Lucas Jellema
 
IoT - from prototype to enterprise platform (DigitalXchange 2022)
Lucas Jellema
 
Who Wants to Become an IT Architect-A Look at the Bigger Picture - DigitalXch...
Lucas Jellema
 
Steampipe - use SQL to retrieve data from cloud, platforms and files (Code Ca...
Lucas Jellema
 
Automation of Software Engineering with OCI DevOps Build and Deployment Pipel...
Lucas Jellema
 
Introducing Dapr.io - the open source personal assistant to microservices and...
Lucas Jellema
 
How and Why you can and should Participate in Open Source Projects (AMIS, Sof...
Lucas Jellema
 
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...
Lucas Jellema
 
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)
Lucas Jellema
 
6Reinventing Oracle Systems in a Cloudy World (RMOUG Trainingdays, February 2...
Lucas Jellema
 
Help me move away from Oracle! (RMOUG Training Days 2022, February 2022)
Lucas Jellema
 
Tech Talks 101 - DevOps (jan 2022)
Lucas Jellema
 
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...
Lucas Jellema
 
Cloud Native Application Development - build fast, low TCO, scalable & agile ...
Lucas Jellema
 
Ad

Recently uploaded (20)

PDF
How to Hire AI Developers_ Step-by-Step Guide in 2025.pdf
DianApps Technologies
 
PDF
vMix Pro 28.0.0.42 Download vMix Registration key Bundle
kulindacore
 
PDF
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
PPTX
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 
PPTX
AEM User Group: India Chapter Kickoff Meeting
jennaf3
 
PDF
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
PDF
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
PPTX
Agentic Automation Journey Series Day 2 – Prompt Engineering for UiPath Agents
klpathrudu
 
PPTX
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
PPTX
Foundations of Marketo Engage - Powering Campaigns with Marketo Personalization
bbedford2
 
PDF
Digger Solo: Semantic search and maps for your local files
seanpedersen96
 
PDF
IDM Crack with Internet Download Manager 6.42 Build 43 with Patch Latest 2025
bashirkhan333g
 
PPTX
Agentic Automation Journey Session 1/5: Context Grounding and Autopilot for E...
klpathrudu
 
PDF
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
PDF
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
PPTX
Milwaukee Marketo User Group - Summer Road Trip: Mapping and Personalizing Yo...
bbedford2
 
PDF
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
PPTX
In From the Cold: Open Source as Part of Mainstream Software Asset Management
Shane Coughlan
 
PDF
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
PDF
MiniTool Partition Wizard Free Crack + Full Free Download 2025
bashirkhan333g
 
How to Hire AI Developers_ Step-by-Step Guide in 2025.pdf
DianApps Technologies
 
vMix Pro 28.0.0.42 Download vMix Registration key Bundle
kulindacore
 
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 
AEM User Group: India Chapter Kickoff Meeting
jennaf3
 
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
Agentic Automation Journey Series Day 2 – Prompt Engineering for UiPath Agents
klpathrudu
 
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
Foundations of Marketo Engage - Powering Campaigns with Marketo Personalization
bbedford2
 
Digger Solo: Semantic search and maps for your local files
seanpedersen96
 
IDM Crack with Internet Download Manager 6.42 Build 43 with Patch Latest 2025
bashirkhan333g
 
Agentic Automation Journey Session 1/5: Context Grounding and Autopilot for E...
klpathrudu
 
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
Milwaukee Marketo User Group - Summer Road Trip: Mapping and Personalizing Yo...
bbedford2
 
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
In From the Cold: Open Source as Part of Mainstream Software Asset Management
Shane Coughlan
 
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
MiniTool Partition Wizard Free Crack + Full Free Download 2025
bashirkhan333g
 

The Art of Intelligence – A Practical Introduction Machine Learning for Oracle professionals (NLOUG Tech Experience 2017)

  • 1. THE ART OF INTELLIGENCE – A PRACTICAL INTRODUCTION MACHINE LEARNING FOR ORACLE PROFESSIONALS Lucas Jellema (CTO AMIS & Oracle ACE Director) 15th June 2017, TechExperience 2017, Amersfoort, The Netherlands
  • 2. AGENDA • What is Machine Learning? • Why could it be relevant [to you]? • What does it entail? • With which algorithms, tools and technologies? • Oracle and Machine Learning? • How do you embark on Machine Learning?
  • 3. LEARNING • How do we learn? • Try something (else) => get feedback => learn • Eventually: • We get it (understanding) so we can predict the outcome of a certain action in a new situation • Or we have experienced enough situations to predict the outcome in most situations with high confidence • Through interpolation, extrapolation, etc. • We remain clueless 3
  • 4. MACHINE LEARNING • Analyze Historical Data (input and result – training set) to discover Patterns & Models • Iteratively apply Models to [additional] Input (test set) and compare model outcome with known actual result to improve the model • Use Model to predict outcome for entirely new data 4
  • 5. WHY IS IT RELEVANT (NOW)? • Data • big, fast, open • Machine Learning has become feasible and accessible • Available • Affordable (software & hardware) • Doable (Citizen Data Scientist) • Fast enough • Business Cases & Opportunities => Demands • End users, Consumers, Competitive pressure, Society
  • 6. WHY IS IT RELEVANT (NOW)?
  • 7. EXAMPLE USE CASES • Speech recognition • Identify churn candidates • Intent & Sentiment analysis on social media • Upsell & Cross Sell • Target Marketing • Customer Service • Chat bots & voice response systems • Predictive Maintenance • Gaming • Captcha • Medical Diagnosis • Anomaly Detection (find the odd one out) • Autonomous Cars • Voter Segment Analysis • Customer Recommendations • Smart Data Capture • Face Detection • Fraud Prevention • (really good) OCR • Traffic light control • Navigation • Should we investigate | do lab test? • Spam filtering • Propose friends | contacts • Troll detection • Auto correct • Photo Tagging and Album organization
  • 8. THE DATA SCIENCE WORKFLOW • Set Business Goal – research scope, objectives • Gather data • Prepare data • Cleanse, transform (wrangle), combine (merge, enrich) • Explore data • Model Data • Select model, train model, test model • Present findings and recommend next steps • Apply: • Make use of insights in business decisions • Automate Data Gathering & Preparation, Deploy Model, Embed Model in operational systems
  • 9. DATA DISCOVERY 9 A B C D E F G 1104534 ZTR 0.1 anijs 2 36 T 631148 ESE 132 rivier 0 21 S -3 WGN 71 appel 0 1 - 1262300 ZTR 56 zes 2 41 T 315529 HVN 1290 hamer 0 11 - 788914 ASM 676 zwaluw 0 26 T 157762 HVN 9482 wie 0 6 - 946681 DHG 42 rond 1 31 T -31539 WGN 2423 bruin 0 0 - 47338 HVN 54 hamer 0 16 P
  • 10. SCATTER PLOT ATTRIBUTE F (Y-AXIS)VS ATTRIBUTE A 10 0 5 10 15 20 25 30 35 40 45 -200000 0 200000 400000 600000 800000 1000000 1200000 1400000 Y-Values Y-Values
  • 11. SCATTER PLOT ATTRIBUTE F (Y-AXIS)VS ATTRIBUTE A 11 0 5 10 15 20 25 30 35 40 45 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 Age of Lucas Jellema vs Year Y-Values
  • 12. DATA DISCOVERY – ATTRIBUTES IDENTIFIED 12 Time City - - #Kids Age Level of Education 1104534 ZTR 0.1 anijs 2 36 T 631148 ESE 132 rivier 0 21 S -3 WGN 71 appel 0 1 - 1262300 ZTR 56 zes 2 41 T 315529 HVN 1290 hamer 0 11 - 788914 ASM 676 zwaluw 0 26 T 157762 HVN 9482 wie 0 6 - 946681 DHG 42 rond 1 31 T -31539 WGN 2423 bruin 0 0 - 47338 HVN 54 hamer 0 16 P
  • 13. TYPES OF MACHINE LEARNING • Supervised • Train and test model from known data (both features and target) • Unsupervised • Analyze unlabeled data – see if you can find anything • Semi-Supervised • Interactive flow, for example human identifying clusters • Reinforcement • Continuously improve algorithm (model) as time progresses, based on new experience
  • 14. MACHINE LEARNING ALGORITHMS • Clustering • Hierarchical k-means, Orthogonal Partitioning Clustering, Expectation-Maximization • Feature Extraction/Attribute Importance/Principal Component Analysis • Classification • Decision Tree, Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machine • Regression • Multiple Regression, Support Vector Machine, Linear Model, LASSO, Random Forest, Ridgre Regression, Generalized Linear Model, Stepwise Linear Regression • Association & Collaborative Filtering (market basket analysis , apriori) • Neural network and Deep Learning with Deep Neural Network • Can be used for many different use cases
  • 15. MODELING PHASE • Select a model to try to create a fit with (predict target well) • Set configuration parameters for model • Divide data in training set and test set • Train model with training set • Evaluate performance of trained model on the test set • Confusion matrix, mean square error, support, lift, false positives, false negatives • Optionally: tweak model parameters, add attributes, feed in more training data, choose different model • Eventually (hopefully): pick model plus parameters plus attributes that will reliably predict the target variable given new data
  • 16. OPTICAL DIGIT RECOGNITION Predicted Actual 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 Naïve Bayes Decision Tree
  • 17. CLASSIFICATION GONE WRONG • Machine learning applied to millions of drawings on QuickDraw • to classify drawings • For example: drawings of beds • See for example: • https://aiexperiments.withgoogle.com/quick-draw
  • 18. MACHINE LEARNING  OPERATIONAL SYSTEMS • “We have a model that will choose best chess move based on certain input”
  • 19. MACHINE LEARNING  OPERATIONAL SYSTEMS • Discovery => Model => Deploy • “We have a model that will predict a class (classification) or value (regression) based on certain input with a meaningful degree of accuracy” – how can we make use of that model?
  • 20. DEPLOY MODEL AND EXPOSE • Model is usually created on Big Data in Data Science environment using the Data Scientist’s tools • Model itself is typically fairly small • Model will be applied in operational systems against single data items (not huge collections nor the entire Big Data set) • Running the model online may not require extensive resources • Implementing the model at production run time • Export model (from Data Scientist environment) and import (into production environment) • Reimplement the model in the development technology and deploy (in the regular way) to the production environment • Expose model through API
  • 21. DEPLOY MODEL AND EXPOSE REST API
  • 22. MODEL MANAGEMENT • Governance (new versions, testing and approval) • A/B testing • Auditing (what did the model decide and why? notifying humans?) • Evaluation (how well did the model’s output match reality?) to help evolve the model • for example: were recommendations followed? • Monitor self-learning models (to detect rogue models)
  • 23. DEPLOYMENT CAN ALSO BE: LOAD RESULTS FROM MODEL INTO PRODUCTION
  • 24. WHAT TO DO IT WITH? • Mathematics (Statistics) • Gauss (normal distribution) • Bayes’ Theorem • Euclidean Distance • Perceptron • Mean Square Error
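The mathematical building blocks named on this slide are compact enough to write out directly. The sketch below uses only the Python standard library; the function names are illustrative choices, not part of the presentation.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gaussian (normal) distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_square_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def perceptron_predict(weights, bias, features):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

print(euclidean_distance((0, 0), (3, 4)))
print(mean_square_error([1, 2, 3], [1, 2, 5]))
print(perceptron_predict([1, 1], -1.5, [1, 1]))
```

With weights `[1, 1]` and bias `-1.5`, the perceptron behaves like a logical AND on binary inputs.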
  • 25. WHAT TO DO IT WITH?
  • 26. HOW TO PICK TOOLS FOR THE JOB • What are the jobs? • Gather data • Prepare data • Explore and (hopefully) Discover • Present • Embed & Deploy Model • What are considerations? • Volume • Speed and Time • Skills • Platform
  • 28. NOTEBOOK – THE LAB JOURNAL FROM THE DATALAB • Common format for data exploration and presentation • User friendly interface on top of powerful technologies • Most popular implementations • Jupyter (fka IPython) • Apache Zeppelin • Spark Notebook • Beaker • SageMath (SageMathCloud => CoCalc) • Oracle Machine Learning Notebook UI
  • 30. OPEN DATA • Governments and NGOs, scientific and even commercial organizations are publishing data • Inviting anyone who wants to join in to help make sense of the data – understand driving factors, identify categories, help predict • Many areas • Economy, health, public safety, sports, traffic & transportation, games, environment, maps, …
  • 31. OPEN DATA – SOME EXAMPLES • Kaggle - Data Sets and [Samples of] Data Discovery: www.kaggle.com • US, EU and UK Government Data: data.gov, open-data.europa.eu and data.gov.uk • Dutch Government Data: data.overheid.nl (plus CBS, RDW, individual cities, …) • Open Images Data Set: www.image-net.org • Open Data From World Bank: data.worldbank.org • Historic Football Data: api.football-data.org • Detroit Open Data Portal: data.detroitmi.gov • Airports, Airlines, Flight Routes: openflights.org • Open Database – machine counterpart to Wikipedia: www.wikidata.org • Google Audio Set (manually annotated audio events) - research.google.com/audioset/ • Movielens - Movies, viewers and ratings: files.grouplens.org/datasets/movielens/
  • 32. WHAT IS HADOOP? • Big Data means Big Computing and Big Storage • Big requires scalable => horizontal scale out • Moving data is very expensive (network, disk IO) • Rather than move data to processor – move processing to data: distributed processing • Horizontal scale out => Hadoop: distributed data & distributed processing • HDFS – Hadoop Distributed File System • Map Reduce – parallel, distributed processing • Map-Reduce operates on data locally, then persists and aggregates results
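The Map-Reduce idea can be illustrated with a word count in plain Python. This is a single-process sketch, not Hadoop code: the two lists in `partitions` stand in for data blocks stored on different nodes, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(partition):
    """Map: runs locally on one partition and emits (word, 1) pairs."""
    return [(word.lower(), 1) for line in partition for word in line.split()]

def shuffle(mapped_partitions):
    """Shuffle: group the emitted values by key across all partitions."""
    grouped = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in grouped.items()}

# Two partitions standing in for blocks on two different nodes.
partitions = [
    ["big data big compute"],
    ["big storage", "move processing to data"],
]
counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(counts)
```

Only the small (word, count) pairs would need to cross the network; the raw lines stay where they are stored.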
  • 33. WHAT IS SPARK? • Developing and orchestrating Map-Reduce on Hadoop is not simple • Running jobs can be slow due to frequent disk writing • Spark is for managing and orchestrating distributed processing on a variety of cluster systems • with Hadoop as the most obvious target • through APIs in Java, Python, R, Scala • Spark uses lazy operations and distributed in-memory data structures – offering much better performance • Through Spark – cluster based processing can be used interactively • Spark has additional modules that leverage distributed processing for running prepackaged jobs (SQL, Graph, ML, …)
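Lazy operations can be illustrated without a cluster. The sketch below is an analogy using Python's built-in lazy `map` and `filter`, not the Spark API: the transformations only describe work, and nothing is computed until a final step asks for results. The `calls` list is there only to show when the work actually happens.

```python
calls = []

def traced_square(x):
    """Square x and record that the computation really ran."""
    calls.append(x)
    return x * x

numbers = range(1, 6)

# "Transformations": these build a pipeline but compute nothing yet.
squared = map(traced_square, numbers)
evens = filter(lambda v: v % 2 == 0, squared)
calls_before_action = len(calls)

# "Action": materializing the result triggers the whole pipeline.
result = list(evens)
print(calls_before_action, result, calls)
```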
  • 35. EXAMPLE RUNNING AGAINST SPARK • https://github.com/jadianes/spark-movie-lens/blob/master/notebooks/building-recommender.ipynb
  • 36. WHAT IS ORACLE DOING AROUND MACHINE LEARNING? • Oracle Advanced Analytics in Oracle Database • Data Mining, Enterprise R • Text (ESA), Spatial, Graph • SQL
  • 37. DEMONSTRATION OF ORACLE ADVANCED ANALYTICS • Using Text Mining and Naive Bayes Data Mining Classification • Train model for classifying conference abstracts into tracks • Use model to propose a track for new abstracts • Steps • Gather data • Import, cleanse, enrich, … • Prepare training set and test set • Select and configure model • Combining Text and Mining using Naive Bayes • Train model • Test and apply model
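The idea behind the demonstration, classifying abstracts into tracks with Naive Bayes, can be sketched in plain Python. This is not the Oracle Advanced Analytics implementation; it is a minimal multinomial Naive Bayes with add-one smoothing, and the abstracts, track names and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """Count words per track and documents per track from (text, track) pairs."""
    word_counts = defaultdict(Counter)
    track_counts = Counter()
    for text, track in documents:
        track_counts[track] += 1
        word_counts[track].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, track_counts, vocabulary

def classify(model, text):
    """Propose the track with the highest log-probability for the text."""
    word_counts, track_counts, vocabulary = model
    total_docs = sum(track_counts.values())
    best_track, best_score = None, -math.inf
    for track in track_counts:
        score = math.log(track_counts[track] / total_docs)  # prior
        total_words = sum(word_counts[track].values())
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[track][word] + 1) / (total_words + len(vocabulary))
                )
        if score > best_score:
            best_track, best_score = track, score
    return best_track

# Invented training set of abstracts and their tracks.
training_set = [
    ("tuning sql queries in the database", "database"),
    ("partitioning and indexing in the database", "database"),
    ("building rest services with java", "development"),
    ("microservices and containers for java developers", "development"),
]
model = train(training_set)
proposed = classify(model, "indexing sql tables")
print(proposed)
```

A held-out test set of labelled abstracts would be scored the same way to check how often the proposed track matches the actual one.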
  • 38. BIG DATA SQL ORACLE DATABASE AS SINGLE POINT OF ENTRY
  • 39. MANY CLOUD SERVICES AROUND BIG DATA & [PREDICTIVE] ANALYTICS & MACHINE LEARNING
  • 40. WHAT IS ORACLE DOING AROUND MACHINE LEARNING? • Big Data Discovery (fka Endeca) and BDD CS • Big Data Appliance • Data Visualization Cloud • Analytics Clouds (Sales, Marketing, HCM) on top of SaaS • RTD – Real Time Decisions • DaaS • Oracle Labs (labs.oracle.com) • Machine Learning Research Group • Machine Learning CS – “Oracle Notebook”
  • 45. HUMANS LEARNING MACHINE LEARNING: YOUR FIRST STEPS • Jupyter Notebooks and Python – tmpnb.org • HortonWorks Sandbox VM – Hadoop & Spark & Hive, Ambari • DataBricks Cloud Environment with Apache Spark (free trial) • Oracle Big Data Lite – Prebuilt Virtual Machine • Tutorials, Courses (Udacity, Coursera, edX) • Books • Introducing Data Science • Learning Apache Spark 2 • Python Machine Learning
  • 46. SUMMARY • Machine Learning by computers helps us(ers) understand historic data and apply that insight to new data • Through smart algorithms, advanced software and cheap and powerful compute and storage resources, Machine Learning is accessible to all • R and Python are the most popular technologies for data exploration and ML model discovery • Apache Spark (on Hadoop) is frequently used to crunch data (wrangling) and run ML models on Big Data sets • Notebooks are a popular vehicle in the Data Science lab • To explore and report • Oracle researches, applies and exposes ML (Big Data SQL, OAA, OPC) • Getting started on Machine Learning is fun, smart and well supported
  • 47. • Blog: technology.amis.nl • Email: [email protected] • : lucasjellema • : lucas-jellema • : www.amis.nl, [email protected] +31 306016000 Edisonbaan 15, Nieuwegein