VP AIOps for the Autonomous Database
Sandesh Rao
From DBA’s to Data Scientists
Introduction to Machine Learning
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
Tasks Specific to Business and Innovation
• Architecture, planning, data modeling
• Data security and lifecycle management
• Application related tuning
• End-to-End service level management
Maintenance Tasks
• Configuration and tuning of systems, network, storage
• Database provisioning, patching
• Database backups, H/A, disaster recovery
• Database optimization
Traditionally DBAs are Responsible for:
Value Scale
Innovation
Maintenance
Freedom from Drudgery for DBA: More Time to Innovate and Improve the Business
Autonomous Database Removes Generic Tasks
The Evolution of the DBA/Database Developer Role
• Data Engineer: architecture, “data wrangler”
• Application Tuning: SQL tuning, connection mgmt
• Data Security: data classification, data life-cycle mgmt
• Machine Learning: solving data-driven problems, discovering insights, making predictions
Database Developer to Data Scientist Journey
You Are Probably Already Doing Most of This Work!
• Data extraction
• Data wrangling
• Deriving new attributes (“feature engineering”)
• …
• Import predictions & insights
• Translate and deploy ML models
• Automate
Typically 80% of the work
Eliminated or minimized with Oracle
“Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy” [1]
The data management platform becomes a combined/hybrid DM + machine learning platform
[1] https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
“If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”
Albert Einstein
Why Machine Learning Is Important
Lots of data needs to be crunched
• No time to manually sift through the data
Machine Learning has become accessible
• Software and algorithms are available
• Frameworks allow for massive training with no coding
• CI/CD available for MLOps
• It’s not the algorithm you need to know about!
Business use cases
• Find the use cases for maximum impact
Analytics Value vs. Maturity
(Chart: Value of Analytics on the vertical axis, Analytical Maturity on the horizontal axis)
Stage                           Output                   Question / Outcome
Reports & Dashboards            Data                     What Happened?
Diagnostic Analysis & Reports   Information              Why Did It Happen?
Predictive / Machine Learning   Predictions & Insights   What WILL Happen?
“ML Enabled” Applications       Appls with ML            Automated ML Appls
Database Developer to Data Scientist Journey
ML Project Workflow
1. Set the business objectives.
2. Gather, compare, and clean the data.
3. Identify and extract features (important columns) from the imported data. This helps us assess the efficiency of the algorithm.
4. Take the input data, also called the training data, and apply the algorithm to it. For the algorithm to function efficiently, it is important to pick the right values for the hyperparameters (the algorithm's input parameters).
5. Once the training data and the algorithm are combined, we get a model.
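The five workflow steps above can be sketched end to end in a few lines of plain Python. This is only an illustration, not Oracle's implementation: the toy data, the choice of k-nearest neighbours as the algorithm, and the names `knn_predict` and `accuracy` are all assumptions made for brevity.

```python
# Toy workflow: features -> train/test split -> hyperparameter k -> model.
# The data and the algorithm (k-nearest neighbours) are illustrative assumptions.
from collections import Counter

# Steps 2-3: cleaned data as (feature vector, label) pairs.
data = [
    ((1.0, 1.1), "low"), ((1.2, 0.9), "low"), ((0.8, 1.0), "low"),
    ((0.9, 1.3), "low"), ((5.0, 5.2), "high"), ((5.1, 4.8), "high"),
    ((4.9, 5.0), "high"), ((5.3, 5.1), "high"),
]

# Hold out every fourth row for testing; train on the rest.
test = data[::4]
train = [row for i, row in enumerate(data) if i % 4 != 0]

def knn_predict(train_rows, point, k):
    """Step 4: apply the algorithm; k is the hyperparameter."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train_rows, key=lambda row: dist2(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_rows, test_rows, k):
    """Fraction of held-out rows whose label is predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return hits / len(test_rows)

# Step 5: training data + algorithm + chosen hyperparameter = model.
score = accuracy(train, test, k=3)
print(f"k=3 hold-out accuracy: {score:.2f}")
```

Changing `k` and re-running `accuracy` is the simplest form of the hyperparameter tuning mentioned in step 4.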
Types of Machine Learning
Supervised Learning
• Predict future outcomes with the help of training data provided by human experts
Semi-Supervised Learning
• Discover patterns within raw data and make predictions, which are then reviewed by human experts, who provide feedback that is used to improve the model accuracy
Unsupervised Learning
• Find patterns without any external input other than the raw data
Reinforcement Learning
• Make decisions based on past rewards for this type of action
TIME SERIES
Temporal Aspect
Hitting a threshold
Forecasting energy use
Seasonality of data
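Seasonality and threshold checks can be illustrated with a seasonal-naive forecast, which simply repeats the last full season. This is a minimal sketch in plain Python; the readings, the period of 4, the threshold, and the function names `seasonal_naive` and `first_breach` are made-up assumptions, and real forecasting models are considerably richer.

```python
# Seasonal-naive forecast: repeat the last full season, then check a threshold.
# The energy readings and the threshold are made-up example values.

def seasonal_naive(history, period, horizon):
    """Forecast each future step with the value one season earlier."""
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]

def first_breach(forecast, threshold):
    """Index of the first forecast step above the threshold, or None."""
    for step, value in enumerate(forecast):
        if value > threshold:
            return step
    return None

# Two "days" of four readings each; the pattern repeats with period 4.
energy = [10, 30, 50, 20, 12, 33, 55, 22]
forecast = seasonal_naive(energy, period=4, horizon=6)
print(forecast)
print(first_breach(forecast, threshold=50))
```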
What is Machine Learning?
Algorithms automatically sift through large amounts of data to discover hidden patterns and new insights, and to make predictions
• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Find profiles of targeted people or items (Classification)
• Predict or estimate a value (Regression)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in a “basket” (Associations)
(Figure: the tasks are grouped under Supervised Learning and Unsupervised Learning)
Copyright © 2020 Oracle and/or its affiliates.
Machine Learning Algorithms
Regression
• Multiple Regression, Support Vector Machine, Linear Model, LASSO, Random Forest, Ridge Regression, Generalized Linear Model, Stepwise Linear Regression
Classification
• Decision Tree, Naive Bayes, Random Forest, Logistic Regression, Support Vector Machine
Clustering
• Hierarchical k-means, Orthogonal Partitioning Clustering, Expectation-Maximization
Feature Extraction / Attribute Importance / Component Analysis
Association & Collaborative Filtering
Reinforcement Learning
• Brute force, Monte Carlo, temporal difference, ...
Neural Network & Deep Learning with Deep Neural Network
• Many different use cases
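To make one entry of the clustering family concrete, here is a sketch of plain k-means on one-dimensional data. Note the substitution: the slide lists Oracle's hierarchical k-means, while this is the basic textbook variant with fixed starting centers. The function name `kmeans_1d` and the sample values are assumptions.

```python
# Plain (non-hierarchical) k-means on 1-D data with fixed starting centers.
# A textbook sketch, not the in-database algorithm named on the slide.

def kmeans_1d(values, centers, max_iterations=20):
    """Alternate assignment and center update until the centers stop moving."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters)
```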
What is Workload
• Automatically check workload for past x mins
• Decide if workload is abnormally high
• Highlight any abnormal workload issues
• Optionally run on demand
• Optionally snooze checking of a component
Calculated via machine learning
Prediction (Every 5 minutes)
• 5 x 1-minute metrics are captured for each dimension, and an ASH report is captured for later analysis
• Metrics are evaluated by the primary model to determine if there are anomalies
• If there is no primary model (i.e. <7 days of data or <=95% model confidence), SME rules are used for anomaly detection
• Each anomaly is compared against the SME rules to determine which dimension it applies to
• Any anomalies are raised along with the recently captured ASH report
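The choice between the primary model and the SME rules can be sketched as below. The 7-day and 95% cut-offs come from the slide; everything else is a hypothetical stand-in, including the function names, the z-score check that plays the role of the "primary model", and the fixed limit that plays the role of an SME rule. It is not Oracle's implementation.

```python
# Sketch of the detector choice described above. Cut-offs are from the slide;
# the z-score "model" and the fixed-limit "rule" are hypothetical stand-ins.
from statistics import mean, stdev

MIN_DAYS = 7
MIN_CONFIDENCE = 0.95

def has_primary_model(days_of_data, model_confidence):
    """No primary model if <7 days of data or <=95% model confidence."""
    return days_of_data >= MIN_DAYS and model_confidence > MIN_CONFIDENCE

def model_anomalies(baseline, window, z_limit=3.0):
    """Stand-in primary model: flag values far from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [v for v in window if sigma > 0 and abs(v - mu) / sigma > z_limit]

def rule_anomalies(window, hard_limit):
    """Stand-in SME rule: flag values above a fixed expert-chosen limit."""
    return [v for v in window if v > hard_limit]

def detect(baseline, window, days_of_data, model_confidence, hard_limit=90.0):
    """Return which detector ran and the anomalies it found."""
    if has_primary_model(days_of_data, model_confidence):
        return "model", model_anomalies(baseline, window)
    return "rules", rule_anomalies(window, hard_limit)

baseline = [40, 42, 41, 39, 43, 40, 41, 42]
window = [41, 40, 95, 42, 43]  # five 1-minute readings for one dimension
print(detect(baseline, window, days_of_data=10, model_confidence=0.97))
print(detect(baseline, window, days_of_data=3, model_confidence=0.97))
```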
Resource usage prediction
(Chart legend)
• Actual values (black)
• Forecast values (blue line)
• Upper & lower forecast range (light blue area)
• Unusual values (anomalies)
• Future forecast values
• Configurable threshold boundary: notify Admin of forecasts above here
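The chart elements above (forecast line, forecast range, unusual values, and the notification threshold) can be mimicked with a deliberately simple sketch. It assumes a flat forecast equal to the historical mean with a range of plus or minus two standard deviations; the function names and sample numbers are invented, and a production forecaster would model trend and seasonality.

```python
# Flat forecast with a +/- 2 standard deviation range, anomaly flagging, and a
# threshold check. All names and numbers here are illustrative assumptions.
from statistics import mean, stdev

def forecast_band(history, width=2.0):
    """Return (lower, center, upper) of a flat forecast built from history."""
    center = mean(history)
    spread = width * stdev(history)
    return center - spread, center, center + spread

def unusual_values(actuals, lower, upper):
    """Actual values (with their positions) that fall outside the range."""
    return [(i, v) for i, v in enumerate(actuals) if v < lower or v > upper]

def should_notify(forecast_values, threshold):
    """True if any forecast value lies above the configurable threshold."""
    return any(v > threshold for v in forecast_values)

history = [50, 52, 48, 51, 49, 50]
lower, center, upper = forecast_band(history)
actuals = [51, 49, 70, 50]
print(unusual_values(actuals, lower, upper))
print(should_notify([center, center + 5, center + 12], threshold=60))
```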
Oracle Machine Learning
Machine Learning Notebooks included in Autonomous Databases
Key Features:
• Collaborative UI for data scientists and analysts
• Packaged with Autonomous Databases
• Quick start example notebooks
• Easy access to shared notebooks, templates, permissions, scheduler, etc.
• OML4SQL
• OML4Py coming soon
• Supports deployment of OML models
Simple SQL Syntax: Statistical Comparisons (t-tests)
Statistical Functions
Compare average purchase amounts for men vs. women, grouped by INCOME_LEVEL
STATS_T_TEST_INDEPU (SQL) example:
SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
  FROM customers c, sales s
 WHERE c.cust_id = s.cust_id
 GROUP BY ROLLUP(cust_income_level)
 ORDER BY income_level, sold_to_men, sold_to_women, t_observed;
p-values < 0.05 indicate statistically significant differences in the amounts purchased by men vs. women
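For readers who want to see what an independent-samples t-test with unpooled variances computes, here is a small sketch in plain Python (often called Welch's t-test). It returns only the t statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value needs the t distribution, which is left out. The function name `welch_t` and the sample amounts are assumptions, and the sign convention may differ from the SQL function's.

```python
# Welch's t statistic (unequal variances) for two independent samples.
# Sample data and names are illustrative; no p-value is computed here.
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

sold_to_men = [1, 2, 3, 4, 5]
sold_to_women = [2, 4, 6, 8, 10]
t_observed, dof = welch_t(sold_to_men, sold_to_women)
print(f"t = {t_observed:.4f}, df = {dof:.2f}")
```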
Simple SQL Syntax: Attribute Importance - ML Model Build (PL/SQL)
Model Build and Real-time SQL Apply Prediction
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSURANCE_AI',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'cust_id',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'Att_Import_Mode_Settings');
END;
/
Model Results (SQL query)
SELECT attribute_name, rank, attribute_value
  FROM BUY_INSURANCE_AI
 ORDER BY rank, attribute_name;
ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
BANK_FUNDS                   1           0.2161
MONEY_MONTLY_OVERDRAWN       2           0.1489
N_TRANS_ATM                  3           0.1463
N_TRANS_TELLER               4           0.1156
T_AMOUNT_AUTOM_PAYMENTS      5           0.1095
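The idea of ranking attributes by how much they tell you about the target can be shown with information gain. This is a swapped-in technique chosen because it fits in a few lines; it is not the method the database uses, and its scores will not match the values above. The rows, the column names, and the function names are invented for illustration.

```python
# Rank attributes by information gain with respect to a target column.
# Information gain is a stand-in technique; the data is a made-up example.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

rows = [
    {"bank_funds": "high", "has_atm_card": "yes", "buy": "yes"},
    {"bank_funds": "high", "has_atm_card": "no", "buy": "yes"},
    {"bank_funds": "low", "has_atm_card": "yes", "buy": "no"},
    {"bank_funds": "low", "has_atm_card": "no", "buy": "no"},
]
ranked = sorted(
    ["bank_funds", "has_atm_card"],
    key=lambda a: information_gain(rows, a, "buy"),
    reverse=True,
)
print(ranked)
```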
Oracle Machine Learning
Multiple Languages and UIs Supported for End Users & App Development
• DBAs
• Application Developers
• R & Python Data Scientists
• “Citizen” Data Scientists
• Notebook Users & DS Teams
Feature Engineering - Examples
Create New Derived Attributes or “Engineered Features”
Source Attribute             New Attribute / “Engineered Feature”
Date of Birth                AGE
Address                      DISTANCE_TO_DESTINATION, COMMUTE_TIME
Call detail records (CDRs)   #_DROPPED_CALLS, PERCENT_INTERNATIONAL
Salary                       PERCENT_VS_PEERS
Purchases                    TOTALS_PER_CATEGORY (e.g. Food, Clothing)
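Three of the engineered features in the table can be derived with a few lines of Python. This is a sketch under assumed input shapes: a `date` for the date of birth, a list of dicts with an `international` flag for call records, and `(category, amount)` pairs for purchases. The function names are hypothetical.

```python
# Derive AGE, PERCENT_INTERNATIONAL and TOTALS_PER_CATEGORY from raw attributes.
# Input shapes and function names are assumptions made for this example.
from datetime import date

def age_on(date_of_birth, today):
    """AGE in whole years on a given day."""
    had_birthday = (today.month, today.day) >= (
        date_of_birth.month, date_of_birth.day
    )
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

def percent_international(calls):
    """PERCENT_INTERNATIONAL from call detail records."""
    if not calls:
        return 0.0
    international = sum(1 for c in calls if c["international"])
    return 100.0 * international / len(calls)

def totals_per_category(purchases):
    """TOTALS_PER_CATEGORY from (category, amount) pairs."""
    totals = {}
    for category, amount in purchases:
        totals[category] = totals.get(category, 0) + amount
    return totals

print(age_on(date(1990, 6, 15), date(2020, 6, 14)))
print(percent_international([{"international": True}, {"international": False}]))
print(totals_per_category([("Food", 10), ("Clothing", 25), ("Food", 5)]))
```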
First, Identify the Key Attributes That Most Influence the Target Attribute
Modeling and Machine Learning
Attribute Importance Model
Next, Build Predictive Models to Predict Customers who are Likely to Have Good_Credit
Modeling and Machine Learning
Split Data into Train and Test
Build and Test Classification Model
Test the ML model’s accuracy
• Use a randomly selected “hold out” sample of data that was not used to train the ML model
• Compute Cumulative Gains, Lift, Accuracy, etc.
• Review the attributes used in the model and the model coefficients
• Make sure the model makes sense
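The evaluation measures named above can be computed on a hold-out sample as follows. This is a minimal sketch with made-up labels and scores; the 0.5 decision cut-off, the 20% slice, and the function names are assumptions, and `lift` assumes the fraction selects a whole number of rows.

```python
# Accuracy, cumulative gain and lift on a hold-out sample (made-up data).

def accuracy(actual, predicted):
    """Share of hold-out rows where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cumulative_gain(actual, scores, fraction):
    """Share of all positives captured in the top `fraction` of rows by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / sum(actual)

def lift(actual, scores, fraction):
    """How many times better than random selection the top slice performs."""
    return cumulative_gain(actual, scores, fraction) / fraction

# Hold-out labels (1 = good credit) and model scores for ten customers.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.05, 0.15, 0.25]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(actual, predicted))
print(cumulative_gain(actual, scores, 0.2))
print(lift(actual, scores, 0.2))
```

Here the top 20% of customers by score contain two of the three positives, so the model finds positives about 3.3 times faster than random selection would.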
Next, Build Predictive Models to Predict Customers who are Likely to Have Good_Credit
Model Evaluation (Machine Learning)
Model Evaluation
Deployment
Apply the Models to Predict “Best Customers”
Model Apply/“Scoring”
Simple SQL Apply scripts run 100% inside the Database for immediate ML model deployment
Coming Soon! | AutoML – new with OML4Py
Increase data scientist productivity – reduce overall compute time
Enables non-expert users to leverage Machine Learning
Auto Algorithm Selection
– Identify in-database algorithm that achieves highest model quality
– Find best algorithm faster than with exhaustive search
Auto Feature Selection
– De-noise data and reduce # of features by identifying most predictive
– Improve performance and accuracy
Auto Tune Hyperparameters
– Significantly improve model accuracy
– Avoid manual or exhaustive search techniques
(Figure: pipeline from Data Table through Auto Algorithm Selection, Auto Feature Selection, and AutoTune to ML Model)
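The goal of algorithm selection can be illustrated with the naive baseline that AutoML is meant to beat: fit every candidate and rank them on validation data. The sketch below does exactly that exhaustive comparison with two toy "algorithms"; it does not reproduce how OML AutoML avoids the exhaustive search, and all names and data are assumptions.

```python
# Exhaustive baseline for algorithm selection: fit each candidate, score it on
# validation data, and rank. Candidates and data are toy assumptions.

def majority_class(train):
    """Always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def threshold_rule(train):
    """Predict 1 when x exceeds the midpoint between the two class means."""
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    midpoint = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x > midpoint else 0

def leaderboard(candidates, train, validation):
    """Return (name, validation accuracy) pairs, best first."""
    rows = []
    for name, fit in candidates.items():
        model = fit(train)
        hits = sum(model(x) == y for x, y in validation)
        rows.append((name, hits / len(validation)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1)]
validation = [(2.5, 0), (6, 1), (9, 1), (1.5, 0)]
board = leaderboard(
    {"majority_class": majority_class, "threshold_rule": threshold_rule},
    train,
    validation,
)
print(board)
```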
Coming Soon! | OML AutoML User Interface
“Code-free” user interface supporting automated end-to-end machine learning
Automate production and deployment of ML models
• Enhance Data Scientist productivity and user-experience
• Enable non-expert users to leverage ML
• Unify model deployment and monitoring
• Support model management
Features
• Minimal user input: data, target
• Model leaderboard
• Model deployment via REST
Coming Soon! | Algorithms for Database 20c
Two major new ML algorithms
Gradient Boosted Trees (XGBoost)
• Highly popular and powerful algorithm – Kaggle winners
• Classification, regression, ranking, survival analysis
MSET-SPRT
• Multivariate State Estimation Technique - Sequential Probability Ratio Test (MSET-SPRT)
• Nonlinear, nonparametric anomaly detection algorithm designed to monitor critical processes
• Detects subtle anomalies while also producing minimal false alarms
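The SPRT half of MSET-SPRT can be sketched on its own. Below is Wald's sequential probability ratio test for a shift in the mean of Gaussian data with known standard deviation, applied to a stream of residuals. The MSET estimation step that would produce those residuals is not shown, the parameter values are arbitrary, and this is not the in-database implementation.

```python
# Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1, known sigma.
# Illustrates only the sequential test; MSET estimation is not shown.
from math import log

def sprt_mean_shift(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Return (decision, number of samples used)."""
    upper = log((1 - beta) / alpha)  # crossing it accepts H1 (anomaly)
    lower = log(beta / (1 - alpha))  # crossing it accepts H0 (normal)
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "anomaly", n
        if llr <= lower:
            return "normal", n
    return "undecided", len(samples)

print(sprt_mean_shift([1.5] * 10, mu0=0.0, mu1=1.0, sigma=1.0))
print(sprt_mean_shift([-0.5] * 10, mu0=0.0, mu1=1.0, sigma=1.0))
```

The test stops as soon as the evidence is strong enough in either direction, which is how a sequential test can react quickly while keeping the false alarm rate (`alpha`) low.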
Oracle Data Miner UI
Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”
• SQL Developer Extension
• Easy to use to define analytical methodologies that can be shared
• Workflow API; generates SQL code for immediate deployment
Congratulations!
Almost there :)
Data Scientist
• Oracle Cloud Infrastructure Data Science
• AutoML
- Automated algorithm selection and tuning
- Automates the process of running tests against multiple algorithms and hyperparameter configurations
- Checks results for accuracy and confirms that the optimal model and configuration are selected for use.
- This saves significant time for data scientists
• Feature Selection
- Automated predictive feature selection simplifies feature engineering by automatically identifying key
predictive features from larger datasets.
• Model Evaluation
- Measure model performance against new data
- Rank models over time to enable optimal behavior in production
• Model Explanation
- Explanation of the relative weighting and importance of the factors that go into generating a prediction
Oracle Cloud Data Science Platform
• Oracle Cloud Infrastructure Data Science
• Notebook Sessions
- Built-in cloud-hosted JupyterLab notebook sessions enable teams to build and train models
using Python.
• Visualization Tools
- Use popular open source visualization tools like plotly, matplotlib, and bokeh to visualize and
explore data.
• Open Source Machine Learning Frameworks
- Launch notebook sessions with popular machine learning frameworks like TensorFlow, Jupyter,
Dask, Keras, XGBoost, and scikit-learn, or bring your own packages.
Oracle Cloud Data Science Platform
Algorithms Operate on Data
ML and AI are just “Algorithms”
Move the Algorithms, Not the Data!
It Changes Everything!
Thank You
Any Questions?
Sandesh Rao
VP AIOps for the Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4


Introduction to Machine Learning - From DBA's to Data Scientists - OGBEMEA

  • 1. VP AIOps for the Autonomous Database Sandesh Rao From DBA’s to Data Scientists Introduction to Machine Learning @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4
  • 2. Tasks Specific to Business and Innovation • Architecture, planning, data modeling • Data security and lifecycle management • Application related tuning • End-to-End service level management Maintenance Tasks • Configuration and tuning of systems, network, storage • Database provisioning, patching • Database backups, H/A, disaster recovery • Database optimization Traditionally DBAs are Responsible for: Value Scale Innovation Maintenance
  • 3. Tasks Specific to Business and Innovation • Architecture, planning, data modeling • Data security and lifecycle management • Application related tuning • End-to-End service level management Maintenance Tasks • Configuration and tuning of systems, network, storage • Database provisioning, patching • Database backups, H/A, disaster recovery • Database optimization Freedom from Drudgery for DBA: More Time to Innovate and Improve the Business Autonomous Database Removes Generic Tasks Value Scale Innovation Maintenance
  • 4. Machine Learning Solving data-driven problems Discovering insights Making predictions Data Security Data classification, Data life-cycle mgmt Application Tuning SQL tuning, connection mgmt The Evolution of the DBA/Database Developer Role Data Engineer Architecture, “data wrangler”
  • 5. Data extraction Data wrangling Deriving new attributes (“feature engineering”) … … … Import predictions & insights Translate and deploy ML models Automate You Are Probably Already Doing Most of This Work! Database Developer to Data Scientist Journey 1 - https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html Typically 80% of the work Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy1 Eliminated or minimized with Oracle Data Management platform becomes combine/hybrid DM + machine learning platform
  • 6. Albert Einstein “If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”
  • 7. Lots of Data needs to be crunched • No time to manually sift through the data Machine Learning has become accessible • Software and algorithms are available • Frameworks allow for massive training with no coding • CI/CD available for MLOps • It’s not the algorithm you need to know about !! Business use cases - Find the use cases for maximum impact Why Machine Learning is important
  • 8. Analytics Value vs. Maturity Reports & Dashboards Data Information Predictions & Insights Appls with ML Analytical Maturity ValueofAnalytics Diagnostic Analysis & Reports Predictive / Machine Learning “ML Enabled” Applications What Happened? Why it Happened? What WILL happen? Automated ML Appls
  • 9. Database Developer to Data Scientist Journey
  • 10. ML Project Workflow Set the business objectives Gather compare and clean data Identify and extract features (important columns) from imported data This helps us identify the efficiency of the algorithm Take the input data which is also called the training data and apply the algorithm to it For the algorithm to function efficiently, it is important to pick the right value for hyper parameters (algorithm input parameters to the algorithm) Once the training data in the algorithm are combined we get a model 1 2 3 4 5
  • 11. Types of Machine Learning Supervised Learning Predict future outcomes with the help of training data provided by human experts Semi-Supervised Learning Discover patterns within raw data and make predictions, which are then reviewed by human experts, who provide feedback which is used to improve the model accuracy Unsupervised Learning Find patterns without any external input other than the raw data Reinforcement Learning Take decisions based on past rewards for this type of action
  • 12. TIME SERIES Temporal Aspect Hitting a threshold Forecasting energy use Seasonality of data
  • 13. Algorithms automatically sift through large amounts of data to discover hidden patterns, new insights and make predictions What is Machine Learning? Identify most important factor (Attribute Importance) Predict customer behavior (Classification) Find profiles of targeted people or items (Classification Predict or estimate a value (Regression) Segment a population (Clustering) Find fraudulent or “rare events” (Anomaly Detection) Determine co-occurring items in a “basket” (Associations)X1 X2 A1A2A3A4 A5A6 A7 SupervisedLearningUnsupervisedLearning Copyright © 2020 Oracle and/or its affiliates.
  • 14. Machine Learning Algorithms • Multiple Regression, Support Vector Machine, Linear Model, LASSO, Random Forest, Ridge Regression, Generalized Linear Model, Stepwise Linear Regression Regression Association & Collaborative Filtering Reinforcement Learning - brute force, Monte Carlo, temporal difference.... • Many different use cases Neural network & deep Learning with Deep Neural Network • Hierarchical k-means, Orthogonal Partitioning Clustering, Expectation- Maximization Clustering Feature Extraction/Attribute Importance / Component Analysis • Decision Tree, Naive Bayes, Random Forest, Logistic Regression, Support Vector Machine Classification
  • 15. What is Workload Automatically check workload for past x mins Decide if workload is abnormally high Highlight any abnormal workload issues Optionally run on demand Optionally snooze checking of a component Calculated via machine learning
  • 16. Prediction (every 5 minutes). 5 × 1-minute metrics are captured for each dimension, and an ASH report is captured for later analysis. Metrics are evaluated by the primary model to determine if there are anomalies. If there is no primary model (i.e. <7 days of data or <=95% model confidence), then SME rules are used for anomaly detection. Each anomaly is compared against the SME rules to determine which dimension it applies to. Any anomalies are raised along with the recently captured ASH report.
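The fallback rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PrimaryModel`, `sme_rule`, `choose_detector`, and `evaluate` are hypothetical, and the SME rule shown (every sample above a fixed limit) is a stand-in, since the slide does not describe the real rules. Only the two cut-offs (7 days of data, 95% confidence) and the 5 × 1-minute sample window come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

MIN_DAYS_OF_DATA = 7    # slide: fewer than 7 days of data means no primary model
MIN_CONFIDENCE = 0.95   # slide: confidence of 95% or less means no primary model


@dataclass
class PrimaryModel:
    """Hypothetical wrapper around a trained anomaly model."""
    days_of_data: float
    confidence: float
    is_anomaly: Callable[[Sequence[float]], bool]


def sme_rule(samples: Sequence[float], limit: float = 90.0) -> bool:
    """Stand-in SME rule (an assumption): anomalous if every sample exceeds limit."""
    return all(s > limit for s in samples)


def choose_detector(model: Optional[PrimaryModel]) -> Tuple[str, Callable]:
    """Use the primary model only when it has enough data and confidence."""
    if (
        model is None
        or model.days_of_data < MIN_DAYS_OF_DATA
        or model.confidence <= MIN_CONFIDENCE
    ):
        return "sme_rules", sme_rule
    return "primary_model", model.is_anomaly


def evaluate(samples: Sequence[float], model: Optional[PrimaryModel]) -> Tuple[str, bool]:
    """Evaluate one 5-minute window made of five 1-minute samples."""
    if len(samples) != 5:
        raise ValueError("expected five 1-minute samples")
    name, detector = choose_detector(model)
    return name, detector(samples)
```

Calling `evaluate([95, 96, 97, 98, 99], None)` falls back to the SME rule because no primary model exists yet.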
  • 17. Resource usage prediction. Chart legend: configurable threshold boundary (notify Admin of forecasts above here); actual values (black); forecast values (blue line); upper & lower forecast range (light blue area); unusual values (anomalies); future forecast values.
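The two checks implied by this chart can be sketched as follows: forecasts above the configurable threshold trigger a notification, and actual values outside the forecast range count as unusual. The function names and the list-based inputs are assumptions made for illustration; the slide does not show how the product computes either check.

```python
from typing import List, Sequence, Tuple


def forecasts_above_threshold(
    forecast: Sequence[float], threshold: float
) -> List[Tuple[int, float]]:
    """Return (index, value) for each forecast point above the threshold."""
    return [(i, v) for i, v in enumerate(forecast) if v > threshold]


def unusual_values(
    actual: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> List[int]:
    """Return indexes where the actual value falls outside the forecast range."""
    return [
        i
        for i, (a, lo, hi) in enumerate(zip(actual, lower, upper))
        if a < lo or a > hi
    ]
```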
  • 18. Oracle Machine Learning: Machine Learning Notebooks included in Autonomous Databases. Key Features: • Collaborative UI for data scientists and analysts • Packaged with Autonomous Databases • Quick start example notebooks • Easy access to shared notebooks, templates, permissions, scheduler, etc. • OML4SQL • OML4Py coming soon • Supports deployment of OML models
  • 19. Simple SQL Syntax: Statistical Comparisons (t-tests). Statistical Functions: compare average purchase amounts, men vs. women, grouped by INCOME_LEVEL. STATS_T_TEST_INDEPU (SQL) example: SELECT SUBSTR(cust_income_level, 1, 22) income_level, AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men, AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women, STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed, STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value FROM customers c, sales s WHERE c.cust_id = s.cust_id GROUP BY ROLLUP(cust_income_level) ORDER BY income_level, sold_to_men, sold_to_women, t_observed; P-values < 0.05 show statistically significant differences in the amounts purchased by men vs. women.
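For readers who want to see what an unpooled-variance (Welch) t-test computes, here is a minimal standard-library Python sketch of the test statistic and its degrees of freedom. It is not the database implementation: the function name `welch_t` and the sample amounts are made up for illustration, the sign convention is simply mean(a) minus mean(b), and the p-value is omitted because it needs a t-distribution that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses unpooled sample variances (Welch's t-test) and the
    Welch-Satterthwaite approximation for the degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Per-sample variance of the mean.
    var_a = variance(a) / n_a
    var_b = variance(b) / n_b
    t_stat = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up purchase amounts for two groups within one income level.
sold_to_men = [10.0, 12.0, 14.0]
sold_to_women = [20.0, 22.0, 24.0]
t_observed, dof = welch_t(sold_to_men, sold_to_women)
```

A large absolute t statistic relative to the degrees of freedom corresponds to a small p-value, which is the quantity the slide's query compares against 0.05.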
  • 20. Simple SQL Syntax: Attribute Importance. Model Build and Real-time SQL Apply Prediction. ML Model Build (PL/SQL): BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'BUY_INSURANCE_AI', mining_function => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE, data_table_name => 'CUST_INSUR_LTV', case_id_column_name => 'cust_id', target_column_name => 'BUY_INSURANCE', settings_table_name => 'Att_Import_Mode_Settings'); END; / Model Results (SQL query): SELECT attribute_name, rank, attribute_value FROM BUY_INSURANCE_AI ORDER BY rank, attribute_name; Results (ATTRIBUTE_NAME, RANK, ATTRIBUTE_VALUE): BANK_FUNDS, 1, 0.2161; MONEY_MONTLY_OVERDRAWN, 2, 0.1489; N_TRANS_ATM, 3, 0.1463; N_TRANS_TELLER, 4, 0.1156; T_AMOUNT_AUTOM_PAYMENTS, 5, 0.1095.
  • 21. Oracle Machine Learning: Multiple Languages and UIs Supported for End Users & Apps Development. Audiences: Application Developers; DBAs; R & Python Data Scientists; “Citizen” Data Scientists; Notebook Users & DS Teams.
  • 22. Feature Engineering examples: Create New Derived Attributes or “Engineered Features”. Source Attribute → New Attribute/“Engineered Feature”: Date of Birth → AGE; Address → DISTANCE_TO_DESTINATION, COMMUTE_TIME; Call detail records (CDRs) → #_DROPPED_CALLS, PERCENT_INTERNATIONAL; Salary → PERCENT_VS_PEERS; Purchases → TOTALS_PER_CATEGORY (e.g. Food, Clothing).
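Two of these derivations can be sketched in plain Python: AGE from a date of birth, and the dropped-call count and international percentage from call detail records. The record layout (dicts with `dropped` and `international` flags) and the function names are assumptions for illustration; in the database the same logic would more likely be written as SQL expressions.

```python
from datetime import date
from typing import Dict, Iterable


def age_on(date_of_birth: date, as_of: date) -> int:
    """Whole years between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def cdr_features(cdrs: Iterable[Dict[str, bool]]) -> Dict[str, float]:
    """Aggregate one customer's call detail records into engineered features."""
    records = list(cdrs)
    total = len(records)
    dropped = sum(1 for r in records if r["dropped"])
    international = sum(1 for r in records if r["international"])
    return {
        "N_DROPPED_CALLS": dropped,
        "PERCENT_INTERNATIONAL": 100.0 * international / total if total else 0.0,
    }
```

Each function turns raw source attributes into one value per customer, which is the shape a model-build table needs.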
  • 23. Modeling and Machine Learning: First, Identify the Key Attributes That Most Influence the Target Attribute. Attribute Importance Model.
  • 24. Modeling and Machine Learning: Next, Build Predictive Models to Predict Customers who are Likely to Have Good_Credit. Split Data into Train and Test; Build and Test Classification Model.
  • 25. Model Evaluation (Machine Learning): Next, Build Predictive Models to Predict Customers who are Likely to Have Good_Credit. Test the ML model’s accuracy: • Use a randomly selected “hold out” sample of data that was not used to train the ML model • Compute Cumulative Gains, Lift, Accuracy, etc. • Review the attributes used in the model and model coefficients • Make sure the model makes sense
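A minimal sketch of the evaluation steps named on this slide, written with the Python standard library only. The helper names, the 30% test fraction, and the fixed seed are assumptions for illustration. `lift_at` uses one common definition of lift: the positive rate among the top-scored fraction of hold-out rows divided by the overall positive rate.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(
    rows: Sequence, test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List, List]:
    """Randomly split rows into (train, test); test rows are held out of training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of hold-out rows where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def lift_at(actual: Sequence[int], scores: Sequence[float], fraction: float) -> float:
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate
```

A lift well above 1 in the top fraction means the model concentrates likely Good_Credit customers at the top of its ranking, which is the property cumulative gains and lift charts visualize.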
  • 26. Deployment: Apply the Models to Predict “Best Customers”. Simple SQL Apply scripts run 100% inside the database for immediate ML model deployment. Model Apply/“Scoring”.
  • 27. Coming Soon! | AutoML: new with OML4Py. Increases data scientist productivity, reduces overall compute time, and enables non-expert users to leverage machine learning. Auto Feature Selection: reduce # of features by identifying most predictive; improve performance and accuracy. Auto Algorithm Selection: identify in-database algorithm that achieves highest model quality; find best algorithm faster than with exhaustive search. Auto Tune Hyperparameters: significantly improve model accuracy; avoid manual or exhaustive search techniques. [Figure: Data Table to ML Model via Auto Algorithm Selection (much faster than exhaustive search), Auto Feature Selection (de-noise data and reduce # of features), and AutoTune (significant accuracy improvement)]
  • 28. Coming Soon! | OML AutoML User Interface: “Code-free” user interface supporting automated end-to-end machine learning. Automate production and deployment of ML models: • Enhance data scientist productivity and user experience • Enable non-expert users to leverage ML • Unify model deployment and monitoring • Support model management. Features: • Minimal user input: data, target • Model leaderboard • Model deployment via REST
  • 29. Coming Soon! | Algorithms for Database 20c: two major new ML algorithms. Gradient Boosted Trees (XGBoost): • Highly popular and powerful algorithm (Kaggle winners) • Classification, regression, ranking, survival analysis. MSET-SPRT: • Multivariate State Estimation Technique - Sequential Probability Ratio Test • Nonlinear, nonparametric anomaly detection algorithm designed to monitor critical processes • Detects subtle anomalies while also producing minimal false alarms
  • 30. Oracle Data Miner UI: Drag and Drop, Workflows, Easy to Use UI for “Citizen Data Scientist”. SQL Developer Extension; easy to use to define analytical methodologies that can be shared; Workflow API; generates SQL code for immediate deployment.
  • 32. Oracle Cloud Data Science Platform: Oracle Cloud Infrastructure Data Science. • AutoML: automated algorithm selection and tuning; automates the process of running tests against multiple algorithms and hyperparameter configurations; checks results for accuracy and confirms that the optimal model and configuration are selected for use; this saves significant time for data scientists. • Feature Selection: automated predictive feature selection simplifies feature engineering by automatically identifying key predictive features from larger datasets. • Model Evaluation: measure model performance against new data; rank models over time to enable optimal behavior in production. • Model Explanation: explanation of the relative weighting and importance of the factors that go into generating a prediction.
  • 33. Oracle Cloud Data Science Platform: Oracle Cloud Infrastructure Data Science. • Notebook Sessions: built-in cloud-hosted JupyterLab notebook sessions enable teams to build and train models using Python. • Visualization Tools: use popular open source visualization tools like plotly, matplotlib, and bokeh to visualize and explore data. • Open Source Machine Learning Frameworks: launch notebook sessions with popular machine learning frameworks like TensorFlow, Jupyter, Dask, Keras, XGBoost, and scikit-learn, or bring your own packages.
  • 34. ML and AI are just “Algorithms”. Algorithms Operate on Data. Move the Algorithms, Not the Data! It Changes Everything!
  • 35. Thank You. Any Questions? Sandesh Rao, VP AIOps for the Autonomous Database. @sandeshr https://www.linkedin.com/in/raosandesh/ https://www.slideshare.net/SandeshRao4