© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
KNIMEr: rosaria.silipo@knime.com
KNIMEr: maarit.widmann@knime.com
KNIMEr: jerome.treboux@hevs.ch
@KNIME
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
• Data Preparation
– Data Manipulation
– Data Blending
– Missing Values Handling
– Feature Generation
– Dimensionality Reduction
– Feature Selection
– Outlier Removal
– Normalization
– Partitioning
– …
• Model Training
– Model Training
– Bag of Models
– Model Selection
– Ensemble Models
– Own Ensemble Model
– External Models
– Import Existing Models
– Model Factory
– …
• Model Optimization
– Parameter Tuning
– Parameter Optimization
– Regularization
– Model Size
– No. Iterations
– …
• Model Evaluation
– Performance Measures
– Accuracy
– ROC Curve
– Cross-Validation
– …
• Deployment
– Files & DBs
– Dashboards
– REST API
– SQL Code Export
– Reporting
– …
The many Lives of a Dataset
• Data Preparation: Original Data Set with Past Observations
– Partitioning: Training Set, Validation Set, Test Set
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
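The partitioning step can be sketched in plain Python. This is only an illustration of the idea; the Learnathon workflows do this with KNIME nodes. The function name `partition`, the 70/15/15 split, and the fixed seed are assumptions and do not come from the slides.

```python
import random


def partition(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


# Stand-in for the original data set with past observations.
train_set, validation_set, test_set = partition(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

The training set fits the model, the validation set guides optimization, and the test set is touched only for the final evaluation, so each row should land in exactly one of the three sets.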
Data Exploration
• Sometimes in between Data Access and Data Preparation there is a Data Exploration phase
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data
What about Big Data?
• Big Data serves Scalability
• The whole Analytics Process is no different on Big Data
• You need:
– a Big Data Platform
– The KNIME Big Data (Spark & Hive) Extension
One Example for Every Need
The KNIME EXAMPLES Server
50_Applications
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: Predict Departure Delays
– If working on the original airline dataset, use only flights from airport ORD
– Output Class = “delay” if depdelay > 15 min, otherwise “no delay”
– Available features: date, dep time, arr time, carrier, destination, cancelled, …
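The class definition above can be written out as a minimal Python sketch. The threshold of 15 minutes and the ORD filter come from the slide; the record layout and the field names `origin` and `dest` are hypothetical, and the sample rows are made up purely for illustration.

```python
DELAY_THRESHOLD_MIN = 15  # from the slide: "delay" if depdelay > 15 min


def label_flight(dep_delay_minutes):
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_minutes > DELAY_THRESHOLD_MIN else "no delay"


def ord_departures(flights):
    """Keep only flights departing from ORD (field name is an assumption)."""
    return [f for f in flights if f["origin"] == "ORD"]


# Made-up sample rows, not taken from the airline dataset.
flights = [
    {"origin": "ORD", "dest": "LGA", "depdelay": 42},
    {"origin": "ORD", "dest": "SFO", "depdelay": 15},
    {"origin": "JFK", "dest": "ORD", "depdelay": 90},
]

labeled = [
    dict(f, delay_class=label_flight(f["depdelay"]))
    for f in ord_departures(flights)
]
for row in labeled:
    print(row["dest"], row["depdelay"], row["delay_class"])
```

Note that the comparison is strict, so a flight that leaves exactly 15 minutes late is labeled "no delay". Rows with a missing delay value, for example cancelled flights, are not handled in this sketch.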
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
• Deployment Options – Multiple challenges:
– Workflow deployment to KNIME Server
– Remote/Scheduled execution from KNIME Server
– KNIME RESTful Web Services
– Build a Composite Interactive Dashboard and make it available on KNIME Web Portal
– Generate a report with BIRT
– Write Prediction Results to a Database
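As one illustration of the last option, writing prediction results to a database might look like the following Python sketch. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in; in the Learnathon this is done with KNIME database nodes. The table name `predictions`, its columns, and the sample flight IDs are all hypothetical.

```python
import sqlite3


def write_predictions(conn, predictions):
    """Store (flight_id, predicted_class) pairs in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "flight_id TEXT PRIMARY KEY, "
        "predicted_class TEXT NOT NULL)"
    )
    # INSERT OR REPLACE lets a rerun overwrite earlier predictions.
    conn.executemany(
        "INSERT OR REPLACE INTO predictions (flight_id, predicted_class) "
        "VALUES (?, ?)",
        predictions,
    )
    conn.commit()


# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
write_predictions(conn, [("AA100", "delay"), ("UA200", "no delay")])

rows = conn.execute(
    "SELECT flight_id, predicted_class FROM predictions ORDER BY flight_id"
).fetchall()
print(rows)
```

Parameterized queries (the `?` placeholders) keep the values separate from the SQL text, which matters once the predictions come from real input data.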
KNIME Spring Summit 2019
March 18 – 22 at bcc Berlin Congress Center, Berlin
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code LEARNATHON for 10% off tickets!
Register at knime.com/spring-summit2019
KNIME Beginner’s Luck
Free Copy of KNIME Beginner’s Luck Book from KNIME Press
https://www.knime.com/knimepress
with this code: PARIS-1118
You can find KNIMers here!
• KNIME (www.knime.com)
• BLOG for news, tips and tricks (www.knime.com/blog)
• FORUM for questions and answers (tech.knime.com/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.com/learning-hub)
• KNIME TV channel on YouTube
• KNIME on Twitter: @KNIME
• KNIME on Facebook: https://www.facebook.com/KNIMEanalytics
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!

KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November 2018
