© 2019 KNIME AG. All rights reserved.
From Raw Data to Deployment
KNIMEr: Kathrin.Melcher@knime.com
KNIMEr: Maarit.Widmann@knime.com
@KNIME
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it!
It always starts with some data …
• Data Preparation
– Data Manipulation
– Data Blending
– Missing Values Handling
– Feature Generation
– Dimensionality Reduction
– Feature Selection
– Outlier Removal
– Normalization
– Partitioning
– …
• Model Training
– Model Training
– Bag of Models
– Model Selection
– Ensemble Models
– Own Ensemble Model
– External Models
– Import Existing Models
– Model Factory
– …
• Model Optimization
– Parameter Tuning
– Parameter Optimization
– Regularization
– Model Size
– No. Iterations
– …
• Model Evaluation
– Performance Measures
– Accuracy
– ROC Curve
– Cross-Validation
– …
• Deployment
– Files & DBs
– Dashboards
– REST API
– SQL Code Export
– Reporting
– …
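Two of the evaluation items listed above, accuracy and cross-validation, can be illustrated outside KNIME with a small Python sketch. This is not how the KNIME nodes are implemented. The function names `accuracy` and `k_fold_indices` are hypothetical, and the round-robin fold assignment is an assumption chosen for brevity.

```python
from typing import List, Sequence, Tuple


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of rows where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def k_fold_indices(n_rows: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering rows 0..n_rows-1.

    Rows are assigned to folds round-robin (an assumption for this sketch).
    Shuffle the rows beforehand if their order carries meaning, for example
    if they are sorted by date.
    """
    if k < 2 or k > n_rows:
        raise ValueError("k must be between 2 and n_rows")
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training in round i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits
```

Each row lands in the test fold exactly once, so averaging the per-fold accuracy gives an estimate that does not depend on a single lucky or unlucky split.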
The many Lives of a Dataset
Partitioning:
• Training Set
• Validation Set
• Test Set
Which data each step uses:
• Data Preparation: Original Data Set with Past Observations
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
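The partitioning of the original data into training, validation, and test sets can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the KNIME Partitioning node: the function name `partition`, the 60/20/20 default split, and the fixed seed are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(
    rows: Sequence[T],
    train_frac: float = 0.6,   # assumed default, not from the slides
    valid_frac: float = 0.2,   # assumed default, not from the slides
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows and split them into (training, validation, test) lists."""
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains is the test set
    return train, valid, test
```

The training set fits the model, the validation set guides optimization, and the test set is touched only once for the final evaluation, so it remains a fair stand-in for new real-world data.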
Data Exploration
• Sometimes there is a Data Exploration phase between Data Access and Data Preparation
• The Data Exploration phase is useful for getting to know the data
• KNIME offers a few visualization nodes to build dashboards for exploring the data
What about Big Data?
• Big Data serves scalability
• The whole Analytics Process is no different on Big Data
• You need:
– a Big Data Platform
– the KNIME Big Data (Spark & Hive) Extension
One Example for Every Need – on KNIME EXAMPLES Server
The KNIME EXAMPLES Server
50_Applications
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007): AirlineDataset.table
• Challenge: Predict Departure Delays
– If working on the original airline dataset, use only flights from airport ORD
– Output Class = “delay” if depdelay > 15 min, otherwise “no delay”
– Available features: date, dep time, arr time, carrier, destination, cancelled, …
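The target definition above (departure delay over 15 minutes means “delay”) and the ORD filter can be sketched in plain Python. The column names `Origin` and `DepDelay`, the new column `DelayClass`, and the function names are assumptions for illustration; check them against the actual file before use. Rows without a usable delay value are skipped, which is one possible policy among several.

```python
import math
from typing import Dict, Iterable, List, Optional

DELAY_THRESHOLD_MIN = 15  # from the slide: "delay" if depdelay > 15 min


def label_departure_delay(dep_delay_min: float) -> str:
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def prepare_rows(
    rows: Iterable[Dict[str, str]],
    origin: Optional[str] = "ORD",  # pass None to keep all airports
) -> List[Dict[str, str]]:
    """Keep flights from one origin airport and add a DelayClass column."""
    labelled = []
    for row in rows:
        # "Origin" and "DepDelay" are assumed column names.
        if origin is not None and row.get("Origin") != origin:
            continue
        try:
            dep_delay = float(row.get("DepDelay", ""))
        except (TypeError, ValueError):
            continue  # missing or non-numeric delay, e.g. "NA"
        if math.isnan(dep_delay):
            continue
        labelled.append({**row, "DelayClass": label_departure_delay(dep_delay)})
    return labelled
```

Note that a delay of exactly 15 minutes is labelled “no delay”, because the slide uses a strict inequality.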
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2019.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
• Deployment Options – Multiple challenges:
– Workflow deployment to KNIME Server
– Remote/Scheduled execution from KNIME Server
– KNIME RESTful Web Services
– Build a Composite Interactive Dashboard and make it available on KNIME WebPortal
– Generate a report with BIRT
– Write Prediction Results to a Database
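The last option above, writing prediction results to a database, can be sketched with Python's built-in `sqlite3` module. SQLite stands in here for whatever database the deployment actually targets. The table name `predictions`, its columns, and the function `write_predictions` are hypothetical and not part of the Learnathon workflows.

```python
import sqlite3
from typing import Iterable, Tuple


def write_predictions(
    conn: sqlite3.Connection,
    predictions: Iterable[Tuple[str, str, float]],
) -> int:
    """Store (flight_id, predicted_class, delay_probability) rows.

    Returns the number of rows written. Table and column names are
    assumptions made for this sketch.
    """
    rows = list(predictions)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS predictions ("
            "flight_id TEXT PRIMARY KEY, "
            "predicted_class TEXT NOT NULL, "
            "delay_probability REAL NOT NULL)"
        )
        # Re-scoring the same flight overwrites its earlier prediction.
        conn.executemany(
            "INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    write_predictions(connection, [("flight-1", "delay", 0.81)])
    print(connection.execute("SELECT * FROM predictions").fetchall())
```

Keying the table on a flight identifier makes the write repeatable, so a scheduled run can score the same flights again without creating duplicate rows.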
KNIME Fall Summit 2019
November 5 – 8 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Register by October 1 for 10% Early Bird Discount with this code: LEARNATHON-DUBLIN
Register at knime.com/summits
KNIME Beginner’s Luck
Free Copy of KNIME Beginner’s Luck Book from KNIME Press
https://www.knime.com/knimepress
with this code: DUBLIN-0619
Stay connected with KNIME
Blog: knime.com/blog
Forum: forum.knime.com
KNIME Hub: hub.knime.com
Follow us on social media:
KNIME E-Learning Course: www.knime.com/e-learning-course
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!
#KNIME
#Learnathon




KNIME Data Science Learnathon: From Raw Data To Deployment - Dublin - June 2019
