© 2018 KNIME AG. All Rights Reserved.
Interactive and reproducible data
analysis with the open-source KNIME
Analytics Platform
Greg Landrum, Ph.D.
KNIME AG
@dr_greg_landrum
Frankfurt Data Science Meetup
18 June 2018
Let’s talk about reproducible data
analysis
Some data analysis problems
• Interactive data analysis and modeling
• Repeatability and reproducibility
• The need to use multiple tools and multiple data
sources
• Collaboration between users with different
sophistication levels
• Deployment
• Just staying organized
I think workflows can help
with these
I think KNIME can help with
these
The Open Source KNIME® Analytics Platform
Visual KNIME Workflows
Nodes perform tasks on data
Workflows combine nodes
to model data flow
Each node has input(s), outputs, and a status:
Not Configured, Idle, Executed, or Error
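As a rough illustration of the node/workflow idea, here is a toy Python sketch. It is not KNIME's implementation or API: the `Status`, `Node`, and `Workflow` names are invented for this example, and reading "Idle" as "configured but not yet run" is an assumption. Each node wraps one task and tracks its own status; the workflow runs the nodes in order and passes the data along.

```python
from enum import Enum
from typing import Any, Callable, List, Optional


class Status(Enum):
    NOT_CONFIGURED = "not configured"
    IDLE = "idle"            # assumed meaning: configured, not yet run
    EXECUTED = "executed"
    ERROR = "error"


class Node:
    """Hypothetical node: one task applied to its input data."""

    def __init__(self, name: str, task: Optional[Callable[[Any], Any]] = None):
        self.name = name
        self.task = task
        self.status = Status.IDLE if task else Status.NOT_CONFIGURED
        self.output: Any = None

    def execute(self, data: Any) -> Any:
        if self.task is None:
            raise RuntimeError(f"node {self.name!r} is not configured")
        try:
            self.output = self.task(data)
            self.status = Status.EXECUTED
        except Exception:
            self.status = Status.ERROR
            raise
        return self.output


class Workflow:
    """Hypothetical linear workflow: each node's output feeds the next node."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def run(self, data: Any) -> Any:
        for node in self.nodes:
            data = node.execute(data)
        return data


workflow = Workflow([
    Node("drop missing", lambda rows: [r for r in rows if r is not None]),
    Node("square", lambda rows: [r * r for r in rows]),
    Node("sum", sum),
])
result = workflow.run([1, None, 2, 3])
statuses = [node.status for node in workflow.nodes]
```

Because every node keeps its own output and status, a failed run can be inspected node by node rather than rerun from scratch, which is the property the slide's status indicators point at.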
Workflows capture parameters and data
https://www.knime.com/blog/stuck-in-the-nine-circles-of-hell-try-parameter-optimization-a-cup-of-tea
Over 2000 native and embedded open-source nodes:
• Analysis & Mining: Statistics, Machine Learning, Data Mining, Web Analytics, Text Mining, Network Analysis, Social Media Analysis, R, Weka, Python, Community / 3rd party, ...
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd party, ...
• Transformation: Row, Column, Matrix; Text, Image, Networks, Time Series; Java, Python; Community / 3rd party, ...
• Visualization: R, Python, JFreeChart, JavaScript, Community / 3rd party, ...
• Deployment: via BIRT; PMML, XML, JSON; Databases, Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd party, ...
• Big Data: Hive, Impala, HDFS, Vertica, Teradata/Aster, Spark, MLlib, Community / 3rd party, ...
Some supported data types
• Numbers, strings, etc.
• Bit/count vectors
• Images
• Documents
• Networks
• Chemistry types
• Biology types
Free E-Learning Course: Web Page
• Hands-on e-learning course
• Data Access, ETL, Analytics, Control
Structures, Visualization
• Around 50 small units
• … with exercises
• … and with solutions on the
EXAMPLES server
• Final exercises to test your
knowledge!
https://www.knime.org/knime-introductory-course
KNIME Beginner’s Luck Book
Free Copy of KNIME Beginner’s Luck Book at KNIME Press
https://www.knime.org/knimepress
Promotion Code:
FRANKFURT-MEETUP-0618
Valid until 31 August
The KNIME Software Ecosystem
• KNIME Analytics Platform
• KNIME Supported Extensions
• KNIME Extensions
• Partner Extensions
• Community Extensions
• KNIME Server
KNIME Server
• Shared Repositories
• Access Management
• Web Enablement
• Flexible Execution
KNIME, the company
• KNIME AG founded in 2008
• Offices in Zürich (HQ), Konstanz, Berlin, and Austin
• 50+ employees
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open: to all sorts of data, other tools (R and Python, etc.), various user
personas
– 20+ open source releases since 2006
– open source.
• KNIME Server
– 14 commercial product releases since 2008
• KNIME cloud offerings
Interactive data analysis and modeling
• Fairly often the whole process of data
preprocessing, analysis, and modeling can’t be (or
shouldn’t be) fully automated.
• We want/need a human in the loop
• Would be lovely if this weren’t painful
Interactive
Repeatability and reproducibility
• I can reproduce what I did before or repeat the
same process with a different data set/method
• You can do the same thing
• Not necessarily talking about strict reproducibility
(out to the 15th decimal place), but if we miss that
we should be able to discover where deviations come
from
• Would be lovely if this weren’t painful
Reproducible
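One way to make the "discover where deviations come from" point concrete is a comparison that tolerates tiny floating-point differences but reports exactly which results deviate. The following is a minimal Python sketch; the `find_deviations` helper, the metric names, and the tolerance are assumptions for illustration, not something from the talk.

```python
import math
from typing import Dict, List, Tuple


def find_deviations(
    reference: Dict[str, float],
    rerun: Dict[str, float],
    rel_tol: float = 1e-9,
) -> List[Tuple[str, float, float]]:
    """Return (key, reference value, rerun value) for results differing beyond rel_tol."""
    deviations = []
    for key in sorted(reference.keys() | rerun.keys()):
        ref = reference.get(key, math.nan)
        new = rerun.get(key, math.nan)
        # NaN marks a value missing from one run; math.isclose never treats NaN as close
        if not math.isclose(ref, new, rel_tol=rel_tol):
            deviations.append((key, ref, new))
    return deviations


# Hypothetical results from an original run and a repeat of the same analysis
reference = {"accuracy": 0.912345678901, "auc": 0.95, "n_rows": 1000.0}
rerun = {"accuracy": 0.912345678902, "auc": 0.93, "n_rows": 1000.0}
deviations = find_deviations(reference, rerun)
```

Here the difference in `accuracy` is far below the tolerance and is ignored, while `auc` is flagged, so the follow-up question becomes "which step changed `auc`?" rather than "did anything change?".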
The need to use multiple tools and multiple data sources
• There is no one-size-fits-all solution (or “one-stop
shop”)
• We’re inevitably going to be using more than one
piece of software and working with data from more
than one source.
• Would be lovely if this weren’t painful
Open
Collaboration between users with different sophistication levels
• Some personae:1
– The scripter/programmer: “I’ve got this great new
method you should try”
– The tool user: “I’ll use software, but there’s no way I’m
writing code”
– The “stakeholder”: “Those folks are doing useful stuff and
I need their results, but I don’t have time to learn some
complex new piece of software.”
• Would be lovely if enabling collaboration between
these different personae weren’t painful
1 Yes, these are stereotypes
Collaborative
Deployment
• Once I’ve built something I’d like to make it available
to my colleagues
– Sharing models
– Sharing methods
– Sharing results
• Would be lovely if this weren’t painful
Deployable
Just staying organized
• I can usually remember where my scripts are
• There’s no way I can remember where yours are
• It would be lovely if it weren’t painful to find stuff
Findable
I think KNIME can help with
these
Thanks!
@dr_greg_landrum
greg.landrum@knime.com
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME.com AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.
