designed by tinyPPT.com
Karthik Chinnusamy
Sr Principal, Data Architect
Veritas Tech
SFBayACM Presentation
Wednesday, January 15, 2020
7:00 PM to 9:00 PM
Palo Alto Networks Building 3
3200 Tannery Way · Santa Clara, CA
Big Data Modeling and Machine Learning with No Code
BIG DATA INTRODUCTION
SOURCES
• Machine
• People
• Organization
VALUE
• Integrating different types of data sources
• Reduce complexity
• Increase data availability
• Data collaboration
CHARACTERISTICS
• Volume
• Velocity
• Veracity
• Variety
• Valence
DATA SCIENCE
• Big data
• Insight
• Action => Product
DATA MANAGEMENT
DATA GOVERNANCE
BIG DATA PROCESSING
ACQUIRE
• Identify data
• Retrieve data
• Query data
PREPARE
• Explore
• Pre-process
ANALYZE
• Select analytical techniques
• Build models
REPORT
• Communicate results
ACT
• Apply results
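For readers who want to see the five steps of this slide spelled out, here is a minimal Python sketch. It is an illustration only: the sample data, the function names, and the choice of a per-sensor mean as the "model" are all assumptions, not part of the talk, which builds the same kind of flow visually without code.

```python
import csv
import io
import statistics

# Hypothetical sample data; stands in for any source system.
RAW_CSV = """sensor,reading
a,10.0
a,12.0
b,
b,30.0
a,11.0
b,34.0
"""

def acquire(text):
    """ACQUIRE: identify, retrieve, and query the data (here: parse CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """PREPARE: explore and pre-process (here: drop rows with a missing reading)."""
    return [
        {"sensor": r["sensor"], "reading": float(r["reading"])}
        for r in rows
        if r["reading"].strip()
    ]

def analyze(rows):
    """ANALYZE: select a technique and build a model (here: per-sensor mean)."""
    groups = {}
    for r in rows:
        groups.setdefault(r["sensor"], []).append(r["reading"])
    return {sensor: statistics.mean(values) for sensor, values in groups.items()}

def report(model):
    """REPORT: communicate results as readable lines."""
    return [f"{sensor}: mean reading {mean:.1f}" for sensor, mean in sorted(model.items())]

def act(model, limit=20.0):
    """ACT: apply results (here: flag sensors whose mean exceeds a limit)."""
    return sorted(sensor for sensor, mean in model.items() if mean > limit)

model = analyze(prepare(acquire(RAW_CSV)))
summary = report(model)
alerts = act(model)
```

Each function maps to one box on the slide, so a step can be swapped out (for example, a different analysis technique) without touching the others.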
Iterative Process: ACQUIRE => PREPARE => ANALYZE => REPORT => ACT
Scalable Tools
• Enable scalability
• Provide value
• Facilitate a shared environment
• Optimized for a variety of data
• Handle fault tolerance
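The iterative nature of the process can be sketched as a loop that repeats the cycle until the result is good enough. This is a toy example under stated assumptions: the labeled samples, the threshold "model", and the names `run_cycle` and `iterate` are hypothetical and chosen only to show the feedback loop.

```python
def run_cycle(threshold, samples):
    """One pass through the cycle: score a simple threshold rule on labeled samples."""
    correct = sum((value > threshold) == label for value, label in samples)
    return correct / len(samples)

def iterate(samples, start=0.0, step=1.0, target=0.9, max_rounds=20):
    """Repeat the cycle, adjusting the rule each round, until the target is met."""
    threshold = start
    history = []
    for _ in range(max_rounds):
        accuracy = run_cycle(threshold, samples)
        history.append((threshold, accuracy))
        if accuracy >= target:
            break  # the results are good enough to act on
        threshold += step  # feed what was learned into the next round
    return threshold, history

# Hypothetical labeled data: (measurement, is_anomaly)
samples = [(1.0, False), (2.0, False), (3.0, False), (6.0, True), (7.0, True)]
best_threshold, history = iterate(samples)
```

With this data the loop stops on its fourth round, once the rule classifies every sample correctly.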
Big data eco-system
Data Governance & Information Architect Challenges
• How do architects create a nice, easy-to-navigate, easy-to-understand, easy-to-maintain, easy-to-document, easy-to-consume architectural picture of this application and data chaos?
• How do they do so when faced with time pressure to comply with demanding regulations like GDPR, BCBS 239, CMS, and others?
• What’s the starting point?
• Do you start at the end with the reports?
• But which reports first?
• Where can I find them?
• Who owns them?
• Who maintains them?
• Who can help me explain them?
• Are they still up-to-date?
Machine Learning with KNIME
• KNIME open source Analytics Platform
• Server
• GUI based
• No coding experience required
• Extensions & Community Extensions
• Advanced visualization
• Extensive integration
KNIME
• KNIME was founded in 2008
• The name stands for KoNstanz Information MinEr (KNIME)
• Developed by KNIME AG, located in Zurich, and the group of Michael Berthold at the University of Konstanz, Chair for Bioinformatics and Information Mining
• Interactive preprocessing, analysis, and modelling
• Repeatable and reproducible
• Integration with Python, Java, R, and Spark
• Integration with AWS and Azure
• Integration with Weka, Plotly, Keras, and H2O
• Weka: Waikato Environment for Knowledge Analysis, developed at the University of Waikato, New Zealand
• Plotly is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools.
• Keras is an open-source neural-network library written in Python. It runs on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.
• H2O is billed as the leading open source data science and machine learning platform, trusted by over 18,000 organizations and 200,000 users around the world.
KNIME Overview
KNIME Nodes and Status
Demo
NA marks non-Apache projects.
Orchestration & Workflow: Oozie, ODE, Airavata, and OODT (Tools); NA: Pegasus, Kepler, Swift, Taverna, Trident, ActiveBPEL, BioKepler, Galaxy
Data Analytics Libraries:
• Machine Learning: Mahout, MLlib, MLbase, CompLearn (NA)
• Linear Algebra: ScaLAPACK, PETSc (NA)
• Statistics, Bioinformatics: R, Bioconductor (NA)
• Imagery: ImageJ (NA)
High Level (Integrated) Systems for Data Processing:
• MRQL (SQL on Hadoop, Hama, Spark)
• Hive (SQL on Hadoop)
• Pig (Procedural Language)
• Shark (SQL on Spark, NA)
• HCatalog (Interfaces)
• Impala (NA), Cloudera (SQL on HBase)
• Sawzall (Log Files, Google, NA)
Parallel Horizontally Scalable Data Processing (Graph, Batch, Stream):
• Giraph (~Pregel)
• Tez (DAG)
• Spark (Iterative MR)
• Storm
• S4 (Yahoo)
• Samza (LinkedIn)
• Hama (BSP)
• Hadoop (MapReduce)
• Pegasus on Hadoop (NA)
• NA: Twister, Stratosphere (Iterative MR)
Pub/Sub Messaging: Netty (NA) / ZeroMQ (NA) / ActiveMQ / Qpid / Kafka
ABDS Inter-process Communication: Hadoop, Spark Communications & Reductions
HPC Inter-process Communication: MPI (NA), Harp Collectives (NA)
Cross Cutting Capabilities:
• Distributed Coordination: ZooKeeper, JGroups
• Message Protocols: Thrift, Protobuf (NA)
• Security & Privacy
• Monitoring: Ambari, Ganglia, Nagios, Inca (NA)
In-memory distributed databases/caches: GORA (general object from NoSQL), Memcached (NA), Redis (NA) (key value), Hazelcast (NA), Ehcache (NA)
ABDS Cluster Resource Management: Mesos, Yarn, Helix, Llama (Cloudera)
HPC Cluster Resource Management: Condor, Moab, Slurm, Torque (NA) …
ABDS File Systems (Object Stores): HDFS, Swift, Ceph
User Level (POSIX Interface): FUSE (NA)
HPC File Systems (NA) (Distributed, Parallel, Federated): Gluster, Lustre, GPFS, GFFS
File Management: iRODS (NA)
Interoperability Layer: Whirr / JClouds, OCCI, CDMI (NA)
DevOps/Cloud Deployment: Puppet / Chef / Boto / CloudMesh (NA)
Cross Cutting Capabilities:
• Distributed Coordination: ZooKeeper, JGroups
• Message Protocols: Thrift, Protobuf (NA)
• Security & Privacy
• Monitoring: Ambari, Ganglia, Nagios, Inca (NA)
SQL: MySQL (NA), SciDB (NA) (Arrays, R, Python), Phoenix (SQL on HBase)
Extraction Tools: UIMA (Entities) (Watson), Tika (Content)
NoSQL: Column: Cassandra (DHT), HBase (Data on HDFS), Accumulo (Data on HDFS), Solandra (Solr + Cassandra, +Document)
Azure Table
NoSQL: Document: MongoDB (NA), CouchDB, Lucene Solr
NoSQL: Key Value (all NA): Riak (~Dynamo), Dynamo (Amazon), Voldemort (~Dynamo), Berkeley DB
NoSQL: General Graph: Neo4J (Java, GNU) (NA)
NoSQL: TripleStore (RDF, SPARQL): RYA (RDF on Accumulo), AllegroGraph (Commercial), Sesame (NA), Yarcdata (Commercial) (NA), Jena
ORM Object Relational Mapping: Hibernate (NA), OpenJPA, and JDBC Standard
IaaS System Manager (Open Source, Commercial Clouds, Bare Metal): OpenStack, OpenNebula, Eucalyptus, CloudStack, vCloud, Amazon, Azure, Google
Data Transport: BitTorrent, HTTP, FTP, SSH, Globus Online (GridFTP)
KNIME
• Open source KNIME Analytics Platform for creating data science
• Commercial KNIME Server for productionizing data science
KNIME Analytics Platform is the open source software for creating data science. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.
KNIME Server is the enterprise software for team-based collaboration, automation, management, and deployment of data science workflows as analytical applications and services. Non-experts are given access to data science via KNIME WebPortal or can use REST APIs.
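As a rough illustration of the REST route mentioned above, the sketch below builds, but does not send, an HTTP request that could ask a server to run a workflow. The host name, workflow path, credentials, input payload, and the `/rest/v4/repository/...:jobs` route shape are all assumptions made for illustration and are not taken from the talk; the server's own REST documentation is the authority on the actual endpoints.

```python
import base64
import json
import urllib.request

def build_job_request(base_url, workflow_path, user, password, inputs):
    """Build (but do not send) a POST request asking a server to run a workflow.

    The route shape used here is an assumption for illustration only.
    """
    url = f"{base_url.rstrip('/')}/rest/v4/repository/{workflow_path.strip('/')}:jobs"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(inputs).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )

# Hypothetical server, workflow, credentials, and input values.
request = build_job_request(
    "https://knime.example.com/knime",
    "/Demos/churn_prediction",
    "analyst",
    "secret",
    {"customer_id": 42},
)
```

Passing the object to `urllib.request.urlopen(request)` would actually send it; that step is left out here so the sketch runs without a server.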
Extensions for KNIME Analytics Platform
Demo
Demo
Demo
KNIME Software in the Cloud
KNIME Analytics Platform and KNIME Server are available on Microsoft Azure and Amazon Web Services (AWS).
KNIME Workbench
Business Use Case
Acknowledgement & Additional Info
• www.knime.com
• https://www.knime.com/resources
Q&A
Appendix
KNIME Cheat Sheet

Big Data Modeling Challenges and Machine Learning with No Code
