Use Cases to Build & Deploy in < 30 min
Self-Serve Big Data Analytics & Applications
Agenda
Introduction
Sparkflows Solution
Use Cases
Introduction
100+ Building Blocks: ETL, ML, OCR, NLP, Connect to various Sources/Sinks
Workflow Editor: Powerful Schema Inference, Schema Propagation, Interactive Execution
Visualization & Dashboards
Prebuilt Workflows
Sparkflows Solution
Workflow Editor
Rich Visualizations & Dashboards
100’s of Pre-built Nodes
Batch & Streaming Engine
Interactive Execution
Easy Deployment & Configuration
Pre-built Workflows: Telco Churn Prediction, Housing Price Prediction, Bike Sharing Analysis, NY Taxi Data Analysis, MovieLens Recommendations
Sparkflows Product Stack
Streaming Data: Kafka, Flume
Data Sources: HIVE/HBase, HDFS/S3, Solr, RDBMS
Apache Spark Cluster: Databricks, AWS, IBM Bluemix, On Prem, Azure
Data Sinks: HIVE/HBase, HDFS/S3, Solr, RDBMS
Visualizations / Dashboards
Some of the Building Blocks / Nodes
Machine Learning: Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator
NLP: NER, Sentiment
OCR: Tesseract
Visualization: Line Chart, Bar Chart, Pie Chart, Updating Dashboards
File Formats: CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files
Feature Generation: Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler
Data Sources/Sinks: HDFS, S3, Kafka, Flume, Twitter, HBase, Solr, Elastic Search
ETL: Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup
Languages: SQL, Scala, Jython, Java
Use Cases in < 30 minutes
Self-Serve Big Data Analytics: Do Big Data Analytics with Drag & Drop, using 100+ building blocks
ETL Pipelines: Build ETL pipelines with ease, and incorporate SQL, Scala, and Jython in them
NLP: Perform NLP on Big Data with OpenNLP and Stanford CoreNLP
OCR: Perform OCR on millions of images with Tesseract
Streaming Analytics: Perform Streaming Analytics by reading from Kafka, applying complex transforms, generating graphs, and writing out to Solr, HBase, etc.
Use Cases in < 30 minutes
Machine Learning: Perform Machine Learning on huge datasets with drag and drop
Entity Resolution: Perform large-scale Entity Resolution on data from multiple channels
Log Analytics: Build a Log Analytics Platform with Kafka, Spark, Solr/Elastic Search, Hue
Format Conversion: Convert Big Data from one format to another
Load data into Solr, ES, HBase: Easily load data into Solr, Elastic Search, HBase, etc.
Use Cases in < 30 minutes
Custom Nodes: Create Custom Nodes and drop them in the Library/Workflow Editor
Dashboards: Combine various outputs of workflows into a Dashboard
Self-Serve Data Analytics
Diagram (Spark workflow): Read from CSV, AVRO, JSON, Parquet; nodes Row Filter / Rename Col, SQL / Scala / Jython, JOIN, Random Forest, Graph, Model, Dashboard; Save to Solr, HBase, Elastic Search, HIVE
ETL – Build ETL pipelines with ease
Diagram (Spark workflow): ReadCSV and ReadHIVE read from CSV and HIVE; Filter, JOIN, and SQL nodes transform the data; LoadSolr, LoadES, LoadHBase, and LoadHIVE write to Solr, ES, HBase, and HIVE
ETL – Connect various SQL for powerful pipelines
Diagram (Spark workflow): ReadCSV and ReadHIVE read from CSV and HIVE; several connected SQL nodes transform the data; LoadSolr, LoadES, LoadHBase, and LoadHIVE write to Solr, ES, HBase, and HIVE
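The idea of connecting several SQL steps, where each step queries the output of the one before it, can be sketched outside Sparkflows. The following is a minimal sketch only: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the `orders` table, its columns, the stage names, and the `run_sql_pipeline` helper are all hypothetical names chosen for illustration, not part of the product.

```python
import sqlite3


def run_sql_pipeline(rows, stages):
    """Run SQL stages in order; each stage can query the views created before it.

    rows   : list of (customer, amount) tuples (hypothetical input schema)
    stages : list of (view_name, select_statement) pairs
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    last_view = "orders"
    for view_name, select_sql in stages:
        # Expose each stage as a named view so the next stage can read it.
        conn.execute(f"CREATE VIEW {view_name} AS {select_sql}")
        last_view = view_name
    result = conn.execute(f"SELECT * FROM {last_view}").fetchall()
    conn.close()
    return result


ORDERS = [("alice", 120.0), ("bob", 15.0), ("alice", 30.0), ("carol", 200.0)]
STAGES = [
    # Stage 1: drop small orders.
    ("big_orders", "SELECT customer, amount FROM orders WHERE amount >= 20"),
    # Stage 2: aggregate per customer.
    ("totals", "SELECT customer, SUM(amount) AS total FROM big_orders GROUP BY customer"),
    # Stage 3: keep only the largest customers.
    ("top_customers", "SELECT customer, total FROM totals WHERE total > 175"),
]
pipeline_result = run_sql_pipeline(ORDERS, STAGES)
```

Keeping each stage as a separate named step mirrors the slide's chain of SQL nodes: a stage can be inspected or replaced without touching the others.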
NLP – Perform distributed NLP on Big Data
Diagram (Spark workflow): ReadPDF and ReadCSV read from PDF and CSV; NLP and JOIN nodes process the data; LoadSolr, LoadES, LoadHBase, and LoadHIVE write to Solr, ES, HBase, and HIVE
OCR – Perform distributed OCR on Big Data
Diagram (Spark workflow): ReadPDF reads PDF files (plus extracts images); an OCR node processes them; LoadSolr, LoadES, LoadHBase, and LoadHIVE write to Solr, ES, HBase, and HIVE
Streaming Analytics – With Kafka & Spark Streaming
Diagram (Spark workflow): ReadKafka reads from Kafka; a Transform node applies various transforms; a Graph node charts the results; LoadSolr, LoadES, LoadHBase, and LoadHIVE write to Solr, ES, HBase, and HIVE
Machine Learning – With Spark ML
Diagram (Spark workflow): data from HIVE; Transform (apply various transforms), Split, Logistic Regression, Score, Evaluate
Entity Resolution – Applying various distance algorithms & scoring
Diagram (Spark workflow): DataSet 1 and DataSet 2; nodes Join & Transform, Dedup, Filter low Scores; output to HIVE
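The join, score, and filter-low-scores steps on this slide can be illustrated with a small stand-alone example. This is a sketch under stated assumptions, not the Sparkflows implementation: it uses `difflib.SequenceMatcher` from the Python standard library as one possible similarity measure, compares every record pair (which only suits small inputs), and the record fields, sample data, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def resolve_entities(dataset1, dataset2, threshold=0.8):
    """Pair every record of dataset1 with every record of dataset2,
    score the pair on name similarity, and drop pairs scoring below threshold."""
    matches = []
    for left in dataset1:
        for right in dataset2:
            score = similarity(left["name"], right["name"])
            if score >= threshold:  # the "filter low scores" step
                matches.append((left["id"], right["id"], round(score, 3)))
    return matches


# Hypothetical records arriving from two channels.
CRM = [
    {"id": "c1", "name": "Jonathan Smith"},
    {"id": "c2", "name": "Maria Garcia"},
]
WEB = [
    {"id": "w1", "name": "jonathan smith "},
    {"id": "w2", "name": "Marla Garcia"},
    {"id": "w3", "name": "Wei Chen"},
]
matches = resolve_entities(CRM, WEB)
```

On large datasets the all-pairs comparison would normally be narrowed first, for example by joining only records that share a blocking key, before scoring.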
Log Analytics
Diagram (Spark workflow): Apache Logs flow through Kafka; nodes ReadKafka, Parse Apache Logs, IP2Geo, Graph; Save to Solr, HBase, Elastic Search, HIVE; query with SQL and HUE
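The "Parse Apache Logs" step can be sketched as follows. This is a minimal illustration, assuming the logs are in the Apache Common Log Format; the regular expression, the `parse_line` helper, and the sample lines are written for this example and are not taken from Sparkflows.

```python
import re
from collections import Counter

# Apache Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size field means no body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.7 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]
records = [r for r in map(parse_line, SAMPLE) if r is not None]
status_counts = Counter(r["status"] for r in records)
```

Once lines are parsed into structured records, later steps such as IP-to-geo lookup, aggregation for graphs, and indexing into a search store operate on named fields rather than raw text.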
Small Files Problem
Diagram (Spark workflow): Read from CSV, HIVE; Coalesce node; Save to CSV, HIVE
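The effect of coalescing, turning many small files into fewer larger ones, can be shown with a local example. This is a sketch only: it merges CSV files on the local filesystem with the Python standard library rather than on a Spark cluster, it assumes every input file has the same header, and the `coalesce_csv` and `demo` helpers and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def coalesce_csv(input_dir, output_file):
    """Merge every *.csv in input_dir (same header assumed) into one file.

    Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with open(output_file, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(input_dir).glob("*.csv")):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # write the header once
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


def demo():
    """Create five one-row files, merge them, and return (row count, merged lines)."""
    with tempfile.TemporaryDirectory() as tmp:
        small_dir = Path(tmp) / "small"
        small_dir.mkdir()
        for i in range(5):
            (small_dir / f"part-{i}.csv").write_text(f"id,value\n{i},{i * 10}\n")
        merged = Path(tmp) / "merged.csv"
        count = coalesce_csv(small_dir, merged)
        return count, merged.read_text().splitlines()
```

Calling `demo()` writes five single-row files and merges them into one file with a single header line followed by the five data rows.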
Format Conversion
Diagram (Spark workflow): Read from CSV, AVRO, JSON, Parquet; Save to CSV, AVRO, JSON, Parquet
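A read-then-save conversion between two of the formats named on this slide can be sketched with the Python standard library. This is an illustration under assumptions: it covers only CSV and line-delimited JSON (Avro and Parquet need third-party libraries), works on in-memory strings rather than distributed files, and the function names and sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text (with a header row) to one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row, sort_keys=True) for row in reader)


def json_lines_to_csv(json_text, fieldnames):
    """Convert line-delimited JSON back to CSV text with the given column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in json_text.splitlines():
        if line.strip():  # ignore blank lines
            writer.writerow(json.loads(line))
    return out.getvalue()


CSV_TEXT = "id,city\n1,Paris\n2,Oslo\n"
json_lines = csv_to_json_lines(CSV_TEXT)
round_trip = json_lines_to_csv(json_lines, ["id", "city"])
```

Note that CSV carries no type information, so every value arrives as a string in the JSON output; a real conversion would usually apply a schema at the read step.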
Loading Data into Solr, Elastic Search, HBase, HIVE
Diagram (Spark workflow): Read from CSV, AVRO, JSON, Parquet; Save to Solr, HBase, Elastic Search, HIVE
Custom Nodes – Create & Use Custom Nodes which add custom features
Diagram (Spark workflow): DataSet 1 and DataSet 2; nodes Custom Node, Join & Transform, Custom Node; output to HIVE
Dashboards – Combine output of various Workflows/Nodes into a Dashboard
THANK YOU




Sparkflows Use Cases

Editor's Notes

  • #2: Makes building Big Data applications agile, much faster, and more predictable.
  • #6: Benefits: Business users can really interact with data and experiment with building applications. Rich dashboards make day-to-day operations more efficient and provide insights into data and workflow performance. Pre-built applications can be easily extended or changed. Use cases are easy to visualize and implement.