Use Cases to Build & Deploy in < 30 min
Self-Serve Big Data Analytics & Applications
Agenda
Introduction
Sparkflows Solution
Use Cases
Introduction
100+ Building Blocks: ETL, ML, OCR, NLP, Connect to various Sources/Sinks
Workflow Editor: Powerful Schema Inference, Schema Propagation, Interactive Execution
Visualization & Dashboards
Prebuilt Workflows
Sparkflows Solution
Workflow Editor
Rich Visualizations & Dashboards
100’s of Pre-built Nodes
Batch & Streaming Engine
Interactive Execution
Easy Deployment & Configuration
Pre-built Workflows: Telco Churn Prediction, Housing Price Prediction, Bike Sharing Analysis, NY Taxi Data Analysis, MovieLens Recommendations
Sparkflows Product Stack
Streaming Data: Kafka, Flume
Data Sources: HIVE/HBase, HDFS/S3, Solr, RDBMS
Apache Spark Cluster: Databricks, AWS, IBM Bluemix, On Prem, Azure
Data Sinks: HIVE/HBase, HDFS/S3, Solr, RDBMS
Visualizations / Dashboards
Some of the Building Blocks / Nodes
Machine Learning: Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator
NLP: NER, Sentiment
OCR: Tesseract
Visualization: Line Chart, Bar Chart, Pie Chart, Updating Dashboards
File Formats: CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files
Feature Generation: Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler
Data Sources/Sinks: HDFS, S3, Kafka, Flume, Twitter, HBase, Solr, Elastic Search
ETL: Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup
Languages: SQL, Scala, Jython, Java
Use Cases in < 30 minutes
Self-Serve Big Data Analytics: Do Big Data analytics with drag and drop, using 100+ building blocks.
ETL Pipelines: Build ETL pipelines with ease, and incorporate SQL, Scala, or Jython in them.
NLP: Perform NLP on Big Data with OpenNLP and Stanford CoreNLP.
OCR: Perform OCR on millions of images with Tesseract.
Streaming Analytics: Read from Kafka, perform complex transforms, generate graphs, and write out to Solr, HBase, etc.
Use Cases in < 30 minutes
Machine Learning: Perform machine learning on huge datasets with drag and drop.
Entity Resolution: Perform large-scale entity resolution on data from multiple channels.
Log Analytics: Build a log analytics platform with Kafka, Spark, Solr/Elastic Search, and Hue.
Format Conversion: Convert Big Data from one format to another.
Load data into Solr, ES, HBase: Easily load data into Solr, Elastic Search, HBase, etc.
Use Cases in < 30 minutes
Custom Nodes: Create custom nodes and drop them into the Library/Workflow Editor.
Dashboards: Combine various outputs of workflows into a dashboard.
Self-Serve Data Analytics
[Diagram: Spark workflow. Read: CSV, AVRO, JSON, Parquet. Nodes: Row Filter / Rename Col, SQL / Scala / Jython, JOIN, Random Forest. Save: Solr, HBase, Elastic Search, HIVE. Outputs: Graph, Model, Dashboard.]
ETL – Build ETL pipelines with ease
[Diagram: Spark workflow. Sources: CSV, HIVE. Nodes: ReadCSV, ReadHIVE, Filter, JOIN, SQL, LoadSolr, LoadES, LoadHBase, LoadHIVE. Sinks: Solr, ES, HBase, HIVE.]
ETL – Connect various SQL for powerful pipelines
[Diagram: Spark workflow. Sources: CSV, HIVE. Nodes: ReadCSV, ReadHIVE, four chained SQL nodes, LoadSolr, LoadES, LoadHBase, LoadHIVE. Sinks: Solr, ES, HBase, HIVE.]
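The chained-SQL pattern on this slide, where each SQL node queries the output of the node before it, can be sketched in a few lines. This is only an illustration, not Sparkflows code: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the table names, columns, and rows are made up for the example.

```python
import sqlite3

# In-memory database standing in for the Spark session (assumption: a real
# workflow would run these statements through Spark SQL instead).
conn = sqlite3.connect(":memory:")

# Hypothetical input tables playing the role of ReadCSV and ReadHIVE.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 10, 75.0), (3, 11, 5.0), (4, 12, 40.0)],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "east"), (11, "west"), (12, "west")],
)

# Each step is one SQL node; its result is registered as a view so the
# next node can query it by name.
steps = [
    ("big_orders", "SELECT * FROM orders WHERE amount >= 20"),
    ("with_region",
     "SELECT b.order_id, b.amount, c.region "
     "FROM big_orders b JOIN customers c ON b.customer_id = c.customer_id"),
    ("region_totals",
     "SELECT region, SUM(amount) AS total FROM with_region GROUP BY region"),
]
for view_name, query in steps:
    conn.execute(f"CREATE TEMP VIEW {view_name} AS {query}")

# Read the output of the last node in the chain.
region_totals = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()
print(region_totals)
```

Keeping each step as a named view means a step can be inspected or replaced on its own, which is roughly what separate SQL nodes in a workflow editor allow.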
NLP – Perform distributed NLP on Big Data
[Diagram: Spark workflow. Sources: CSV, PDF. Nodes: ReadCSV, ReadPDF, NLP, JOIN, LoadSolr, LoadES, LoadHBase, LoadHIVE. Sinks: Solr, ES, HBase, HIVE.]
OCR – Perform distributed OCR on Big Data
[Diagram: Spark workflow. Source: PDF. Nodes: ReadPDF (plus extract images), OCR, LoadSolr, LoadES, LoadHBase, LoadHIVE. Sinks: Solr, ES, HBase, HIVE.]
Streaming Analytics – With Kafka & Spark Streaming
[Diagram: Spark workflow. Source: Kafka. Nodes: ReadKafka, Transform (apply various transforms), Graph, LoadSolr, LoadES, LoadHBase, LoadHIVE. Sinks: Solr, ES, HBase, HIVE.]
Machine Learning – With Spark ML
[Diagram: Spark workflow. Source: HIVE. Nodes: Transform (apply various transforms), Split, Logistic Regression, Score, Evaluate.]
Entity Resolution – Applying various distance algorithms & scoring
[Diagram: Spark workflow. Inputs: DataSet 1, DataSet 2. Nodes: Join & Transform, Dedup, Filter low Scores. Store: HIVE.]
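The join, score, and filter-low-scores flow for entity resolution can be sketched as follows. The slides do not say which distance algorithms are used, so this example assumes the ratio from Python's difflib.SequenceMatcher as the similarity measure. The records, the field list, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical records from two channels (DataSet 1 and DataSet 2).
dataset_1 = [
    {"id": "a1", "name": "Jon Smith", "city": "Boston"},
    {"id": "a2", "name": "Maria Garcia", "city": "Austin"},
]
dataset_2 = [
    {"id": "b1", "name": "John Smith", "city": "Boston"},
    {"id": "b2", "name": "Mario Garza", "city": "Dallas"},
]


def similarity(left: str, right: str) -> float:
    """Return a 0..1 similarity score; 1.0 means identical strings."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def score_pair(rec_1: dict, rec_2: dict) -> float:
    """Average the per-field similarities into one match score."""
    fields = ("name", "city")
    return sum(similarity(rec_1[f], rec_2[f]) for f in fields) / len(fields)


# Join every record in DataSet 1 with every record in DataSet 2, score the
# pair, then filter out low scores. The threshold is an arbitrary choice.
THRESHOLD = 0.8
matches = [
    (r1["id"], r2["id"], round(score_pair(r1, r2), 3))
    for r1, r2 in product(dataset_1, dataset_2)
    if score_pair(r1, r2) >= THRESHOLD
]
print(matches)
```

A full cross join like this grows quadratically with the input size, so at scale the candidate pairs would normally be narrowed first, for example by joining only records that share a key.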
Log Analytics
[Diagram: Spark workflow. Source: Apache Logs via Kafka. Nodes: ReadKafka, Parse Apache Logs, IP2Geo, Graph, Save. Stores: Solr, HBase, Elastic Search, HIVE. Query: SQL, HUE.]
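The Parse Apache Logs step can be illustrated with a small parser for the Apache Common Log Format. This is a sketch in plain Python rather than the Sparkflows node itself; the sample lines are made up, and the per-host count at the end only stands in for the kind of aggregate a Graph node might plot.

```python
import re
from collections import Counter

# Regular expression for the Apache Common Log Format:
# host ident authuser [timestamp] "request line" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Made-up sample lines; the second one is deliberately malformed.
sample_lines = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
    '198.51.100.7 - alice [10/Oct/2000:13:56:01 -0700] "POST /login HTTP/1.1" 302 -',
    '192.0.2.10 - - [10/Oct/2000:13:57:12 -0700] "GET /missing HTTP/1.0" 404 209',
]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent; treat that as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Drop lines that fail to parse, then count requests per client host.
records = [r for r in map(parse_line, sample_lines) if r is not None]
hits_per_host = Counter(r["host"] for r in records)
print(hits_per_host.most_common())
```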
Small Files Problem
[Diagram: Spark workflow. Read: CSV, HIVE. Node: Coalesce. Save: CSV, HIVE.]
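The idea behind the Coalesce step, turning many small files into a few larger ones, can be sketched at the file level. Spark's coalesce works on partitions rather than on files directly, so the following is only an analogue in plain Python. The helper name coalesce_csv, the part-file naming, and the data are assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path


def coalesce_csv(input_files, output_dir, num_outputs):
    """Merge many small CSV files (same header) into num_outputs larger files."""
    output_dir.mkdir(parents=True, exist_ok=True)
    output_files = []
    for bucket in range(num_outputs):
        # Input files are assigned to output buckets round-robin.
        members = input_files[bucket::num_outputs]
        if not members:
            continue
        out_path = output_dir / f"part-{bucket:05d}.csv"
        with out_path.open("w", newline="") as out_handle:
            writer = csv.writer(out_handle)
            for index, path in enumerate(members):
                with path.open(newline="") as in_handle:
                    rows = list(csv.reader(in_handle))
                # Keep the header from the first file only.
                writer.writerows(rows if index == 0 else rows[1:])
        output_files.append(out_path)
    return output_files


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    small_files = []
    # Create ten tiny input files with one data row each (made-up data).
    for i in range(10):
        path = tmp_path / f"small-{i}.csv"
        path.write_text(f"id,value\n{i},{i * i}\n")
        small_files.append(path)

    merged = coalesce_csv(small_files, tmp_path / "out", num_outputs=2)
    # Count lines (header plus data rows) in each merged file.
    merged_row_counts = [len(p.read_text().splitlines()) for p in merged]

print(merged_row_counts)
```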
Format Conversion
[Diagram: Spark workflow. Read: CSV, AVRO, JSON, Parquet. Save: CSV, AVRO, JSON, Parquet.]
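One of the conversions in this diagram, CSV to JSON, can be sketched with the Python standard library alone. This is not how Sparkflows implements it (the slide shows Spark Read and Save nodes); the function name csv_to_json_lines and the sample data are hypothetical, and Avro and Parquet are left out because they need third-party libraries.

```python
import csv
import io
import json

# Made-up CSV input; in the slide's workflow this would be a file read by Spark.
csv_text = "id,name,score\n1,alpha,0.5\n2,beta,0.75\n"


def csv_to_json_lines(text):
    """Convert CSV text with a header row into newline-delimited JSON."""
    reader = csv.DictReader(io.StringIO(text))
    # Every CSV value is a string; no type inference is attempted here.
    return "\n".join(json.dumps(row) for row in reader)


json_lines = csv_to_json_lines(csv_text)
print(json_lines)
```

Because CSV carries no type information, all values come out as JSON strings here. A real conversion would need a schema, declared or inferred, to emit numbers as numbers.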
Loading Data into Solr, Elastic Search, HBase, HIVE
[Diagram: Spark workflow. Read: CSV, AVRO, JSON, Parquet. Save: Solr, HBase, Elastic Search, HIVE.]
Custom Nodes – Create & Use Custom Nodes which add custom features
[Diagram: Spark workflow. Inputs: DataSet 1, DataSet 2. Nodes: Join & Transform, Custom Node (used twice). Store: HIVE.]
Dashboards – Combine output of various Workflows/Nodes into a Dashboard
THANK YOU
