Continuous SQL with SQL Stream Builder
Kenny Gorman - Product Owner
Timothy Spann - Principal DataFlow Field Engineer
John Kuchmek - Senior Solutions Engineer
06-May-2021
https://www.meetup.com/futureofdata-newyork/
@PaasDev
© 2021 Cloudera, Inc. All rights reserved. 2
Welcome to Future of Data - Virtual
Princeton Future of Data Meetup
New York Future of Data Meetup
Philadelphia Future of Data Meetup
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
AGENDA
● Introductions with Kenny, John and Tim
● Flink Quick Overview
● SQL Stream Builder Overview
● Q&A
● Demos
● Q&A - Interactive Panel Session
● Next Meetups
● Raffle
Cloudera DataFlow Use Cases
Data Movement
Optimize resource utilization by
moving data between data centers
or between on-premises and cloud
infrastructures
e.g. intercontinental data exchange
Logging Modernization
Optimize log analytics solutions with
CDF by simplifying log ingestion
from the edge, reducing costs and
gaining key analytics
e.g. Splunk / Logstash offload
Streaming analytics insights
Make key business decisions by
analyzing streaming data for
complex patterns, gaining
actionable intelligence etc.
e.g. Fraud detection, Network threat
analysis, app monitoring, Clickstream
analysis
360° view of customer
Ingest, transform and combine
customer data from multiple
sources into a single data view /
lake
e.g. Real-time customer offers,
Loan approvals
IoT & Edge use cases
e.g. Predictive Maintenance, Asset
Tracking / Monitoring, Patient
Monitoring, Quality Processes,
Fleet Management, Connected
Cars and more
Enterprise data management
Managing massive volumes of
high-velocity data to/from legacy
systems, ETL tools and other data
stores
e.g. Flume offload, ETL
replacement, payment data
processing, integration with Oracle
Simplifying the User Experience
APACHE FLINK
USE CASE
Streaming real-time data pipelines that need to handle complex stream or batch data event processing, analytics, and/or support event-driven applications
APPLICATION
● Comcast, a global media company, uses Flink for operationalizing machine learning models and near-real-time event stream processing
● Flink helps deliver a personalized, contextual interaction, reducing time to support resolutions and saving millions of dollars per year
● 3B+ data points daily streaming in from 25 million customers running real-time machine learning prediction
TECHNOLOGY
● Flink performs compute at in-memory speed at any scale
● Flink parses SQL using Apache Calcite, which supports standard ANSI SQL
● Flink runs standalone, on YARN, and has a K8s Operator
CONSIDERATION
● Data Freshness SLAs
● Flink can read from and write to Hive data
● Review requirements for fault tolerance, resilience, and HA
● Other technologies play in this space, like the Hive storage handler to connect to Kafka
FLINK
FEATURES
• Distributed processing engine for
stateful computations
• 10s of TBs of managed state
• Flexible and expressive APIs
• Guaranteed correctness &
Exactly-once state consistency
• Event-time semantics
• Flexible deployment & large
ecosystem (K8s, YARN, S3, HDFS, ...)
• Support for Flink SQL API
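As an illustration of the event-time semantics bullet, here is a minimal plain-Python sketch, not Flink API code. It assigns each event to a tumbling window based on the timestamp carried by the event rather than on arrival order. The function name, the (timestamp, key) event shape and the 10-second window size are assumptions made for this example. Flink additionally uses watermarks to decide when a window is complete, which this sketch leaves out.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per (window_start, key), using each event's own timestamp."""
    counts = defaultdict(int)
    for event_time_ms, key in events:
        # The window is derived from the event timestamp, not from arrival order.
        window_start = event_time_ms - (event_time_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events as (event_time_ms, key); note they arrive out of order.
events = [(1_000, "a"), (12_000, "a"), (3_000, "b"), (9_999, "a"), (10_000, "b")]
result = tumbling_window_counts(events, window_ms=10_000)

for (window_start, key), count in sorted(result.items()):
    print(window_start, key, count)
```

The event stamped 9,999 ms arrives after the one stamped 12,000 ms but is still counted in the first window, which is the point of processing by event time.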
DELIVERING STREAMING ANALYTICS
[Figure: timeline with an axis from 1 to 11 seconds. Labels: Events Processing; Capture Events that Matter; SQL; Data Analysts Can Write Queries; Parsing and Blending Data; Both offline and streaming data; Streaming Analytics; Low-latency analytics use cases; Across the Line of Businesses]
MAINTAINS & CHECKPOINTS STATE
● Flink maintains state locally per task
(in-mem / on-disk)
○ Fast access!
● State is periodically checkpointed to
durable storage
○ A checkpoint is a consistent
snapshot of the state of all tasks
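The idea on this slide can be sketched in plain Python. This is a simplified model, not Flink's implementation: Flink takes snapshots asynchronously using checkpoint barriers, while this sketch snapshots synchronously between events. The class names, the byte-sum partitioner and the in-memory "durable store" are all assumptions made for illustration.

```python
import copy


class CountingTask:
    """A task that keeps its state locally (here: a per-key counter)."""

    def __init__(self):
        self.state = {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1


class CheckpointStore:
    """Stand-in for durable storage; it only holds snapshots in memory."""

    def __init__(self):
        self.snapshots = []

    def save(self, tasks, position):
        # One consistent snapshot: the state of all tasks at the same input position.
        states = [copy.deepcopy(task.state) for task in tasks]
        self.snapshots.append({"position": position, "states": states})

    def latest(self):
        return self.snapshots[-1]


def partition(key, n_tasks):
    """Deterministically route a key to one task."""
    return sum(key.encode("utf-8")) % n_tasks


def run(events, tasks, store, checkpoint_every, start=0, stop=None):
    stop = len(events) if stop is None else stop
    for position in range(start, stop):
        key = events[position]
        tasks[partition(key, len(tasks))].process(key)
        processed = position + 1
        if processed % checkpoint_every == 0:
            store.save(tasks, processed)


def restore(store, n_tasks):
    """Rebuild tasks from the latest snapshot and return where to resume."""
    snapshot = store.latest()
    tasks = [CountingTask() for _ in range(n_tasks)]
    for task, state in zip(tasks, snapshot["states"]):
        task.state = copy.deepcopy(state)
    return tasks, snapshot["position"]


events = ["a", "b", "a", "c", "b", "a", "c"]
store = CheckpointStore()

# First attempt "crashes" after 5 events; the only checkpoint was taken at 3.
run(events, [CountingTask(), CountingTask()], store, checkpoint_every=3, stop=5)

# Local state is lost; recover from the checkpoint and replay from its position.
recovered, resume_from = restore(store, n_tasks=2)
run(events, recovered, store, checkpoint_every=3, start=resume_from)

merged = {}
for task in recovered:
    merged.update(task.state)
print(resume_from, merged)
```

Because the snapshot records the input position together with the state of every task, replaying from that position counts each event once in the final result, even though two events were processed twice overall.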
Integrated Governance
Unified Governance & Lineage
Flow Management
● Reports Entity and Lineage information about NiFi Flows
● Connects with existing Lineage information
Streams Messaging
● Topic access centrally managed supporting granular CRUD operations
● Manage permissions on dedicated clusters or manage multiple clusters at once
● Manage schemas centrally and make them available to consumers/producers
Stream Processing
● Reports Flink Apps as an operation
● Lineage through integration with existing Lineage information like Kafka topics, HBase tables etc.
● Integrated SQL and materialized view engine via SQL Stream Builder.
SQL Stream Builder
● Democratize data access across the enterprise - anyone who
knows SQL can create powerful stream processors.
● Iterative interface - Just like SQL on databases, run queries
and reason about the data with an interactive UI.
● Leverages Apache Flink to run SQL jobs - production
grade, scalable and high performance.
● Deep integration and features above and beyond just UI
features - UDFs, input transforms, Kafka key and time
integration, CEP framework and more.
● Create Materialized Views to integrate with downstream
components like notebooks, visualizations and
applications.
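To make the materialized view idea concrete, here is a conceptual Python sketch. It is not SQL Stream Builder's API. It maintains the result of a grouped COUNT and SUM incrementally, one event at a time, so a downstream consumer can look up the current row by key at any moment. The class, the field names (`card_id`, `amount`) and the sample events are hypothetical.

```python
class MaterializedView:
    """Holds the latest result of a grouped aggregation over a stream."""

    def __init__(self, key_field, value_field):
        self.key_field = key_field
        self.value_field = value_field
        self.rows = {}

    def apply(self, event):
        # Update only the row for this event's key; nothing is recomputed from scratch.
        key = event[self.key_field]
        row = self.rows.setdefault(key, {"count": 0, "total": 0.0})
        row["count"] += 1
        row["total"] += event[self.value_field]

    def get(self, key):
        """What a notebook, dashboard or application would read."""
        return self.rows.get(key)


view = MaterializedView(key_field="card_id", value_field="amount")

# Hypothetical stream of payment events.
stream = [
    {"card_id": "c1", "amount": 20.0},
    {"card_id": "c2", "amount": 5.5},
    {"card_id": "c1", "amount": 30.0},
]
for event in stream:
    view.apply(event)

c1_row = view.get("c1")
print(c1_row)
```

The design point is that the query keeps running and the view always reflects the events seen so far, so readers never have to rescan the stream.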
Streaming SQL
Democratizing access to streams of data via structured query language
Download these assets today
THANK YOU
