Streaming Analytics for IoT
with Apache Spark
WEBINAR | FEBRUARY 16, 2018
Sameer Bhide, Anand Venugopal
AVP & Business Head, STREAMANALYTIX Senior Solution Architect, STREAMANALYTIX
Agenda
| IoT Market Perspective
| The IoT Application Architecture
| Use Cases: Connected Car | Industrial IoT
| Demo – Building IoT Spark Apps Using
| Key Takeaways
| Q & A
ABOUT
• Mission critical technology solutions since 1996
• Fortune 500: Big data clients
• 1700 people; US, India, global reach
• Unique mix of big data products and services
IoT Market: Gartner Prediction
• 7.6 Billion: Things that will ship in 2021
• 32% CAGR: End point growth rate 2016-2021
• 25.1 Billion: Units installed base in 2021
• $3.9 Trillion: Total spending in 2021 (end points and services)
IoT – Market Sub-Domains
Smart Home | Smart Car | Smart Building | Smart City | Smart Agriculture
Smart Factory | Smart Healthcare | Smart Data Center | Smart Energy | Smart Retail
IoT Solution Architecture and the Role of Spark
[Architecture diagram. Labels recovered from the slide: Connected Things; Field IoT Gateway (shown twice); Cloud IoT Integration Hub; Centralized IoT Data Mgmt. & Analytics Platform; Enterprise Applications]
POLL
IoT Application Architecture: Key Aspects
Telemetry
• Network & Connectivity Dynamics
• Real Time Data Ingestion
Computation & Analytics
• RT, NRT Analytics
• Offline Analytics & Reporting
• Schematization
• Data Blending & Enrichment
• Multi-party Interaction
Device Management
• Variation in Device Capabilities
• MEP: State-notifications, Commands, Telemetry
• State Management & Security
Operationalize
• Agility
• Extensibility
• Time-to-market
• Automation
IoT Application Architecture: Foundational Elements
TELEMETRY
• Network & Connectivity Dynamics
• Real Time Data Ingestion
Foundational element: Data Ingestion Abstractions
COMPUTATION & ANALYTICS
• RT, NRT Analytics
• Offline Analytics & Reporting
• Schematization
• Data Blending & Enrichment
• Multi-party Interaction
Foundational element: Event & Micro-Batch Computation Engines (Spark, Storm etc.)
DEVICE MANAGEMENT
• Variation in Device Capabilities
• MEP: State-notifications, Commands, Telemetry
• State Management & Security
Foundational element: Metadata Registry & Management Gateway
OPERATIONALIZATION
• Agility
• Extensibility
• Time-to-market
• Automation
Foundational element: Dataflow (Pipeline) Management & DSL Capabilities
IoT Application Architecture: Conceptual Layers
Managed Devices
EDA / SEDA Sources
Layers: IoT Gateway | Data Ingestion | Data Processing & Storage Layer | Insights Layer | Action Layer
Pipeline stages: INGEST | ENRICH | ANALYZE | ACT
Components:
• Security
• Data Sources
• Compute Engine (Spark)
• ML - Model Updates Patterns (A-B, Champion Challenger etc.)
• Notification Services
• Fault Tolerance
• Protocol Support
• Data Filtering, Blending & Enrichment
• Rule Engine
• Alerts
• State Management
• Ontology & Metadata Management
• Structured Query
• Feedback Loop
• External Services Integration
• Device Proxy
• Data Persistence
• Custom Business Flows (IFTTT, Lambda etc.)
Management Layer
• Configuration & Connection Management
• Performance Management
• Application Life Cycle Management
• Version Updates
Infrastructure Layer
• PaaS Integration
• Computer Infrastructure
• SPARK
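The INGEST, ENRICH, ANALYZE and ACT stages named on this slide can be illustrated with a minimal sketch. It uses plain Python generators as a stand-in for Spark jobs, and every name in it (`device_id`, `temperature`, `site`, the registry lookup, the alert text) is a hypothetical choice made for illustration, not something taken from the webinar or from StreamAnalytix.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def ingest(raw_events: Iterable[Event]) -> Iterator[Event]:
    """INGEST: accept events and drop malformed ones that lack a device id."""
    for event in raw_events:
        if "device_id" in event:
            yield dict(event)


def enrich(events: Iterable[Event], registry: Dict[str, Event]) -> Iterator[Event]:
    """ENRICH: blend each event with metadata from a (hypothetical) device registry."""
    for event in events:
        metadata = registry.get(str(event["device_id"]), {})
        yield {**event, **metadata}


def analyze(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """ANALYZE: apply a simple rule that flags readings above a limit."""
    for event in events:
        reading = float(event.get("temperature", 0.0))
        yield {**event, "over_limit": reading > limit}


def act(events: Iterable[Event]) -> List[str]:
    """ACT: turn flagged events into alert messages."""
    return [
        f"ALERT {e['device_id']} at {e.get('site', 'unknown')}: temperature {e.get('temperature')}"
        for e in events
        if e["over_limit"]
    ]


def run_pipeline(raw_events: Iterable[Event], registry: Dict[str, Event], limit: float) -> List[str]:
    """Chain the four stages in the order shown on the slide."""
    return act(analyze(enrich(ingest(raw_events), registry), limit))
```

Because each stage only consumes and produces an iterator of events, a stage can be swapped or extended without touching the others, which is one plausible reading of the layered design on the slide.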
Spark as the IoT Compute Engine
| Massively scalable
| Rich set of transformations
| Industry adoption
| Unified & simplified programming model
| Support for machine learning
| Micro-batch capable – tending to NRT
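To make the micro-batch point concrete, here is a minimal sketch of the idea in plain Python rather than Spark: events are grouped into fixed, non-overlapping time intervals and each group is processed as a small batch. The function names, the `(timestamp, value)` event shape and the averaging step are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, float]], interval_s: int) -> List[Tuple[int, List[float]]]:
    """Group (timestamp_seconds, value) events into fixed, non-overlapping batches."""
    batches: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into the batch whose start is the interval boundary at or before it.
        batch_start = int(timestamp // interval_s) * interval_s
        batches[batch_start].append(value)
    return [(start, batches[start]) for start in sorted(batches)]


def batch_averages(events: Iterable[Tuple[float, float]], interval_s: int) -> List[Tuple[int, float]]:
    """Compute one average per micro-batch, as a stand-in for a per-batch job."""
    return [(start, sum(values) / len(values)) for start, values in micro_batches(events, interval_s)]
```

Shrinking `interval_s` lowers the delay between an event arriving and its batch being processed, which is the sense in which micro-batching tends toward near-real-time behaviour.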
Recommendations
| Adopt an integrated approach to IoT development
| Design a platform layer that can adapt to the business's dynamic needs
| Create a vendor neutral & interoperable architecture
| Adopt software products to quickly operationalize IoT use cases
© 2017 Impetus Technologies
Real-time Stream Processing & Machine Learning Platform
+ Visual Spark Studio
Key Features
• Ingest
• Analyze
• Define Rules & Actions
• Design Once – Execute Anywhere
• Manage
Use Cases: Connected Cars | Industrial IoT
Connected Car – Driver Risk Profiling
Brief Background
Leading insurance provider in the US
• Classify drivers based on current driving pattern and historical data
• Raise alerts on behavior change
• Blend data from syndicated and open / public data marts & services
• Derive additional analytics through supplemental data flows
Business Need
To create an end-to-end analytics application for driver profiling & RT risk assessment
IoT / Connected Car Solution with StreamAnalytix
[Architecture diagram. Labels recovered from the slide:]
End devices: End Device 1, End Device 2, End Device 3 (two groups); Smart Car 1, Smart Car 2
IoT data interface (MQTT / HTTP / WebSockets)
Edge: Custom solution OR 3rd party IoT interface vendor
Gateway
Central Aggregation Server / Data Flow Manager
Data Center / Cloud: On-premise (Bare-Metal and/or VMs) | Public / Hybrid Cloud
• Storage and Offline Analytics
• Device Provisioning and Management (identity / registration etc.)
• Open Interfaces – extensibility and customizability in all directions
• Real Time Dash-boarding
• Condition Monitoring
• Predictive Maintenance
• Smart Alerting
• Root Cause Analytics
• Closed-loop Feedback
Legend: Data flow | Control flow
High Level Solution Overview
[Solution diagram. Labels recovered from the slide: Automated Device Installed in OBD-II Port; AWS IoT; (Spark) Ingestion, Enrichment, Analytics; Public Cloud Services / Third-party Services; Persistence; Dashboards; Handheld Devices; Alerts]
Connected Car – Driver Risk Profiling
Ingest events using AWS IoT gateway
Mask PII & enrich data with external & historical sources
Score a 'Risk Assessment' model that uses
• Weather conditions
• Time of trip
• Hard brakes and acceleration
• Duration over 70mph
• Previous number of risk instances
Raise alerts based on risk scores
Create RT and historical dashboards
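The masking, scoring and alerting steps on this slide can be sketched as follows. The webinar does not disclose the actual risk model, so the weighted sum, the weights, the alert threshold and all field names below are hypothetical. The truncated SHA-256 hash is only a simple stand-in for PII masking, not a recommendation for production use.

```python
import hashlib
from typing import Dict

# Hypothetical weights for the five inputs listed on the slide.
WEIGHTS = {
    "bad_weather": 2.0,              # weather conditions (1 if adverse, else 0)
    "night_trip": 1.0,               # time of trip (1 if at night, else 0)
    "hard_events": 0.5,              # per hard brake or hard acceleration
    "minutes_over_70mph": 0.1,       # duration over 70 mph, in minutes
    "previous_risk_instances": 1.5,  # per earlier risk instance
}
ALERT_THRESHOLD = 5.0  # hypothetical cut-off for raising an alert


def mask_pii(trip: Dict[str, object]) -> Dict[str, object]:
    """Replace the driver id with a short hash and drop the driver's name."""
    masked = dict(trip)
    digest = hashlib.sha256(str(trip["driver_id"]).encode("utf-8")).hexdigest()
    masked["driver_id"] = digest[:12]
    masked.pop("driver_name", None)
    return masked


def risk_score(trip: Dict[str, object]) -> float:
    """Weighted sum of the risk inputs; missing inputs count as zero."""
    return sum(weight * float(trip.get(name, 0)) for name, weight in WEIGHTS.items())


def assess(trip: Dict[str, object]) -> Dict[str, object]:
    """Mask PII first, then score the trip and decide whether to alert."""
    masked = mask_pii(trip)
    score = risk_score(masked)
    return {"driver_id": masked["driver_id"], "score": score, "alert": score >= ALERT_THRESHOLD}
```

Masking before scoring keeps identifying fields out of every downstream step, so dashboards and alerts only ever see the hashed id.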
Industrial IoT Use Case – Device Health Monitoring
Brief Background
Leader in industrial automation, information, and engineering services
• Various machine health parameters collected in different timelines from an array of sensors
• Compute and store correlation between sensor data when a process parameter is altered
• Leverage existing investments in Azure cloud infrastructure
Business Need
Measure impact to process dynamics by calculating correlation between various sensor data
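The slide does not say which correlation measure was used. As one common choice, the sketch below computes the Pearson correlation coefficient between two equally long series of sensor readings. The function name and the sample values in the usage note are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two aligned sensor series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson(temperature, vibration)` could be computed over the readings before and after a process parameter change. Readings collected on different timelines would first need to be aligned to common timestamps, which this sketch does not attempt.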
High Level Solution Approach
End-to-end solution deployed on StreamAnalytix
Data pipelines used pre-built components:
• Data ingestion
• Statistical functions
• Data enrichment
• Visualization
Cloud: Microsoft Azure
Source: Event Hub
Compute: Spark jobs on HDInsight
Orchestration: StreamAnalytix
[Solution diagram. Labels recovered from the slide: Manufacturing Units; HDInsight; Reporting Dashboard]
Industrial Automation - Turbine Data Analytics
Data Pipeline View
• Ingest data from different MS Azure Event Hub sources
• Enrich incoming data
• Outer-join on incoming datasets
• Aggregate result data and group by plantIDs
• Post streaming results on WebSockets
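The outer-join and group-by steps of this pipeline can be sketched in plain Python as a stand-in for the Spark job. The field names (`plant_id`, `ts`, `temp`, `vib`) and the choice of a per-plant mean as the aggregate are assumptions made for illustration. The slide names only the join, the aggregation and the grouping key.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def outer_join(left: List[Row], right: List[Row], keys: Sequence[str]) -> List[Row]:
    """Full outer join of two row lists on the given key fields."""
    index: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined: List[Row] = []
    matched = set()
    for row in left:
        key = tuple(row[k] for k in keys)
        if key in index:
            matched.add(key)
            joined.extend({**row, **other} for other in index[key])
        else:
            joined.append(dict(row))  # left row with no partner is kept
    for key, rows in index.items():
        if key not in matched:
            joined.extend(dict(r) for r in rows)  # right rows with no partner are kept
    return joined


def mean_by_plant(rows: List[Row], field: str) -> Dict[str, float]:
    """Average one field per plant_id, skipping rows where the field is missing."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for row in rows:
        if row.get(field) is not None:
            totals[row["plant_id"]][0] += float(row[field])
            totals[row["plant_id"]][1] += 1
    return {plant: total / count for plant, (total, count) in totals.items()}
```

An outer join suits sensors that report on different timelines, because a reading with no partner in the other stream is still kept and can contribute to its plant's aggregate.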
Key Takeaways
| IoT capabilities in StreamAnalytix
• Data Sources : Azure Event Hub, AWS IoT, MQTT, Kinesis, S3
• Data Sink : Redshift, Hadoop, MQTT, S3, Kinesis, WebSockets
• PaaS Service Integration : SQS, Lambda, SNS
| Integrated approach to IoT development
| IoT applications are dynamic
| Vendor neutral & interoperable architecture
| COTS & open source offerings to quickly operationalize IoT use cases
San Jose, CA | March 5-8 | Booth #809
Grapevine, TX | March 5-8 | Booth #817
Data & Analytics Summit 2018
Thank you.
Questions?
© 2018 Impetus Technologies
Email: inquiry@streamanalytix.com | @ImpetusTech / @StreamAnalytix
Download Visual Spark Studio™ - Zero cost, lightweight development tool.
Full range of data processing and analytics functionality to build Spark applications.
