Optimizing industrial operations using the big data ecosystem

Optimizing Industrial Operations in Real Time Using the Big Data Ecosystem
Kishore Reddipalli
Director - Software Engineering
GE Digital

Agenda
• Use Case
• Spark as Analytic Runtime
• Optimization Framework
• Streaming and Batch Analysis
• Challenges
• Q&A

GE Mission
• Improve Asset Reliability and Availability
• Monitor Mission Critical Events
• Optimize the Manufacturing process
• Optimize Fleet Operations
• Reduce Unplanned Downtime

Use Case
Power Plant Efficiency:
• Heat rate, in the context of power plants, can be thought of as
the input needed to produce one unit of output. It generally
indicates the amount of fuel required to generate one unit of
electricity.
• Performance parameters tracked for any thermal power
plant, such as efficiency, fuel costs, plant load factor, and emissions
level, are a function of the station heat rate and can
be linked to it directly.
Source: https://en.wikipedia.org/wiki/Heat_rate_(efficiency)
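
The heat rate definition above can be worked through as a small calculation. This is a minimal sketch, not a plant model: the fuel and output figures are hypothetical, and the only fixed fact used is that 1 kWh equals about 3,412.14 BTU, so thermal efficiency is that constant divided by the heat rate in BTU/kWh.

```python
BTU_PER_KWH = 3412.14  # energy content of 1 kWh expressed in BTU


def heat_rate(fuel_energy_btu: float, electricity_kwh: float) -> float:
    """Heat rate in BTU/kWh: fuel energy in per unit of electricity out."""
    if electricity_kwh <= 0:
        raise ValueError("electricity output must be positive")
    return fuel_energy_btu / electricity_kwh


def thermal_efficiency(heat_rate_btu_per_kwh: float) -> float:
    """Fraction of the fuel energy that ends up as electricity."""
    return BTU_PER_KWH / heat_rate_btu_per_kwh


# Hypothetical figures: 10 million BTU of fuel burned to produce 1,000 kWh.
hr = heat_rate(fuel_energy_btu=10_000_000, electricity_kwh=1_000)
eff = thermal_efficiency(hr)
print(f"heat rate = {hr:.0f} BTU/kWh, efficiency = {eff:.1%}")
```

A lower heat rate means less fuel per unit of electricity, which is why the tracked parameters (fuel cost, emissions, efficiency) move with it.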

Data Volume
• In aviation, a GE jet engine produces 5,000 data points that
can be analyzed per second to optimize flight times.
• In power, 500,000 data points need to be analyzed to
generate the outcomes. These data points are
generated from ~1,000 sensors.
• Data generated by thousands of pieces of GE equipment at a
high volume and rate needs to be stored and analyzed at
petabyte scale.

Predix – Industrial Internet platform that can be
leveraged to build industrial applications
www.predix.io

Spark as an Analytic Runtime
• REST API (Spark Job Server)
• Security
• Multi-tenancy
• Optimization Framework
• Spark SQL
• Spark Streaming

Optimization Framework
Need for a framework – to simplify and bring consistency to
the development of analytics, and to abstract the complexity of
data connectivity and the processing of large volumes of data
• API
• Schema
• Data Providers (Input / Output)
• Data Frames (Variety of Data – Timeseries, Asset,
Configuration)
• Parallelism (Partitioning of data for processing)
• Multi-Mode (Stream vs Batch)
• Multi-Stream Source
• UDF (Aggregation, Interpolation, Unit of Measure)
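
The three UDF categories listed above (aggregation, interpolation, unit of measure) can be illustrated with plain functions. This is a sketch only: the function names are hypothetical and nothing here is the framework's actual API. In a Spark job, functions like these would typically be wrapped as UDFs; the example stays in plain Python so it runs on its own.

```python
from typing import Iterable


def aggregate_mean(values: Iterable[float]) -> float:
    """Aggregation: arithmetic mean of a group of sensor values."""
    items = list(values)
    if not items:
        raise ValueError("cannot aggregate an empty group")
    return sum(items) / len(items)


def interpolate_linear(t0: float, v0: float, t1: float, v1: float, t: float) -> float:
    """Interpolation: estimate the value at time t on the line through
    the two known samples (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("the two samples must have different timestamps")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def fahrenheit_to_celsius(deg_f: float) -> float:
    """Unit of measure: convert a temperature from Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Hypothetical readings, used only to exercise the functions.
mean_value = aggregate_mean([100.0, 110.0, 120.0])
midpoint = interpolate_linear(0.0, 100.0, 10.0, 120.0, 5.0)
boiling_c = fahrenheit_to_celsius(212.0)
print(mean_value, midpoint, boiling_c)
```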

Optimization Framework – Architecture

Data Providers
Data connectors that fetch data from a
variety of data sources.
Examples:
1. File (HDFS)
2. HTTP – RESTful services (Asset, Timeseries,
any business services)
3. Database (Cassandra, Postgres)
4. Messaging (Kafka, Kinesis, EventHub)
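
One way to picture the data provider idea is a common read interface with one implementation per source type, looked up by name. The sketch below is an assumption about the shape of such an abstraction, not the framework's real design: `DataProvider`, `InMemoryProvider`, and `ProviderRegistry` are hypothetical names, and an in-memory source stands in for HDFS, HTTP, database, or messaging connectors.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


class DataProvider(ABC):
    """Hypothetical common interface: every source yields records."""

    @abstractmethod
    def read(self) -> Iterator[Record]:
        ...


class InMemoryProvider(DataProvider):
    """Stand-in for a real connector; serves records held in memory."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records: List[Record] = list(records)

    def read(self) -> Iterator[Record]:
        return iter(self._records)


class ProviderRegistry:
    """Maps a source kind (e.g. "memory") to a factory for its provider."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., DataProvider]] = {}

    def register(self, kind: str, factory: Callable[..., DataProvider]) -> None:
        self._factories[kind] = factory

    def create(self, kind: str, **options: object) -> DataProvider:
        if kind not in self._factories:
            raise ValueError(f"no data provider registered for {kind!r}")
        return self._factories[kind](**options)


registry = ProviderRegistry()
registry.register("memory", InMemoryProvider)
provider = registry.create("memory", records=[{"tagId": "temperature", "v": 425.07935}])
rows = list(provider.read())
print(rows)
```

An analytic written against `read()` does not need to know which source is behind it, which is the "abstract the complexity of data connectivity" goal stated for the framework.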

Timeseries – Dataframe Schema
{
  "tags": [
    {
      "tagId": "temperature",
      "data": [
        {
          "q": "3",
          "ts": "2015-07-23T12:25:00.000-0000",
          "v": "425.07935"

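A payload in this shape can be flattened into one row per data point before it is loaded into a dataframe. The sketch below assumes the excerpt closes with no further fields (the slide is cut off, so the real schema may carry more), and it assumes `q` is an integer quality code and `v` a numeric value encoded as a string. The row field names are hypothetical.

```python
import json
from datetime import datetime, timedelta

# Assumed complete form of the truncated excerpt above.
payload = """
{"tags": [{"tagId": "temperature",
           "data": [{"q": "3",
                     "ts": "2015-07-23T12:25:00.000-0000",
                     "v": "425.07935"}]}]}
"""


def flatten(doc: dict) -> list:
    """Turn the nested tags/data structure into flat, typed rows."""
    rows = []
    for tag in doc["tags"]:
        for point in tag["data"]:
            rows.append({
                "tagId": tag["tagId"],
                # The timestamp carries milliseconds and a numeric UTC offset.
                "ts": datetime.strptime(point["ts"], "%Y-%m-%dT%H:%M:%S.%f%z"),
                "value": float(point["v"]),
                "quality": int(point["q"]),
            })
    return rows


rows = flatten(json.loads(payload))
for row in rows:
    print(row["tagId"], row["ts"].isoformat(), row["value"], row["quality"])
```
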
Asset – Dataframe Schema
"tagClassifications": [
  {
    "id": "OO-BL000472_Tag_Temperature_Classification_ID",
    "name": "OO-BL000472_Tag_Temperature_Classification_name",
    "description": "This is tag Temperature Classification description",
    "unitGroup": "temperature",
    "properties": [
      {
        "id": "low",
        "value": [
          80
        ],
        "type": "double"
      },
      {
        "id": "high",
        "value": [
          120
        ],
        "type": "double"
      },
      {
        "id": "threshold",
        "value": [
          100
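
The classification properties can drive a simple check on incoming readings. The meaning of the three properties is not spelled out in the excerpt, so the sketch below assumes that `low` and `high` bound the valid range and that `threshold` is an alert level inside that range. The function names and labels are hypothetical.

```python
from typing import Dict, List

# Properties as they appear in the schema excerpt. The excerpt is cut off
# before the "type" of the threshold entry, so that key is left out here.
properties = [
    {"id": "low", "value": [80], "type": "double"},
    {"id": "high", "value": [120], "type": "double"},
    {"id": "threshold", "value": [100]},
]


def to_limits(props: List[dict]) -> Dict[str, float]:
    """Index the property list by id, taking the first entry of each value array."""
    return {p["id"]: float(p["value"][0]) for p in props}


def classify(reading: float, limits: Dict[str, float]) -> str:
    """Label a reading against the limits (semantics assumed, see lead-in)."""
    if reading < limits["low"]:
        return "below_range"
    if reading > limits["high"]:
        return "above_range"
    if reading > limits["threshold"]:
        return "above_threshold"
    return "normal"


limits = to_limits(properties)
labels = [classify(v, limits) for v in (70.0, 90.0, 110.0, 130.0)]
print(labels)
```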

Stream Processing – Data Flow

Stream Processing
• Micro Batch Interval
• Continuous Application
• Multi Stream Sources
• Tenant-Aware Data Pipeline
• Context-Based Data Pipeline
• Window-Based Slicing – Moving Average
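
Window-based slicing with a moving average can be sketched without a streaming engine. The class below keeps the last N values and a running total, so each new value updates the average in constant time. It uses a count-based window for simplicity; a Spark Streaming job would more likely define the window by time, and the class name is hypothetical.

```python
from collections import deque


class MovingAverage:
    """Moving average over the most recent `window_size` values."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self._window = deque(maxlen=window_size)
        self._total = 0.0

    def update(self, value: float) -> float:
        # When the window is full, the oldest value is about to be evicted
        # by append(), so remove it from the running total first.
        if len(self._window) == self._window.maxlen:
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


ma = MovingAverage(window_size=3)
averages = [ma.update(v) for v in [10.0, 20.0, 30.0, 40.0]]
print(averages)
```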

Stream Processing - Pointers
• Micro Batch Interval – depends on the use case
• Data Congestion – incoming stream rate vs. processing rate
• Delayed Data – data quality in the absence of data
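
The congestion pointer comes down to simple arithmetic: if records arrive faster than a micro-batch can process them, the backlog grows with every interval. The sketch below is a toy model with hypothetical rates, not a measurement of any real pipeline.

```python
def backlog_after(batches: int, interval_s: float,
                  arrival_rate: float, processing_rate: float) -> float:
    """Records still waiting after `batches` micro-batches.

    arrival_rate and processing_rate are in records per second.
    """
    backlog = 0.0
    for _ in range(batches):
        # Records that arrive during one batch interval.
        backlog += arrival_rate * interval_s
        # Records the job can drain in the same interval (never below zero).
        backlog -= min(backlog, processing_rate * interval_s)
    return backlog


# Hypothetical rates with a 5-second micro-batch interval.
keeping_up = backlog_after(10, 5, arrival_rate=1000, processing_rate=1200)
falling_behind = backlog_after(10, 5, arrival_rate=1000, processing_rate=800)
print(keeping_up, falling_behind)
```

In the second case the job falls 1,000 records further behind per batch, which is the situation the interval has to be chosen to avoid.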

Batch Processing
• Time range of data
• Aggregations
• Parallel Collections
• Partitioning of Data
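
The batch items above fit together as: split the requested time range into partitions, aggregate each partition independently, then combine the partial results. The sketch below uses a thread pool as a stand-in for Spark partitions and synthetic samples as a stand-in for stored time series; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) into contiguous sub-ranges."""
    if end <= start or parts <= 0:
        raise ValueError("need end > start and parts > 0")
    size = -(-(end - start) // parts)  # ceiling division
    return [(lo, min(lo + size, end)) for lo in range(start, end, size)]


def partial_sum(samples: List[Tuple[int, float]],
                bounds: Tuple[int, int]) -> Tuple[float, int]:
    """Sum and count of the values whose timestamp falls inside bounds."""
    lo, hi = bounds
    values = [v for t, v in samples if lo <= t < hi]
    return sum(values), len(values)


# Synthetic samples: (timestamp in seconds, value).
samples = [(t, float(t)) for t in range(100)]

ranges = split_range(0, 100, parts=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(lambda b: partial_sum(samples, b), ranges))

# Combine the per-partition results into one aggregate.
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
mean = total / count
print(ranges, mean)
```

Returning a sum and a count from each partition, rather than a mean, is what lets the partial results be combined correctly when partitions differ in size.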

Challenges
Stream Processing:
- Data Arrival – Delays (Spark 2.x)
- State Persistence (Spark 2.x)
Data Providers:
- gRPC Connector (Shading)
Performance Tuning:
- Parallel Collections of Data (Read/Write)
YARN Client Mode Limitations (Cluster Mode):
- Latency (Distribution of JARs)
- Loading from HDFS

Next Steps
• Spark 2.x – Structured Streaming
• Machine Learning Pipelines
• Zeppelin as a Service – Interactive Analysis
• Data Providers – Registration as a Service
