Pouring the Foundation: The Journey to Big
Data Management at CenterPoint Energy
CenterPoint Energy Proprietary Information
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Utility Industry Challenges & Pressures
• Aging Assets & Workforce
• Data Growth
• Regulatory Pressure
• Alternative / Distributed Energy
Driving Innovation
In the power and utilities space, the Big Data challenge is centered on harnessing massive new influxes of information to meet business imperatives such as reliability & efficiency, safety & security, profitability, and an evolving intelligent grid serving an increasingly sophisticated customer base.
Source: PennWell “Big Data: Business Insight for Power and Utilities”, February 2016
In addition to these challenges, customers are becoming more demanding.
Smart Grid Explosion
Big Data in the utilities sector can only get even bigger as the smart transformation of the industry accelerates. It is estimated that 680 million smart meters will be installed globally by 2017, leading to 280 petabytes of data a year.
Source: Capgemini Consulting “Big Data BlackOut: Are Utilities Powering Up Their Data Analytics?”, 2015
Figure source: https://www.marsdd.com/wp-content/uploads/2014/08/MaRS-ConnectedWorld-AMI-Figure2-GlobalSmartMeterInstallations.jpg
Utility Analytics Spending on the Rise
Market analyst GTM Research predicts global utility company expenditure on data analytics will grow from $700m in 2012 to
$3.8bn in 2020, with gas, electricity, and water suppliers in all regions of the world increasing their investment.
Source: Engineering and Technology Magazine “How utilities are profiting from Big Data analytics”, January 20, 2014
Actionable Intelligence Transforms Energy & Utilities
Data sources:
• Asset Data
• Customer Surveys
• Weather & Environmental
• Service Fleet GPS Data
• Smart Meter Streams
• Commodity Prices
• Social Media
• GIS Data
• SCADA
• Outage Histories
• CIS Records
• EDW
Use cases:
• Revenue Protection
• Single View of Customer
• Predictive Equipment Maintenance
• Conservation Voltage Reduction
• Next Best Action Programs
Agenda
About CenterPoint
Business Challenge
Design
Smart Meter Use Cases
CNP Architecture
Other Hadoop Initiatives
About
• Publicly traded on the New York Stock Exchange
• Headquartered in Houston, Texas
• Over 5,000 square miles of electric transmission and distribution service area
• Assets total $22 billion
• Over 7,700 employees
• CNP and its predecessor companies have been in business for over 140 years
• Over 5.5 million metered customers
• 2.4 million smart meters
• 3,718 miles of transmission
• 52,639 miles of distribution
• Electric Transmission & Distribution
• Natural Gas Distribution
• Competitive Natural Gas Sales and Services
Business Challenge
1+ PB of Smart Meter Data
• 2.4MM smart meters taking readings every 15 minutes, creating 230MM readings per day, or over 84 billion readings in a year.
• Regulations require historical readings to be available for 10 years.
• Uncompressed data growth of 8 TB per month, and over 1 PB in a 10-year period.
• Current DW technology is approaching end of life.
• Massive amounts of data are stored in a proprietary vendor solution that is hard to manage and has a high total cost of ownership.
• Need a cost-effective solution for today's analytics, regulatory requirements, and preparation for future use cases.
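The volumes quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures from the slide (2.4 million meters, one reading every 15 minutes, 8 TB per month, 10 years of retention) and assumes decimal units (1 PB = 1,000 TB).

```python
# Back-of-the-envelope check of the volumes quoted on the slide.
METERS = 2_400_000        # smart meters (from the slide)
INTERVAL_MINUTES = 15     # one reading per meter every 15 minutes
TB_PER_MONTH = 8          # uncompressed growth (from the slide)
RETENTION_YEARS = 10      # regulatory retention (from the slide)

readings_per_meter_per_day = 24 * 60 // INTERVAL_MINUTES
readings_per_day = METERS * readings_per_meter_per_day
readings_per_year = readings_per_day * 365
total_tb = TB_PER_MONTH * 12 * RETENTION_YEARS

print(f"{readings_per_meter_per_day} readings per meter per day")
print(f"{readings_per_day / 1e6:.1f} million readings per day")
print(f"{readings_per_year / 1e9:.1f} billion readings per year")
print(f"{total_tb} TB (~{total_tb / 1000:.2f} PB) over {RETENTION_YEARS} years")
```

At a constant 8 TB per month the 10-year total comes to 960 TB, which is roughly the 1 PB figure in the slide title.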
Vision for ADMP
Cost-effective, scalable data management platform
Data resides in the data tier that aligns with the required response time
Real-time reporting
Reliable
Support future advanced use cases: streaming, machine learning, cognitive computing, etc.
Architecture
[Architecture diagram: Data Sources (Traditional: OLTP, OLAP, RDBMS; Unstructured), ETL and Streaming, Data Lake, Applications]
Data Flow
Interval data is loaded to SAP HANA 3 times a day using SAP Data Services
• Intervals can be updated at any point, but the majority of the updates happen within 13 months
After 13 months, interval data is aged from SAP HANA to Hive using Sqoop
• Interval data can still be updated occasionally after 13 months, e.g. after a meter firmware update
Master data is loaded into Hadoop using Sqoop
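The aging step described above can be sketched as a small helper that computes the 13-month cutoff and assembles a Sqoop import command for the rows older than it. This is only an illustration under stated assumptions: the host name, port, table names, and column names (hana-host, INTERVAL_READS, READ_DATE, PREMISE_ID, staging.interval_reads) are hypothetical and do not come from the deck, and the deck does not say whether the cutoff is aligned to a month boundary.

```python
from datetime import date


def aging_cutoff(today: date, months: int = 13) -> date:
    """First day of the month that lies `months` months before `today`."""
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def sqoop_import_args(cutoff: date) -> list[str]:
    """Build a Sqoop 1 import command for interval rows older than `cutoff`.

    All connection details and object names below are hypothetical.
    """
    return [
        "sqoop", "import",
        "--connect", "jdbc:sap://hana-host:30015",   # hypothetical HANA host
        "--driver", "com.sap.db.jdbc.Driver",        # SAP HANA JDBC driver class
        "--table", "INTERVAL_READS",                 # hypothetical source table
        "--where", f"READ_DATE < '{cutoff.isoformat()}'",
        "--split-by", "PREMISE_ID",                  # hypothetical split column
        "--hive-import",
        "--hive-table", "staging.interval_reads",    # hypothetical staging table
    ]


cutoff = aging_cutoff(date(2016, 6, 15))
print(cutoff)
print(" ".join(sqoop_import_args(cutoff)))
```

The command targets a non-transactional staging table; moving the rows into the transactional table is a separate step.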
Hive Design
Transactional Hive table required for updates
Sqoop does not support inserts into a transactional table, so a shell script moves data from staging to the transactional target
Partitioned by day, with 8 buckets on premise identification number
File size aligned with HDFS block size
Master data bucketed the same as interval data to take advantage of performance gains during joins
Data is sorted during the insert to the transactional table
• If new data is inserted to a partition after the initial load, the partition is reloaded
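As a rough sketch of what this design might look like in HiveQL, the Python below generates a table definition and a staging-to-target insert. The table and column names (interval_reads, staging_interval_reads, premise_id, read_ts, kwh, read_date) are assumptions for illustration, not CenterPoint's actual schema, and the statements are only printed, not executed. The sketch relies on two general Hive facts: transactional tables are stored as ORC, and on older Hive releases they must be bucketed.

```python
def interval_table_ddl(table: str = "interval_reads", buckets: int = 8) -> str:
    """DDL for a day-partitioned, bucketed, transactional Hive table (hypothetical schema)."""
    return (
        f"CREATE TABLE {table} (\n"
        "  premise_id BIGINT,\n"
        "  read_ts TIMESTAMP,\n"
        "  kwh DOUBLE\n"
        ")\n"
        "PARTITIONED BY (read_date STRING)\n"
        f"CLUSTERED BY (premise_id) INTO {buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


def load_partition_sql(target: str, staging: str, read_date: str) -> str:
    """Insert one day of staged rows into the target table, sorted during the insert."""
    return (
        f"INSERT INTO TABLE {target} PARTITION (read_date='{read_date}')\n"
        "SELECT premise_id, read_ts, kwh\n"
        f"FROM {staging}\n"
        f"WHERE read_date = '{read_date}'\n"
        "SORT BY premise_id, read_ts"
    )


print(interval_table_ddl())
print(load_partition_sql("interval_reads", "staging_interval_reads", "2015-04-30"))
```

A shell script like the one mentioned on the slide could pass statements of this kind to the Hive command line. Bucketing the master data on the same premise_id column with the same bucket count is what lets Hive join the two tables bucket by bucket.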
Smart Meter Use Cases
Forecasting Model Engine
• How do weather and consumer behavior impact load?
• Weather response functions
• Short-term and long-term forecasts
• Weather normalization
Smart Meter Use Cases Continued
Diversion
• Utilize interval and event data to detect and analyze any tamper or diversion attempt
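The deck does not describe how diversion is actually detected. Purely as an illustration of combining event data with interval data, the sketch below flags a premise when average daily usage drops sharply after a tamper event. The function name, the 7-day window, and the 50% threshold are all assumptions.

```python
from statistics import mean


def flag_possible_diversion(daily_kwh, tamper_day, window=7, drop_ratio=0.5):
    """Return True if mean usage in the `window` days from a tamper event onward
    falls below `drop_ratio` times the mean of the `window` days before it.

    daily_kwh:  list of daily consumption totals for one premise
    tamper_day: index into daily_kwh of the day the tamper event was logged
    """
    before = daily_kwh[max(0, tamper_day - window):tamper_day]
    after = daily_kwh[tamper_day:tamper_day + window]
    if not before or not after:
        return False  # not enough history on one side of the event
    return mean(after) < drop_ratio * mean(before)


usage = [30.0] * 7 + [10.0] * 7   # usage falls by two thirds after day 7
print(flag_possible_diversion(usage, tamper_day=7))
```

A real model would also need to account for seasonality, vacancy, and legitimate meter exchanges. This rule only shows the shape of the join between the two data sets.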
Smart Meter Use Cases Continued
Usage History Portal
• Web front-end for internal and external customers to view interval data for a premise
Transformer Load Management
• Identify at-risk transformers
• Maximize usable life
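One way interval data could support transformer load management is to sum the load of all premises served by a transformer in each interval and compare the peak with the transformer's rating. The sketch below assumes a hypothetical premise-to-transformer mapping and, for simplicity, a rating expressed in kW rather than kVA. None of these names or thresholds come from the deck.

```python
from collections import defaultdict


def overloaded_transformers(readings, premise_to_transformer, rating_kw, threshold=1.0):
    """Return {transformer: peak_kw} for transformers whose peak exceeds the threshold.

    readings: iterable of (premise_id, interval_start, kwh) for 15-minute intervals
    """
    load_kw = defaultdict(float)
    for premise_id, interval_start, kwh in readings:
        transformer = premise_to_transformer[premise_id]
        # kWh consumed in 15 minutes times 4 gives the average kW over the interval
        load_kw[(transformer, interval_start)] += kwh * 4

    peak = defaultdict(float)
    for (transformer, _), kw in load_kw.items():
        peak[transformer] = max(peak[transformer], kw)

    return {t: kw for t, kw in peak.items() if kw > threshold * rating_kw[t]}


readings = [(1, "00:00", 5.0), (2, "00:00", 4.0), (3, "00:00", 1.0)]
mapping = {1: "T1", 2: "T1", 3: "T2"}
ratings = {"T1": 25.0, "T2": 25.0}
print(overloaded_transformers(readings, mapping, ratings))
```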
Load Studies
• Hourly loads by rate class used in rate cases to allocate cost to rate classes
• Previously random samples were used
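With readings available for every meter, hourly load by rate class becomes a plain aggregation instead of a sample estimate. A minimal sketch, assuming 15-minute readings and a hypothetical premise-to-rate-class lookup (the rate class names are invented for the example):

```python
from collections import defaultdict
from datetime import datetime


def hourly_load_by_rate_class(readings, premise_rate_class):
    """Sum 15-minute kWh readings into hourly totals per rate class.

    readings: iterable of (premise_id, interval_start: datetime, kwh)
    Returns {(rate_class, hour_start): kwh}.
    """
    totals = defaultdict(float)
    for premise_id, interval_start, kwh in readings:
        hour_start = interval_start.replace(minute=0, second=0, microsecond=0)
        totals[(premise_rate_class[premise_id], hour_start)] += kwh
    return dict(totals)


readings = [
    (1, datetime(2016, 6, 1, 14, 0), 0.5),
    (1, datetime(2016, 6, 1, 14, 15), 0.25),
    (2, datetime(2016, 6, 1, 14, 30), 0.5),
    (3, datetime(2016, 6, 1, 14, 45), 2.0),
]
rate_class = {1: "residential", 2: "residential", 3: "commercial"}
print(hourly_load_by_rate_class(readings, rate_class))
```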
Other Hadoop Initiatives
Document Storage
• Historical invoices
  • 5 million gas & electric PDF invoices a month
  • 10 years of history required
  • Sub-second response time required by web front-end
  • Less than 100 KB per invoice
• Historical mainframe reports
  • Mainframe is being decommissioned, but business clients still need access to historical reports
  • Response time of less than 10 seconds is acceptable
  • Reports are converted to text files and stored as blobs in Hive
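The invoice figures imply an upper bound on document count and storage. The sketch below assumes the "less than 100 KB" figure is the size of a single invoice and uses decimal units.

```python
# Upper-bound sizing for the historical invoice store, from the slide's figures.
INVOICES_PER_MONTH = 5_000_000   # gas & electric PDF invoices (from the slide)
RETENTION_YEARS = 10             # history required (from the slide)
MAX_INVOICE_KB = 100             # assumed to be the per-invoice size limit

total_invoices = INVOICES_PER_MONTH * 12 * RETENTION_YEARS
max_total_tb = total_invoices * MAX_INVOICE_KB / 1e9   # KB -> TB (decimal)

print(f"{total_invoices / 1e6:.0f} million invoices over {RETENTION_YEARS} years")
print(f"at most {max_total_tb:.0f} TB of PDFs")
```

The total volume is modest next to the smart meter data. The harder requirement is serving any one of roughly 600 million small documents in under a second.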